Design Concepts and Usage of Modern CMake Tools

Link: https://ukabuer.me/blog/more-modern-cmake/

For C/C++ developers, managing projects often becomes quite tricky when it comes to complex third-party dependencies, especially when cross-platform development is required.

CMake, as a cross-platform build management tool, provides mature solutions for finding and importing third-party dependencies, generating build systems, testing programs, and installing them. By writing a CMakeLists.txt file once and running the same commands, you can build executables or libraries on different systems. Once you are familiar with CMake, I believe the build experience gets about halfway to what modern languages like Rust and Go offer; the missing half is package management, which I won’t discuss here. Of course, if you’re just solving algorithm problems, there’s no need for a tool as complex as CMake; invoking gcc or clang directly will suffice.

Like C++, CMake has undergone many improvements over the years, resulting in significant differences compared to older versions, leading to the concept of modern CMake. Traditional CMake usage is not without merit, but just like modern C++, modern CMake usage is clearer in some concepts, more user-friendly, and less error-prone.

# A simple example of a modern CMake project
cmake_minimum_required(VERSION 3.12)
project(myproj)
find_package(Poco REQUIRED COMPONENTS Net Util)
add_executable(MyEXE)
target_sources(MyEXE PRIVATE "main.cpp")
target_link_libraries(MyEXE PRIVATE Poco::Net Poco::Util)
target_compile_features(MyEXE PRIVATE cxx_std_14)

Target and Target Configuration

A C/C++ project typically produces executables or libraries, which are collectively referred to as targets in modern CMake and are created with the add_library() and add_executable() commands. Libraries come in several types, the most common being SHARED and STATIC, selected with a keyword in the command: add_library(MyLib SHARED). The first parameter is the target name, which is used in all subsequent configuration.
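As a minimal sketch of the three most common kinds of target, the following CMakeLists.txt creates a static library, a shared library, and an executable. The target and file names are made up for illustration.

```cmake
cmake_minimum_required(VERSION 3.12)
project(TargetKinds LANGUAGES CXX)

# Hypothetical file names; replace them with real sources.
add_library(MyStatic STATIC func.cpp)   # static library
add_library(MyShared SHARED func.cpp)   # shared (dynamic) library
add_executable(MyTool main.cpp)         # executable

# The target name is how later commands refer to the target.
target_link_libraries(MyTool PRIVATE MyStatic)
```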

A CMakeLists.txt can define multiple targets, and most configuration revolves around them. For example, specifying the source files of a target:

target_sources(MyLib PRIVATE "main.cpp" "func.cpp")

In CMake, the PRIVATE keyword describes the “scope” in which the arguments apply. There are two other possible values, INTERFACE and PUBLIC, which are detailed in the next section and can be ignored for now.

When converting an existing project to CMake, there are usually many source files, and CMake’s file command can be used to collect them all:

file(GLOB_RECURSE SRCS ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)

The first parameter, GLOB_RECURSE, requests a recursive search through subdirectories; the second, SRCS, is the name of the variable that stores the results; and the third is the pattern to match. The paths of all matching cpp files are stored in SRCS as a list (a semicolon-separated string), which can be used as follows:

target_sources(MyLib PRIVATE ${SRCS})
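One caveat is that the glob is evaluated when CMake configures the project, so files added later are not picked up until CMake runs again. The sketch below uses the CONFIGURE_DEPENDS option to ask for a re-check at build time; the src directory layout is an assumption, not something the project above prescribes.

```cmake
cmake_minimum_required(VERSION 3.12)
project(GlobExample LANGUAGES CXX)

# CONFIGURE_DEPENDS (CMake 3.12+) asks the build system to re-run the glob
# at build time and reconfigure if the set of matching files changed.
file(GLOB_RECURSE SRCS CONFIGURE_DEPENDS
  "${CMAKE_CURRENT_SOURCE_DIR}/src/*.cpp")   # "src" is an assumed layout

add_library(MyLib STATIC)
target_sources(MyLib PRIVATE ${SRCS})
```

Listing source files explicitly remains the more predictable choice; globbing is mainly a convenience when migrating a large existing tree.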

Besides source files, configuring a target usually also requires specifying its header file directories:

target_include_directories(MyLib PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/include/)

Language features required at compile time:

target_compile_features(MyLib PRIVATE cxx_std_14)

And macros defined at compile time:

target_compile_definitions(MyLib PRIVATE LogLevel=3)

If you have some parameters you want to pass directly to the underlying compiler (like gcc, clang, cl), you can use:

target_compile_options(MyLib PRIVATE -Werror -Wall -Wextra)

Configuration made with target_sources and the other target_* commands applies only to the specified target. In traditional CMake, the same settings were usually applied to a whole directory or globally, for example with include_directories() or by setting the CMAKE_CXX_FLAGS variable. The problem with the traditional approach is its lack of flexibility: when several targets exist, they cannot be configured separately, and one target’s properties can accidentally pollute another’s. The target-based configuration of modern CMake is therefore as reassuring as the introduction of namespaces.
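To make the contrast concrete, here is a small sketch with two targets. The traditional commands are shown commented out, and the target and file names are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.12)
project(ScopeExample LANGUAGES CXX)

# Traditional style (commented out): affects every target defined
# afterwards in this directory and its subdirectories.
# include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")

# Target-based style: each setting is attached to exactly one target.
add_library(Core STATIC core.cpp)            # hypothetical source file
target_include_directories(Core PRIVATE ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_options(Core PRIVATE -Wall)   # GCC/Clang flag, as in the text

add_executable(Tool tool.cpp)                # hypothetical source file
# Tool gets neither Core's include directory nor -Wall unless it asks for them.
```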

Build Specification and Usage Requirement

Dependencies are very common in software development. C/C++ code uses a dependency by including its header files and then linking against it dynamically or statically. An executable may depend on libraries, which may in turn depend on other libraries. A tricky question arises: how do users know what is required to use these external libraries? For example, the code in the headers may require C++17 support in the compiler, or only a few of many shared libraries may actually need to be linked. Which indirect dependencies need to be installed? What versions of them are required?

The simplest and most straightforward answer is documentation. The authors of a library can state its usage requirements in a README, on a website, or even in the header files, but this is clearly inefficient.

CMake provides a solution: each setting applied to a target is classified as either a build specification or a usage requirement, which determines its scope. Build specifications only need to be satisfied when compiling the target itself and are declared with the PRIVATE keyword. Usage requirements need to be satisfied when using the target, that is, when other projects consume the already compiled target, and are declared with the INTERFACE keyword. In real projects, many settings need to be satisfied both when compiling and when using the target; these are declared with the PUBLIC keyword.

Let’s look at an example: we write a library that links Boost statically, and our implementation files use C++14 features as well as Boost’s headers and functions. We then release the library as header files plus a pre-compiled shared library. Although our implementation uses C++14, the headers we ship use only C++03 syntax and do not include any Boost code. In this case, projects that use our library do not need to enable C++14 in their compilers or install Boost in their development environment. Our library’s CMake configuration can be written like this:

target_compile_features(MyLib PRIVATE cxx_std_14)
target_link_libraries(MyLib PRIVATE Boost::Format)

Here, using PRIVATE indicates that C++14 support is only needed during compilation, and linking the Boost library is also only needed during compilation. But if the header files we provide externally also use C++14, we need to use PUBLIC to modify it, changing it to:

target_compile_features(MyLib PUBLIC cxx_std_14)
target_link_libraries(MyLib PRIVATE Boost::Format)

When the library is header-only, our project does not need to be compiled separately, so there is no build specification, and we can simply use INTERFACE to modify the configuration:

target_compile_features(MyLib INTERFACE cxx_std_14)
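For a header-only library the target itself has to be created as an INTERFACE library. A minimal sketch, assuming the headers live in an include directory and using a made-up consumer named Demo:

```cmake
cmake_minimum_required(VERSION 3.12)
project(HeaderOnly LANGUAGES CXX)

# An INTERFACE library compiles nothing itself; it only carries usage requirements.
add_library(MyLib INTERFACE)
target_include_directories(MyLib INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/include)
target_compile_features(MyLib INTERFACE cxx_std_14)

# A consumer inherits the include directory and the C++14 requirement.
add_executable(Demo demo.cpp)   # hypothetical source file
target_link_libraries(Demo PRIVATE MyLib)
```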

Note that usage requirements, declared with INTERFACE or PUBLIC, are transitive. For example, if LibA depends on LibB, it inherits LibB’s usage requirements. If LibC then depends on LibA, and LibA links LibB with PUBLIC or INTERFACE, LibC inherits the usage requirements of both LibA and LibB, which is very useful with multi-level dependencies.
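The chain can be sketched with three static libraries and one compile definition. The source file names and the USES_LIB_B macro are invented for this illustration.

```cmake
cmake_minimum_required(VERSION 3.12)
project(Transitive LANGUAGES CXX)

add_library(LibB STATIC b.cpp)                        # hypothetical sources
target_compile_definitions(LibB PUBLIC USES_LIB_B=1)  # usage requirement of LibB

add_library(LibA STATIC a.cpp)
target_link_libraries(LibA PUBLIC LibB)   # PUBLIC: LibB becomes part of LibA's interface

add_library(LibC STATIC c.cpp)
target_link_libraries(LibC PRIVATE LibA)  # LibC is compiled with USES_LIB_B=1

# With "target_link_libraries(LibA PRIVATE LibB)" the definition would stop at LibA.
```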

Now, a question arises: how can users know about these targets and their PRIVATE, INTERFACE, and PUBLIC attributes?

Finding and Using Link Libraries

For users, a major question is how to find dependencies and learn how to use them. The C/C++ standards do not specify where or in what layout libraries are installed. CMake’s mechanism for finding dependencies not only locates header directories and library paths but also retrieves the library’s usage requirements.

The command for finding third-party libraries in CMake is find_package, which works in two ways: searching for a Config file (called Config mode in the CMake documentation) or searching for a Find file (Module mode). When find_package runs, CMake looks for these two kinds of files and, once one is found, obtains information about the library from it.

1. Finding Dependencies Through Config Files

Config files are CMake scripts provided by the library’s developers for downstream users, usually shipped together with the pre-compiled binaries. A Config file describes the targets the library contains, along with version information, header paths, library paths, compile options, and other usage requirements.

CMake has naming rules for Config files. For a command like find_package(ABC), CMake only looks for ABCConfig.cmake or abc-config.cmake. The default search paths depend on the platform; on Linux they include /usr/lib/cmake and /usr/local/lib/cmake, where many Config files can be found. Generally, when a library is installed, its Config file is placed in one of these directories.

On Windows there is no convention for where libraries are installed, so no such directories exist, and libraries may end up in all sorts of odd places. On Linux, too, a library may not be installed in the default locations above. For these cases CMake provides a solution: if find_package(Abc) cannot locate the Config file, users can set the Abc_DIR variable, and CMake will look for the Config file in the directory it points to.
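A short sketch of the Abc_DIR mechanism follows. Abc is the placeholder package name used above, and the path is purely illustrative; in practice the variable is usually passed on the command line rather than written into the project.

```cmake
cmake_minimum_required(VERSION 3.12)
project(UseAbc LANGUAGES CXX)

# Usually given when configuring instead of being hard-coded:
#   cmake -DAbc_DIR=/opt/abc/lib/cmake/Abc ..
# Abc_DIR must name the directory that contains AbcConfig.cmake.
# Alternatively, add the install prefix to the search:
#   cmake -DCMAKE_PREFIX_PATH=/opt/abc ..
set(Abc_DIR "/opt/abc/lib/cmake/Abc" CACHE PATH "Directory containing AbcConfig.cmake")

# CONFIG restricts the search to Config files.
find_package(Abc CONFIG REQUIRED)
```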
2. Finding Dependencies Through Find Files

Config files seem perfect: they are CMake scripts written by the library’s developers, so users obtain the usage requirements simply by locating the file. In reality, however, not all developers use CMake, and many libraries ship no Config file. For those libraries we can still use Find files.

For find_package(ABC), CMake by default looks for FindABC.cmake first and falls back to searching for a Config file only if no Find file exists. Find files serve the same purpose as Config files; the difference is that they are written by someone other than the library’s developers. If the library you use does not provide a Config file, you can search online for a Find file or write one yourself, and then add it to your CMake project.

The good news is that CMake ships many Find files. The CMake documentation lists official Find scripts for well-known libraries such as OpenGL, OpenMP, and SDL, so find_package can be called for them directly. However, because installation locations are not fixed, these scripts may not always find the library. In that case, set the variable named in CMake’s error message, usually the installation path, so that the Find file can retrieve the library’s usage requirements. Whether Config files or Find files are used, the goal is not just to find the library but also to tell CMake how to use it.
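As a sketch of using the Find files that come with CMake, the snippet below locates OpenGL and OpenMP and links the imported targets those modules define. The Viewer target and its source file are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.12)
project(UseBuiltinFind LANGUAGES CXX)

# FindOpenGL.cmake and FindOpenMP.cmake ship with CMake itself.
find_package(OpenGL REQUIRED)
find_package(OpenMP REQUIRED)

add_executable(Viewer viewer.cpp)   # hypothetical source file
target_link_libraries(Viewer PRIVATE OpenGL::GL OpenMP::OpenMP_CXX)
```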
The bad news is that CMake ships no Find file for a far larger number of libraries; for those you will need to write one yourself or find one online. Once you have it, place it in your project’s directory and update the CMAKE_MODULE_PATH variable:

list(INSERT CMAKE_MODULE_PATH 0 ${CMAKE_SOURCE_DIR}/cmake)

Now CMake can find the Find files in the ${CMAKE_SOURCE_DIR}/cmake directory.

However, a new question arises: how should Config files and Find files be written?

Imported Target

In C/C++ projects, the basic thing we need to know about a dependency is where its library files and header directories are. This can be done with CMake’s find_library and find_path commands:

find_library(MPI_LIBRARY
  NAMES mpi
  PATHS "${CMAKE_PREFIX_PATH}/lib" ${MPI_LIB_PATH}
  # If libmpi.so is not found in the default paths, CMake also looks in MPI_LIB_PATH, which downstream users can set
)
find_path(MPI_INCLUDE_DIR
  NAMES mpi.h
  PATHS "${CMAKE_PREFIX_PATH}/include" ${MPI_INCLUDE_PATH}
  # If mpi.h is not found in the default paths, CMake also looks in MPI_INCLUDE_PATH, which downstream users can set
)

In the early days of CMake, library developers exposed these two pieces of information through global variables in their CMake scripts. For example, for a library named Abc, its developer would define the Abc_INCLUDE_DIRS and Abc_LIBRARIES variables for downstream users. Although this naming convention was never officially mandated, everyone adhered to it, and many libraries still provide such variables for compatibility with older CMake usage.

In modern CMake, providing a target in the CMake script is clearly better, because targets have properties. We need not only to find the library but also to know how to use it, and a target can carry more information than just the header directory and library path.

Modern CMake therefore provides a special kind of target, the Imported Target, created with a command such as add_library(Abc STATIC IMPORTED), which indicates that the dependency already exists externally and does not need to be compiled. The second parameter specifies the type, such as a static or shared library. Developers tend to give Imported Targets namespaced names, such as Boost::Format or Boost::Asio. As with ordinary targets, one CMake script can define multiple Imported Targets.

We can call commands like target_link_libraries on Imported Targets, as on normal targets (only the INTERFACE keyword is allowed there), to specify their usage requirements. There is also another way to configure them. The PRIVATE, INTERFACE, and PUBLIC keywords mentioned earlier can be seen as syntactic sugar for setting target properties. Most target properties come in a private version and an interface version. For example, when header directories are configured with target_include_directories, PRIVATE writes the value to the target’s INCLUDE_DIRECTORIES property, INTERFACE writes it to the INTERFACE_INCLUDE_DIRECTORIES property, and PUBLIC writes it to both. Instead of the target_* commands, we can set these properties directly with set_target_properties.
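The mapping between keywords and properties can be checked by reading the properties back. This sketch uses made-up directory names, and the Abc imported target with its /opt/abc path is a placeholder.

```cmake
cmake_minimum_required(VERSION 3.12)
project(PropertyDemo LANGUAGES CXX)

add_library(MyLib STATIC func.cpp)   # hypothetical source file
target_include_directories(MyLib
  PRIVATE   ${CMAKE_CURRENT_SOURCE_DIR}/src
  INTERFACE ${CMAKE_CURRENT_SOURCE_DIR}/api
  PUBLIC    ${CMAKE_CURRENT_SOURCE_DIR}/include)

# Read back the two properties that the keywords wrote to.
get_target_property(build_dirs MyLib INCLUDE_DIRECTORIES)            # src and include
get_target_property(usage_dirs MyLib INTERFACE_INCLUDE_DIRECTORIES)  # api and include
message(STATUS "build specification: ${build_dirs}")
message(STATUS "usage requirement:   ${usage_dirs}")

# The same kind of usage requirement set directly on an imported target.
add_library(Abc STATIC IMPORTED)
set_target_properties(Abc PROPERTIES
  INTERFACE_INCLUDE_DIRECTORIES "/opt/abc/include")
```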
Because the library behind an Imported Target has already been compiled, we need to specify the exact location of the library file with a special property, IMPORTED_LOCATION, which can be set using set_target_properties. In production, because Release and Debug builds differ, IMPORTED_LOCATION has per-configuration variants such as IMPORTED_LOCATION_RELEASE and IMPORTED_LOCATION_DEBUG. Once these are set, CMake chooses the matching library file for downstream users according to the build configuration.

# Imported Target for the spdlog library
set_target_properties(spdlog::spdlog PROPERTIES
  IMPORTED_LINK_INTERFACE_LANGUAGES_RELEASE "CXX"
  IMPORTED_LOCATION_RELEASE "${_IMPORT_PREFIX}/lib/spdlog/spdlog.lib"
)

Another advantage of Imported Targets is that to use a dependency we only need to link its Imported Target, without manually adding its header directories. This is because the dependency’s header directories are already part of its target’s INTERFACE properties, which are transitive. Thus:

find_package(spdlog REQUIRED)
add_executable(MyEXE)
target_sources(MyEXE PRIVATE "main.cpp")
target_link_libraries(MyEXE PRIVATE spdlog::spdlog)

No target_include_directories call is needed; spdlog’s header directories are added automatically.

3. Handling find_package

Returning to find_package: this command accepts many parameters, such as a version and a list of components. Take the SFML multimedia library as an example. It includes network, audio, graphics, and other modules, but I often use only the graphics module, so the libraries for the other modules do not need to be linked. The CMake script can be written as follows:

# Require the graphics module of the SFML library, major version 2
find_package(SFML 2 COMPONENTS graphics REQUIRED)
# The target name provided by SFML is sfml-graphics
target_link_libraries(MyEXE PRIVATE sfml-graphics)

For find_package, parameters such as version and components clearly have to be handled in the Config file or Find file, and downstream users should be told when the version does not match or a component does not exist. Here CMake also helps library developers by providing the FindPackageHandleStandardArgs module. After including this module, a script can call find_package_handle_standard_args to tell CMake where to get the package’s version and how to know whether the library was found, as in the following CMake script for RapidJSON:

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(RapidJSON
    REQUIRED_VARS RapidJSON_INCLUDE_DIR
    VERSION_VAR RapidJSON_VERSION
)

This script declares that the library’s version is taken from the RapidJSON_VERSION variable and that the RapidJSON_INCLUDE_DIR variable indicates whether the library was found. When the script runs, CMake first checks RapidJSON_INCLUDE_DIR. If it is empty or ends in -NOTFOUND, the library was not found: CMake sets RapidJSON_FOUND to false and, if the downstream find_package call specified REQUIRED, reports an error. If the variable holds a valid value and downstream users passed a version number to find_package, CMake compares that number with the value of RapidJSON_VERSION.
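Putting the pieces of this section together, a Find file might look roughly like the sketch below. It is written for the hypothetical library Abc, assumed to consist of a header abc.h and a library file named abc, and it is one possible layout rather than a canonical one.

```cmake
# FindAbc.cmake: a minimal sketch for a hypothetical library "Abc"
# (assumed header abc.h, assumed library name abc).

find_path(Abc_INCLUDE_DIR NAMES abc.h)
find_library(Abc_LIBRARY NAMES abc)

include(FindPackageHandleStandardArgs)
# Sets Abc_FOUND and reports an error if the package was REQUIRED but not found.
find_package_handle_standard_args(Abc
  REQUIRED_VARS Abc_LIBRARY Abc_INCLUDE_DIR)

# Expose the result as a namespaced imported target, created only once.
if(Abc_FOUND AND NOT TARGET Abc::Abc)
  add_library(Abc::Abc UNKNOWN IMPORTED)
  set_target_properties(Abc::Abc PROPERTIES
    IMPORTED_LOCATION "${Abc_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${Abc_INCLUDE_DIR}")
endif()

mark_as_advanced(Abc_INCLUDE_DIR Abc_LIBRARY)
```

With the file placed in a directory listed in CMAKE_MODULE_PATH, a downstream project would call find_package(Abc REQUIRED) and then link Abc::Abc.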

Using CMake to Compile

After CMake generates the build environment, the underlying make, ninja, MSBuild commands are different, but CMake provides a unified method for compilation:

cmake --build .

With the --build flag, CMake invokes the underlying build tool for you, which is very convenient for cross-platform use.

Visual Studio is a multi-configuration generator: Debug and Release are selected at build time, so the CMAKE_BUILD_TYPE variable has no effect and the configuration must be specified when building:

cmake --build . --config Release
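A project can detect which kind of generator is in use. The following sketch gives single-configuration generators a default build type and leaves multi-configuration generators alone; defaulting to Release is an arbitrary choice made for this example.

```cmake
cmake_minimum_required(VERSION 3.12)
project(BuildTypeDefault LANGUAGES CXX)

# GENERATOR_IS_MULTI_CONFIG (CMake 3.9+) is true for generators such as Visual Studio.
get_property(is_multi_config GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)

if(is_multi_config)
  message(STATUS "Multi-config generator: choose with 'cmake --build . --config <cfg>'")
elseif(NOT CMAKE_BUILD_TYPE)
  # Single-config generators (Makefiles, Ninja) read CMAKE_BUILD_TYPE at configure time.
  set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
endif()
```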

CMake’s Deficiencies

CMake’s shortcomings are quite apparent: the barrier to entry is high, and its syntax is poorly designed. Commands like find_package do not return results but instead produce side effects on global variables or targets, making their behavior difficult to predict without consulting the documentation. Furthermore, the distinction between variables, targets, and strings is unclear, which easily confuses users about when to use ${} to read a value.
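The side effects can be seen in a small sketch using the FindZLIB module that ships with CMake. Nothing is returned from the call; the results show up as a variable and a target, each checked in a different way.

```cmake
cmake_minimum_required(VERSION 3.12)
project(SideEffects LANGUAGES C)

# find_package returns nothing; its results appear as variables and targets.
find_package(ZLIB)

if(ZLIB_FOUND)                       # a variable, named without ${}
  message(STATUS "zlib version: ${ZLIB_VERSION_STRING}")   # read with ${}
endif()

if(TARGET ZLIB::ZLIB)                # a target, tested by name
  message(STATUS "ZLIB::ZLIB is available")
endif()
```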
Additionally, the tutorials on the official website are very outdated. They work, but they do not show how to create projects the modern CMake way. I recommend the materials listed at the end of this article over the official tutorials.

I will cover how to write Config files, install libraries, and run tests with ctest when I have time, though I hope better alternative tools emerge before my next update.

References:

cmake-buildsystem
cmake-packages
It’s Time To Do CMake Right