Understanding the Power of CMake Build Tool

CMake is an open-source, cross-platform build tool that is widely used across the development community. This article explores what makes it powerful.

Content source: https://aosabook.org/en/cmake.html. Original title: CMake. Authors: Bill Hoffman and Kenneth Martin. Translator: Liu Zuosha. Note: compared to the original text, the translation may contain some simplifications, rearrangements, and additions, mainly to extract the important content for easier understanding. Please refer to the original text for accuracy.

CMake is not a build system like Unix Make but a build system generator. Its purpose is to take your description of a project and generate a set of configuration files to build that project.

As part of the generation of build configuration files, CMake also analyzes source code to create a dependency graph of components so that when building the project unnecessary recompilation steps can be omitted to reduce build times. For larger projects, this can reduce build times down from tens of minutes or hours to a few minutes, perhaps even less than one minute.

In addition to a build system, over the years CMake has evolved into a family of development tools: CMake, CTest, CPack, and CDash. CMake is the build tool responsible for building software. CTest is a test driver tool used to run regression tests. CPack is a packaging tool used to create platform-specific installers for software built with CMake. CDash is a web application for displaying testing results and performing continuous integration testing.

Concepts

The build tree is the directory hierarchy in which all generated files are placed. Generated files consist of the makefile, the compiled object files, and a dependency file (with a .d extension) for each source file.

Requirements

When CMake was being developed, the normal practice for a project was to have a configure script and Makefiles for Unix platforms and Visual Studio project files for Windows. This duality of build systems made cross-platform development very tedious for many projects: the simple act of adding a new source file to a project was painful. The obvious goal for developers was to have a single unified build system. The developers of CMake had experience with two approaches to solving the unified build system problem.

The basic constraints of the new build system would be as follows:

  • It must depend only on a C++ compiler being installed on the system.
  • It must be able to generate Visual Studio IDE input files.
  • It must be easy to create the basic build system targets, including static libraries, shared libraries, executables, and plugins.
  • It must be able to run build time code generators.
  • It must support separate build trees from the source tree.
  • It must be able to perform system introspection, i.e., be able to determine automatically what the target system could and could not do.
  • It must do dependency scanning of C/C++ header files automatically.
  • All features would need to work consistently and equally well on all supported platforms.

In order to avoid depending on any additional libraries and parsers, CMake was designed with only one major dependency, the C++ compiler (which we can safely assume we have if we’re building C++ code). This did limit CMake to creating its own simple language, which is a choice that still causes some people to dislike CMake. However, at the time the most popular embedded language was Tcl. If CMake had been a Tcl-based build system, it is unlikely that it would have gained the popularity that it enjoys today.

The ability to generate IDE project files is a strong selling point for CMake, but it also limits CMake to providing only the features that the IDE can support natively. However, the benefits of providing native IDE build files outweigh the limitations. Although this decision made the development of CMake more difficult, it made the development of ITK (the Insight Toolkit) and other projects using CMake much easier. Developers are happier and more productive when using the tools they are most familiar with. By allowing developers to use their preferred tools, projects can take best advantage of their most important resource: the developer.

Another early CMake requirement came from autotools: the ability to create build trees that are separate from the source tree. This allows multiple build types to be performed on the same source tree. It also prevents the source tree from being cluttered with build files, which often confuses version control systems.

Implementation

Environment Variables (or Not)

Many build systems rely on shell-level environment variables during a build, for example to point to the root of the source tree or to external packages. The trouble with this approach is that for the build to work, all of these external variables need to be set each time a build is performed. To solve this problem, CMake has a cache file that stores all of the variables required for a build in one place. These are not shell or environment variables, but CMake variables. The first time CMake is run for a particular build tree, it creates a CMakeCache.txt file which stores all the persistent variables for that build. Since the file is part of the build tree, the variables will always be available to CMake during each run.
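To make the idea of a persistent cache concrete, here is a minimal C++ sketch. It is not CMake's own code: the names CacheEntry, parseCache, setDefault, and writeCache are hypothetical, and the only thing taken from CMake is the general NAME:TYPE=VALUE shape of entries in CMakeCache.txt, with lines starting with # or // treated as comments.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// One persistent entry (hypothetical type, for illustration only).
struct CacheEntry {
    std::string type;
    std::string value;
};

using Cache = std::map<std::string, CacheEntry>;

// Parse NAME:TYPE=VALUE lines, skipping blank lines and comment lines.
Cache parseCache(const std::string& text) {
    Cache cache;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#' || line.rfind("//", 0) == 0) continue;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        const std::size_t equals = line.find('=', colon);
        if (equals == std::string::npos) continue;
        cache[line.substr(0, colon)] =
            CacheEntry{line.substr(colon + 1, equals - colon - 1),
                       line.substr(equals + 1)};
    }
    return cache;
}

// Store a value only if the cache has none yet, so that a value
// chosen on an earlier run survives later runs.
void setDefault(Cache& cache, const std::string& name,
                const std::string& type, const std::string& value) {
    assert(!name.empty());
    cache.emplace(name, CacheEntry{type, value});
}

// Serialize the cache back to text, one entry per line.
std::string writeCache(const Cache& cache) {
    std::ostringstream out;
    for (const auto& item : cache) {
        out << item.first << ':' << item.second.type << '='
            << item.second.value << '\n';
    }
    return out.str();
}
```

In this sketch, a first run would call setDefault for every variable and write the result into the build tree. Later runs would call parseCache first, so earlier choices take precedence over the defaults.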

The Configure Step

During the configure step, CMake first reads the CMakeCache.txt if it exists from a prior run. It then reads CMakeLists.txt, found in the root of the source tree given to CMake. During the configure step, the CMakeLists.txt files are parsed by the CMake language parser. Each of the CMake commands found in the file is executed by a command pattern object. Additional CMakeLists.txt files can be parsed during this step by the include and add_subdirectory CMake commands. CMake has a C++ object for each of the commands that can be used in the CMake language. Some examples of commands are add_library, if, add_executable, add_subdirectory, and include. In effect, the entire language of CMake is implemented as calls to commands. The parser simply converts the CMake input files into command calls and lists of strings that are arguments to commands.
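The command-based design can be illustrated with a small C++ sketch of the command pattern. This is an assumption-laden toy, not CMake's source: Command, CommandCall, ProjectState, and Interpreter are hypothetical names, and only two commands are registered.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for the state built up while configuring.
struct ProjectState {
    std::vector<std::string> libraries;
    std::vector<std::string> executables;
};

// Base class: one subclass per command of the language.
class Command {
public:
    virtual ~Command() = default;
    virtual bool invoke(const std::vector<std::string>& args,
                        ProjectState& state) = 0;
};

class AddLibraryCommand : public Command {
public:
    bool invoke(const std::vector<std::string>& args,
                ProjectState& state) override {
        if (args.empty()) return false;  // a target name is required
        state.libraries.push_back(args[0]);
        return true;
    }
};

class AddExecutableCommand : public Command {
public:
    bool invoke(const std::vector<std::string>& args,
                ProjectState& state) override {
        if (args.empty()) return false;
        state.executables.push_back(args[0]);
        return true;
    }
};

// What a parser would hand over: a command name plus string arguments.
struct CommandCall {
    std::string name;
    std::vector<std::string> args;
};

class Interpreter {
public:
    Interpreter() {
        commands_["add_library"] = std::unique_ptr<Command>(new AddLibraryCommand);
        commands_["add_executable"] = std::unique_ptr<Command>(new AddExecutableCommand);
    }

    // Look up each call by name and execute it; stop at the first failure.
    bool run(const std::vector<CommandCall>& calls, ProjectState& state) const {
        for (const auto& call : calls) {
            const auto it = commands_.find(call.name);
            if (it == commands_.end()) return false;  // unknown command
            assert(it->second != nullptr);            // registry never stores null
            if (!it->second->invoke(call.args, state)) return false;
        }
        return true;
    }

private:
    std::map<std::string, std::unique_ptr<Command>> commands_;
};
```

Supporting a new command in this sketch means adding one subclass and one registry entry, which mirrors the idea that the whole language is a set of command calls.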

The Generate Step

Once the configure step has been completed, the generate step can take place. The generate step is when CMake creates the build files for the target build tool selected by the user. At this point, the internal representation of targets (libraries, executables, custom targets) is converted to either an input to an IDE build tool like Visual Studio, or a set of Makefiles to be executed by make. CMake’s internal representation after the configure step is as generic as possible so that as much code and data structures as possible can be shared between different build tools.
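The split between a generic internal representation and tool-specific output could be sketched as follows. The Target and Generator types are hypothetical, and the emitted text is a toy placeholder: it is neither a complete Makefile nor a real Visual Studio project format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Generic, tool-independent description of something to build.
struct Target {
    std::string name;
    std::vector<std::string> sources;
};

// Each build tool gets its own generator behind a shared interface.
class Generator {
public:
    virtual ~Generator() = default;
    virtual std::string generate(const std::vector<Target>& targets) const = 0;
};

// Emits one "name: sources..." rule line per target (toy format).
class MakefileGenerator : public Generator {
public:
    std::string generate(const std::vector<Target>& targets) const override {
        std::ostringstream out;
        for (const auto& t : targets) {
            assert(!t.name.empty());
            out << t.name << ":";
            for (const auto& s : t.sources) out << ' ' << s;
            out << '\n';
        }
        return out.str();
    }
};

// Emits an XML-like project description (toy format, not a real schema).
class IdeProjectGenerator : public Generator {
public:
    std::string generate(const std::vector<Target>& targets) const override {
        std::ostringstream out;
        for (const auto& t : targets) {
            assert(!t.name.empty());
            out << "<Project name=\"" << t.name << "\">\n";
            for (const auto& s : t.sources) {
                out << "  <Compile file=\"" << s << "\"/>\n";
            }
            out << "</Project>\n";
        }
        return out.str();
    }
};
```

Both generators consume the same list of targets, so everything that produces that list can be shared. Only the final formatting step differs per build tool.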

The Code

CMake is an object-oriented system using inheritance, design patterns, and encapsulation.

The results of parsing each CMakeLists.txt file are stored in the cmMakefile object. In addition to storing the information about a directory, the cmMakefile object controls the parsing of the CMakeLists.txt file. The parsing function calls an object that uses a lex/yacc-based parser for the CMake language. Since the CMake language syntax changes very infrequently, and lex and yacc are not always available on systems where CMake is being built, the lex and yacc output files are processed and stored in the Source directory under version control with all of the other handwritten files.

Another important class in CMake is cmCommand. This is the base class for the implementation of all commands in the CMake language. Each subclass not only provides the implementation for the command, but also its documentation.

Dependency Analysis

Since Integrated Development Environments (IDEs) support and maintain file dependency information themselves, CMake does not compute file-level dependencies for those build systems. For IDE builds, CMake creates a native IDE input file and lets the IDE handle the file-level dependency information. The target-level dependency information is translated to the IDE’s format for specifying dependency information.

With Makefile-based builds, native make programs do not know how to automatically compute and keep dependency information up-to-date. For these builds, CMake automatically computes dependency information for C, C++, and Fortran files. Both the generation and maintenance of these dependencies are automatically done by CMake. Once a project is initially configured by CMake, users only need to run make and CMake does the rest of the work.
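As a rough picture of what header dependency scanning involves, the following C++ sketch collects the files named in #include "..." directives. It is a deliberately naive illustration under stated assumptions, not CMake's scanner: scanQuotedIncludes is a hypothetical name. It ignores angle-bracket includes, does not follow includes recursively, and does not evaluate preprocessor conditionals or block comments.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Return the header names found in #include "..." lines of the source text.
std::set<std::string> scanQuotedIncludes(const std::string& source) {
    const std::size_t npos = std::string::npos;
    std::set<std::string> headers;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        // A directive starts with '#' as the first non-blank character.
        std::size_t pos = line.find_first_not_of(" \t");
        if (pos == npos || line[pos] != '#') continue;

        // Whitespace is allowed between '#' and the word "include".
        pos = line.find_first_not_of(" \t", pos + 1);
        if (pos == npos || line.compare(pos, 7, "include") != 0) continue;

        // Only the quoted form is handled: #include "file.h"
        const std::size_t open = line.find_first_not_of(" \t", pos + 7);
        if (open == npos || line[open] != '"') continue;
        const std::size_t close = line.find('"', open + 1);
        if (close == npos) continue;

        assert(close > open);
        headers.insert(line.substr(open + 1, close - open - 1));
    }
    return headers;
}
```

A complete scanner would also resolve each name against the include search path and scan the headers it finds in turn, since a header can pull in further headers.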

CMake does more than just generate the build files used to create object files and executable programs. It will generate a dependency file for each source file in the project. For example, a main.cpp file will have a generated main.cpp.d file saved in the build folder hierarchy honoring the directory structure of the source files.

Although users do not need to know how CMake does this work, it may be useful to look at the dependency information files for a project. This information for each target is stored in four files called depend.make, flags.make, build.make, and DependInfo.cmake. depend.make stores the dependency information for all the object files in the directory. flags.make contains the compile flags used for the source files of this target. If they change, then the files will be recompiled. DependInfo.cmake is used to keep the dependency information up-to-date and contains information about what files are part of the project and what languages they are in. Finally, the rules for building the dependencies are stored in build.make. If a dependency for a target is out of date, then the dependency information for that target will be recomputed, keeping the dependency information current. This is done because a change to a .h file could add a new dependency.
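The rebuild decision described above can be approximated by a short C++ sketch. It is a simplification with hypothetical names (BuildRecord, needsRebuild): timestamps are plain integers, and the flags are compared as strings, whereas a Makefile-based build decides this through file timestamps.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of how an object file was last built.
struct BuildRecord {
    long objectTime;    // timestamp of the existing object file
    std::string flags;  // compile flags used for that build
};

// Rebuild if the flags changed or if any dependency (the source file or
// one of its headers) is newer than the object file.
bool needsRebuild(const BuildRecord& last, const std::string& currentFlags,
                  const std::vector<long>& dependencyTimes) {
    if (last.flags != currentFlags) return true;
    for (long t : dependencyTimes) {
        if (t > last.objectTime) return true;
    }
    return false;
}
```

When every dependency is older than the object file and the flags are unchanged, the compile step is skipped. Skipping such steps is what keeps incremental builds short.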

CTest and CPack

The ctest executable is used to run regression tests. A project can easily create tests for CTest to run with the add_test command. The tests can be run with CTest, which can also be used to send testing results to the CDash application for viewing on the web. CTest and CDash together are similar to the Hudson testing tool. They do differ in one major area: CTest is designed to allow a much more distributed testing environment. Clients can be set up to pull source from the version control system, run tests, and send the results to CDash. With Hudson, client machines must give Hudson ssh access to the machine so tests can be run.

The cpack executable is used to create installers for projects. CPack works much like the build part of CMake: it interfaces with other packaging tools. For example, on Windows, the NSIS packaging tool is used to create executable installers from a project. CPack runs the install rules of a project to create the install tree, which is then given to an installer program like NSIS. CPack also supports creating RPM, Debian .deb files, .tar, .tar.gz, and self-extracting tar files.

References

https://aosabook.org/en/cmake.html

https://blog.feabhas.com/2021/07/cmake-part-1-the-dark-arts/

https://blog.feabhas.com/2021/07/cmake-part-2-release-and-debug-builds/
