Understanding C++ Preprocessor Directives

Introduction

We have finally arrived at C++ syntax proper. We will start with one of its most distinctive, magical, and powerful parts, which is also one of its most problematic, bizarre, and ugly: preprocessor directives, especially macros. We will then use header files, an important application, to see how preprocessor directives are used in practice.

Main Content

1. What are Preprocessor Directives

Preprocessor directives in C++ are instructions handled by the preprocessor before compilation proper. They all start with # and manipulate the source code directly as text. Remember this characteristic of directly manipulating source code. It is very powerful, because it can greatly extend C++ syntax and enable some highly valuable features. However, direct text manipulation also brings many problems and dangers: the preprocessor only sees text, so it knows nothing about types, scopes, or other semantic information. For this reason, many later languages have dropped this mechanism or replaced it with something safer.

After all this, you may still be unclear about what preprocessor directives actually are. Below are some examples to help you understand them better:

1) #include

This should be the most familiar preprocessor directive. You must have used #include <iostream> or #include <stdio.h> when writing a hello world program. The syntax is #include <filepath> or #include "filepath", and it pastes the content of the file at filepath, as text, in place of the #include line. The difference is that <> searches the system include directories, while "" first searches the directory of the source file containing the #include directive and, if the file is not found there, falls back to the system directories. (The standard leaves the exact search order implementation-defined, but this is how common compilers behave.) Therefore, a common convention is to use <> for system-provided files (like stdio.h) and "" for files you have written yourself.

2) #define

The #define directive defines a macro. A macro is a symbol that can stand for some content or for nothing at all. A macro with a value is replaced by the text that follows it during preprocessing; a macro without a value is replaced by nothing. This replacement process is called macro expansion. Examples of both usages are as follows:

#define SYMBOL_A 100 // Define a macro with a value
#define SYMBOL_B // Define a macro without a value
int a = SYMBOL_A; // int a = 100;
int b = SYMBOL_B; // int b = ; -> syntax error

Thus, the former is often used to define preset constants, such as INT_MAX (from <climits>) for the maximum value an int can hold, while the latter is often used as a flag that other parts of the code test for. The compiler’s predefined __cplusplus macro plays a similar role, telling the code that it is being compiled as C++, although it does carry a value: the supported language standard, such as 201703L for C++17.

Some advanced uses of macros will be discussed later.

3) #ifdef, #ifndef, #endif

#ifdef and #ifndef check whether a symbol has already been defined. #ifdef keeps the content between it and its matching #endif when the symbol is defined, while #ifndef does the opposite. Example usage is as follows:

#define SYMBOL_A
#ifdef SYMBOL_A
int a = 1; // Retained
#endif
#ifndef SYMBOL_A
int b = 2; // Deleted
#endif

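The flag-style usage described above can be sketched as follows. The macro ENABLE_LOGGING and the constant names are hypothetical, chosen only for illustration; the point is that only one of the two definitions of kLoggingEnabled survives preprocessing, so the compiler never sees a redefinition.

```cpp
// ENABLE_LOGGING is a hypothetical flag macro: it has no value,
// only whether it is defined matters.
#define ENABLE_LOGGING

#ifdef ENABLE_LOGGING
constexpr bool kLoggingEnabled = true;   // kept: the flag is defined
#endif

#ifndef ENABLE_LOGGING
constexpr bool kLoggingEnabled = false;  // deleted before compilation
#endif

// Predefined macros can be tested the same way; every C++ compiler
// defines __cplusplus.
#ifdef __cplusplus
constexpr bool kCompiledAsCpp = true;
#endif
```

Deleting the #define line flips kLoggingEnabled to false without touching any other code. With GCC or Clang the flag can also be supplied on the command line (-DENABLE_LOGGING) instead of in the source.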
There are also other preprocessor directives, but we will not discuss them here, as this article does not need them.

2. Macro Functions

Macros deserve a section of their own because they may be the most powerful part of the preprocessor. Besides defining constants, macros can also define macro functions (function-like macros). For example:

#define ADD(x, y) x + y
int a = 1;
int b = 2;
int c = ADD(a, b); // int c = a + b;

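To make the text-replacement behaviour concrete, the sketch below uses the ADD macro with both integers and floating-point values, next to a function template (the name add_t is hypothetical) that does the same job with type checking. Everything is constexpr only so the results can be checked at compile time.

```cpp
// The ADD macro from the text: plain text substitution, so it accepts any
// operands that support +.
#define ADD(x, y) x + y

// For comparison: a function template doing the same job with type checking.
// The name add_t is hypothetical.
template <typename T>
constexpr T add_t(T x, T y) { return x + y; }

constexpr int    int_sum    = ADD(1, 2);         // expands to: 1 + 2
constexpr double double_sum = ADD(1.5, 2.25);    // expands to: 1.5 + 2.25
constexpr double mixed_sum  = ADD(1, 2.5);       // expands to: 1 + 2.5 (int converted to double)
constexpr double tmpl_sum   = add_t(1.5, 2.25);  // T is deduced as double
// add_t(1, 2.5) would not compile: T cannot be both int and double.
```

The macro silently accepts mixed types, while the template rejects the mismatched call at compile time; which behaviour is preferable depends on how much checking you want.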
These macro functions have several advantages:

1) Support for multiple types

Since macro functions are purely text replacement, there are no restrictions on the types of the arguments. The ADD macro above works just as well with float values.

2) Inline

A normal function call has to set up and later tear down a memory area called a stack frame, which adds overhead to every call. An inline function has no such step; its code is effectively pasted at the call site. When the function body does very little work, the stack-frame overhead can be a large share of the total cost, so inlining can noticeably improve performance. You may ask: if inlining is so good, why not inline every function? Because the machine code of an inlined function is duplicated at every call site, whereas a normal function exists only once in the program, so inlining increases the program’s size.

However, C++ offers alternatives for both features: templates for supporting multiple types, and automatic inlining by the optimizing compiler, which decides by itself which calls to inline (the inline keyword is only part of that story, and its exact meaning is more complex). Macro functions exist largely for compatibility with C, which lacks templates (and, before C99, lacked inline functions as well).

Macro functions also have some disadvantages:

1) No type checking

Macro functions do not check the types of their arguments at all, so the expanded code may fail to compile, often with a confusing error, or may compile and behave unexpectedly.

2) Ambiguity

Suppose we have the following code:

#define MUL(x, y) x * y
int a = MUL(1 + 2, 3); // int a = 1 + 2 * 3;

In this case, the value of a will be 7 rather than the intended 9: the macro is substituted as plain text, and in 1 + 2 * 3 the multiplication binds tighter than the addition. The usual fix is to add parentheses:

#define MUL(x, y) ((x) * (y)) // parenthesize every parameter and the whole body
int a = MUL(1 + 2, 3); // int a = ((1 + 2) * (3)); -> 9

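Both versions can be checked at compile time. The sketch below uses hypothetical macro names so the two variants can coexist; it also wraps the whole body in parentheses, because with only (x) * (y) an expression such as 6 / MUL(1, 2) expands to 6 / (1) * (2) and evaluates to 12 instead of 3.

```cpp
// Hypothetical names: MUL_NAIVE is the first version from the text,
// MUL_SAFE wraps each parameter and the whole body in parentheses.
#define MUL_NAIVE(x, y) x * y
#define MUL_SAFE(x, y) ((x) * (y))

constexpr int naive = MUL_NAIVE(1 + 2, 3);  // 1 + 2 * 3        -> 7
constexpr int safe  = MUL_SAFE(1 + 2, 3);   // ((1 + 2) * (3))  -> 9
// The outer parentheses matter when the macro is itself an operand:
constexpr int ratio = 6 / MUL_SAFE(1, 2);   // 6 / ((1) * (2))  -> 3
```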
Now the call evaluates to 9 as intended. Still, this kind of trap is typical of macro functions, and in real code neither the problem nor its fix is usually this obvious.

3) Non-hygienic

As someone who really likes Rust, I must mention this point. Suppose we have the following code:

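#include <iostream>
#define INCREMENT(i) { int a = 0; ++i; }
int a = 1; // global
int b = 1; // global
int main() {
    INCREMENT(a); // { int a = 0; ++a; } increments the macro's local a
    INCREMENT(b); // { int a = 0; ++b; }
    std::cout << a << '\n'; // prints 1
    std::cout << b << '\n'; // prints 2
}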

Here, i is textually replaced with a, and the macro’s local variable a shadows the global a inside the braces, so ++a increments the local copy and the global a is never modified. This problem is almost impossible to fix; the only workaround is to give the variables used inside a macro convoluted names that no caller is likely to pass in. Rust’s macros do not have this problem because they are hygienic: they operate on syntax rather than raw text, so a variable declared inside a macro cannot clash with one of the same name at the call site.

So while macro functions are powerful, they come with significant risks, and all of these issues stem from the text-based nature of preprocessor directives.

3. Function Declarations and Implementations

To discuss header files, we must first talk about function declarations and implementations. Their syntax is as follows:

int add(int a, int b); // Declaration
int add(int a, int b) { return a + b; } // Implementation

A function declaration tells the compiler that a function with a certain name exists, what parameters it accepts, and what it returns. A function implementation tells the compiler what the function actually does.

The following rules apply to function declarations and implementations:
1) A function that is called must have exactly one implementation in the whole program (it can be in any source file, but across all source files there can be only one).
2) A function can be implemented directly without a separate declaration.
3) Within a file, a function must be declared or implemented on an earlier line than the one that calls it.
4) A function can be declared multiple times (in the same file or in different files).

So what are function declarations for? They are mainly needed when a function is implemented in one file and called in another. For example:

//main.cpp
int add(int a, int b);
int main() { add(1, 2); return 0; }

//add.cpp
int add(int a, int b) { return a + b; }

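The four rules can also be seen within a single file, since an earlier declaration in the same file does the same job as one copied from another file. In the sketch below the functions are marked constexpr only so the results can be checked at compile time; the declaration rules are the same for ordinary functions. The names call_add and sub are hypothetical.

```cpp
// Rule 4: a function may be declared more than once.
constexpr int add(int a, int b);   // declaration
constexpr int add(int a, int b);   // repeated declaration, still legal

// Rule 3: a call only needs a declaration earlier in the file;
// the implementation can come later.
constexpr int call_add() { return add(1, 2); }

// Rule 1: exactly one implementation.
constexpr int add(int a, int b) { return a + b; }

// Rule 2: a function can be implemented without a separate declaration.
constexpr int sub(int a, int b) { return a - b; }
```

Adding a second body for add to this file, or to another source file linked into the same program, would break rule 1 and produce a compile or link error.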
In real projects, organizing functions into different files by functionality is an important way to keep a project maintainable. A C++ compiler compiles each source file as a separate unit (a translation unit) and cannot see the code in other units while doing so; only in the linking phase are the compiled results combined. Therefore, any file that calls a function implemented elsewhere must contain a declaration of it; otherwise, the compiler cannot type-check the call, among other things.

This peculiar compilation model is a product of its era. Computer memory was limited at the time, so each source file had to be compiled separately. In addition, for speed, compilers avoided passing over a source file more than once, so programmers had to write function declarations by hand and place them before any call. Compilers for many modern languages no longer work this way, so there is no need to write declarations manually, and a function can be called before it is defined.

4. The Purpose of Header Files

Having covered all this, what are header files for? Simply put, header files store function declarations. Writing C++ is already quite cumbersome, and nobody wants to make it more so. Writing out the declarations of every function used, separately in each file, would be inefficient and hard to maintain. Header files solve this: they hold function declarations, and a file that needs those functions just uses #include to paste in the corresponding header. As a result, each function now needs only two pieces of code to be maintained: its implementation and its declaration in a header file. An example is as follows:

//main.cpp
#include "add.h"
int main() { add(1, 2); return 0; }

//add.cpp
int add(int a, int b) { return a + b; }

//add.h
int add(int a, int b);

You may ask: why not put the function implementation directly in the header file? Recall rule 1: a function can be implemented only once. If the implementation lives in the header and the header is included by more than one source file, the program ends up with multiple implementations of the same function. (This happens because header files are generally not compiled on their own; they are pasted into source files and compiled as part of each one.) Functions marked inline and function templates are exceptions and may be defined in headers, but ordinary functions may not.

By convention, header files use the .h or .hpp suffix to distinguish them from source files, and are often kept in a separate directory named include.

5. Problems with Header Files

Everything seems fine with header files now, but consider the following situation:

//a.h
#include "b.h"

//b.h
#include "a.h"

What happens now? If the preprocessor tried to expand these files completely, it would recurse forever. In practice, it stops with an error once an implementation-defined nesting limit is reached. You might think this situation is rare and easy to spot. But what if the two headers are included indirectly, through other headers? Besides, while function declarations can be repeated, some things cannot, such as struct and class definitions, so even without circular dependencies, including the same header more than once can cause errors. These situations are often unavoidable, so how do we solve the problem?

The answer is to use header guards (also called include guards). Consider the following code:

//a.h
#ifndef CPP_TEST_A_H
#define CPP_TEST_A_H
// ... contents of a.h ...
#endif

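Since a runnable example here cannot span several files, the sketch below imitates double inclusion by pasting the body of a hypothetical a.h twice by hand, which is exactly the text a source file would contain if it included a.h both directly and through another header. The struct Point and the constant kPasteCount are illustrative assumptions, not part of the original example.

```cpp
// Hypothetical contents of a.h, pasted twice by hand to imitate a source
// file that includes a.h both directly and through another header.

// ---- first paste of a.h ----
#ifndef CPP_TEST_A_H
#define CPP_TEST_A_H
struct Point { int x; int y; };  // a struct may be defined only once per source file
constexpr int kPasteCount = 1;
#endif

// ---- second paste of a.h ----
#ifndef CPP_TEST_A_H             // already defined, so everything up to #endif is dropped
#define CPP_TEST_A_H
struct Point { int x; int y; };  // would be a redefinition error without the guard
constexpr int kPasteCount = 2;   // never reaches the compiler
#endif
```

Removing the three guard lines from both copies makes this file fail to compile, because Point and kPasteCount would each be defined twice.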
With this wrapper in a header, the first time the header is pasted into a source file the macro is not yet defined, so the preprocessor defines it and keeps the code that follows. Every later time, the macro is already defined, so nothing is pasted. This solves both circular and repeated inclusion.

However, this solution has clear flaws. First, each guard macro must have a name that no other header (including third-party ones) uses; otherwise one header will silently hide another. Second, wrapping every header this way is somewhat tedious. Many compilers therefore provide the following non-standard alternative:

#pragma once
// ... contents of the header ...

However, #pragma once is ultimately non-standard, so it is not guaranteed to work on every compiler, although GCC, Clang, and MSVC all support it. This header mechanism therefore carries some risk.

In addition, a header’s contents are compiled again in every source file that includes it, so the same code is compiled many times over, which costs extra compilation time.

Moreover, if a function’s name, parameters, or return type change, its declaration in the header must be updated by hand as well, which is also tedious.

To address these shortcomings, C++20 introduced a new mechanism called modules, which addresses all three problems at once. Unfortunately, modules seem to have some pitfalls when mixed with #include, and comprehensive compiler support has only recently been completed. So we run into a classic C++ problem: the new feature is great, but few people use it, and we still have to keep using header files.

Conclusion

This series of articles has tried to describe some aspects of C++ objectively. I hope everyone has gained something.

The National Day holiday is about to end, so this series will be put on hold. We will continue in the winter vacation, or whenever inspiration strikes.

The author’s knowledge is limited; if there are any mistakes or omissions, criticism and corrections are welcome.
