The Four-Step Compilation Process in C, Explained with Two Files

1. Conclusion

The process is divided into the following four steps.

  1. Preprocessing: This step handles all directives that start with #.

    • File inclusion: Inserts the contents of the header files named by #include directives into the source code.

    • Macro expansion: Replaces macros defined with #define by their replacement text.

    • Conditional compilation: Keeps or drops parts of the code depending on conditions such as #if and #ifdef.

  2. Compilation: The compiler translates the preprocessed code into assembly code, stored in a .s file.

  3. Assembly: The assembler converts the assembly code (.s) into machine code, stored in an object file (.o). An object file holds binary machine instructions (0s and 1s) that the processor understands, plus bookkeeping such as a symbol table, but it is not yet a runnable program.

  4. Linking: The linker combines the object files and any library files into a single executable file.

It is normal not to understand this the first time; I didn't either. Working through the example below will make it clear.
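All three preprocessing features from step 1 can be seen together in one small file. This is a sketch, and the names SCALE, SQUARE, VERBOSE, and scaled_square are hypothetical, not part of the example that follows. Running gcc -E on it shows the expanded code, and compiling with -DVERBOSE keeps the printf line that the preprocessor otherwise deletes.

```c
#include <stdio.h>   /* file inclusion: brings in the declaration of printf */

#define SCALE 3                  /* object-like macro (hypothetical name) */
#define SQUARE(x) ((x) * (x))    /* function-like macro; parentheses guard against precedence surprises */

int scaled_square(int n)
{
    int result = SCALE * SQUARE(n);   /* after preprocessing: 3 * ((n) * (n)) */
#ifdef VERBOSE
    /* conditional compilation: this line survives only when compiled with -DVERBOSE */
    printf("scaled_square(%d) = %d\n", n, result);
#endif
    return result;
}
```

The extra parentheses in SQUARE matter because expansion is purely textual: written as x * x, SQUARE(n + 1) would expand to n + 1 * n + 1.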

2. Example

Assume I have two files, main.c and add.c:

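    // main.c

    #define NUM 10

    int add(int a, int b);  // declaration: add is defined in add.c

    int main(void)
    {
        add(5, NUM);  // the return value is discarded
        return 0;
    }

    // add.c

    int add(int a, int b)
    {
        return a + b;
    }


This example strips away everything non-essential, so it shows the whole process with the simplest case and the least code. main.c only declares add: since C99, a function must be declared before it is called, and the declaration tells the compiler add's parameter and return types. The body of add lives in add.c. (In a larger project, the declaration would normally sit in a header such as add.h and be pulled in with #include.)

2.1. Preprocessing

main.c contains a #define, so the preprocessor replaces every occurrence of NUM with 10. The result is the following code (you can see it yourself with gcc -E main.c, which stops after preprocessing; the line markers it also prints are omitted here):

    int add(int a, int b);

    int main(void)
    {
        add(5, 10);  // NUM has been replaced by 10
        return 0;
    }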
    

add.c contains no directives, so preprocessing leaves it unchanged.
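Because macro expansion is plain text substitution, not evaluation, the way a macro is written can change the result. A minimal sketch with two hypothetical variants of NUM (they are not in main.c):

```c
/* Hypothetical macros for illustration only. */
#define NUM_BAD  5 + 5     /* no parentheses */
#define NUM_GOOD (5 + 5)   /* parenthesized */

/* Expands to: return 5 + 5 * 2;  multiplication binds first, so the result is 15 */
int twice_bad(void)  { return NUM_BAD * 2; }

/* Expands to: return (5 + 5) * 2;  the result is 20 */
int twice_good(void) { return NUM_GOOD * 2; }
```

The plain NUM 10 in main.c has no such problem, but parenthesizing any macro whose body is an expression is a common habit for exactly this reason.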

2.2. Compilation

Compilation is usually done with gcc. If you often program on Linux, you will know these shell commands; if you mostly work with microcontrollers they may be unfamiliar, so just treat them as compile commands without delving into details. The -S option tells gcc to stop after compilation and write assembly code to a .s file:

    gcc -S main.c  # Generates main.s assembly file
    gcc -S add.c   # Generates add.s assembly file
    

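This produces assembly code like the following. The listing is simplified and assumes x86-64 Linux: real gcc output also contains extra directives (such as .cfi_startproc) and, without optimization, copies the arguments through the stack. On this platform the first two integer arguments are passed in the %edi and %esi registers, and the return value comes back in %eax.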

    # main.s
    
        .file   "main.c"
        .text
        .globl  main
        .type   main, @function
    main:
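        push    %rbp
        mov     %rsp, %rbp
        mov     $5, %edi        # 5 is passed as a (first argument)
        mov     $10, %esi       # 10 is passed as b; NUM was already replaced by the preprocessor
        call    add             # Call add; its final address is not known yet
        mov     $0, %eax        # return 0
        pop     %rbp
        ret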
    
    # add.s
    
        .file   "add.c"
        .text
        .globl  add
        .type   add, @function
    add:
        push    %rbp
        mov     %rsp, %rbp
        mov     %edi, %eax      # a arrives in %edi; copy it into %eax
        add     %esi, %eax      # b arrives in %esi; %eax = a + b, the return value
        pop     %rbp
        ret
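main.s can contain call add even though main.c never sees add's body, because the compiler works on one .c file at a time and only needs add's declaration to compile the call. The sketch below puts a declaration, a caller, and the definition into a single file to show the same idea; call_add is a hypothetical helper that mirrors main.c, and in the two-file example the definition lives in add.c instead.

```c
/* Declaration (prototype): tells the compiler add's parameter and return types.
   This is all it needs to compile a call to add. */
int add(int a, int b);

/* Hypothetical caller mirroring main.c: the compiler emits "call add" here
   without having seen add's body. */
int call_add(void)
{
    return add(5, 10);
}

/* Definition: in the two-file example this is the contents of add.c. */
int add(int a, int b)
{
    return a + b;
}
```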
    

2.3. Assembly

Run the following:

    gcc -c main.s -o main.o  # Assemble main.s into main.o
    gcc -c add.s -o add.o    # Assemble add.s into add.o
    

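This assembles the .s files from the previous step into object files (.o), which contain machine code. The listing below is made up purely for illustration, so don't scrutinize it; just get the idea. (Real x86-64 instructions vary in length and are usually shown in hexadecimal.) Notice that main.o cannot contain add's address yet: add lives in a different file, so the assembler leaves a placeholder in the call instruction for the linker to fill in.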

    # main.o
    
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 00000000  # mov edi, 5
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 00001010  # mov esi, 10
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 10000000  # call add (address not known yet; left for the linker)
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  # ret
    ...
    
    # add.o
    
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 01000000  # mov eax, edi
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 01100000  # add eax, esi
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  # ret
    ...
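The bits above are invented, but real machine code is just as concrete. As a sketch, assuming x86-64: mov $10, %esi encodes as the five bytes be 0a 00 00 00 (opcode 0xbe followed by a 32-bit little-endian immediate), and the call add in main.o encodes as e8 00 00 00 00, where the four zero bytes are the placeholder offset the linker will fill in. The helper operand32 and the array names are hypothetical; the helper reads those four bytes back as a number.

```c
#include <stdint.h>

/* Real x86-64 encodings of two instructions from main.s, stored as bytes. */
static const unsigned char mov_esi_10[5]    = {0xbe, 0x0a, 0x00, 0x00, 0x00}; /* mov $10, %esi */
static const unsigned char call_unlinked[5] = {0xe8, 0x00, 0x00, 0x00, 0x00}; /* call add, before linking */

/* Read the 32-bit little-endian operand that follows a one-byte opcode:
   the lowest-order byte comes first in memory. */
uint32_t operand32(const unsigned char insn[5])
{
    return (uint32_t)insn[1]
         | ((uint32_t)insn[2] << 8)
         | ((uint32_t)insn[3] << 16)
         | ((uint32_t)insn[4] << 24);
}
```

operand32(mov_esi_10) gives 10, the immediate written in main.s, while operand32(call_unlinked) gives 0 because the target is still unknown at this stage.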
    

2.4. Linking

This is the final step: the linker combines the object files into a single executable.

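Why do main.o and add.o need to be linked? And if there were another file, abcd.o, should it be linked too?

main.o calls add but does not contain its code; add.o does, so the two must be linked together. Whether abcd.o ends up in the program depends on what you tell the linker: an object file named on the command line is linked in even if nothing uses it, while from a static library (.a) the linker only pulls in the members that define a symbol that is still needed. Here abcd.o is simply not passed, so it is not linked.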

    gcc main.o add.o -o main  # Link to generate executable file main
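The symbol-resolution part of linking can be pictured as a lookup: for every symbol an object file needs but does not define, find the object file that defines it. The sketch below models this with a hypothetical, greatly simplified symbol table (struct object_file, MAX_SYMS, find_definition, and all_resolved are illustrative names); real object files store this information in binary form, and a real linker also handles duplicate definitions, libraries, and much more.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4  /* hypothetical fixed limit to keep the sketch small */

/* Hypothetical model of one object file's symbol table. */
struct object_file {
    const char *name;
    const char *defines[MAX_SYMS];  /* symbols this file defines (unused slots are NULL) */
    const char *needs[MAX_SYMS];    /* undefined symbols it references (unused slots are NULL) */
};

/* Return the object file that defines `symbol`, or NULL if none does. */
const struct object_file *find_definition(const struct object_file *objs, size_t n,
                                          const char *symbol)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < MAX_SYMS && objs[i].defines[j] != NULL; j++) {
            if (strcmp(objs[i].defines[j], symbol) == 0)
                return &objs[i];
        }
    }
    return NULL;
}

/* Return 1 if every needed symbol is defined by some file, else 0. */
int all_resolved(const struct object_file *objs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < MAX_SYMS && objs[i].needs[j] != NULL; j++) {
            if (find_definition(objs, n, objs[i].needs[j]) == NULL)
                return 0;  /* a real linker reports "undefined reference to ..." here */
        }
    }
    return 1;
}
```

With main.o (defines main, needs add) and add.o (defines add), every need is satisfied; linking main.o alone would fail, just as gcc main.o -o main does.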
    

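Below is the linked executable: a two-person world of main.o and add.o, with no interference from abcd.o.

In reality, the linker does more than concatenate files. It performs symbol resolution (finding that add is defined in add.o) and relocation (assigning final addresses and patching the call instruction in main with add's address). For simplicity, you can think of it as appending one file to the other; we are not professionals in this area 😋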

    00000000 00000000 00000001 00000010 00000000 00000000 00000000 00000000  # mov edi, 5
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 00001010  # mov esi, 10
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 00100000  # call add (add's final address now filled in)
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  # ret
    
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 01000000  # mov eax, edi
    00000000 00000000 00000001 00000010 00000000 00000000 00000000 01100000  # add eax, esi
    00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000  # ret
    ...
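Relocation, the other half of the linker's job, can also be sketched. Assuming x86-64, a call is the byte 0xe8 followed by a 32-bit offset measured from the end of the 5-byte instruction, so once the linker knows where main's call sits and where add ends up, it writes target - (call address + 5) into the placeholder. patch_call and the addresses in the usage note are hypothetical; a real linker learns which bytes to patch from relocation entries in the object file and also checks that the offset fits in 32 bits.

```c
#include <stdint.h>

/* Fill in the 4-byte offset of a 5-byte x86-64 "call rel32" instruction.
   insn:      the instruction bytes (insn[0] is the 0xe8 opcode)
   call_addr: final address of the first byte of the instruction
   target:    final address of the called function, e.g. add */
void patch_call(unsigned char insn[5], uint64_t call_addr, uint64_t target)
{
    /* The CPU jumps to (address of the next instruction) + offset.
       Unsigned arithmetic wraps, so a backward jump still yields the
       correct two's-complement bit pattern in the low 32 bits. */
    uint32_t offset = (uint32_t)(target - (call_addr + 5));

    /* Store the offset little-endian, lowest byte first. */
    insn[1] = (unsigned char)(offset & 0xff);
    insn[2] = (unsigned char)((offset >> 8) & 0xff);
    insn[3] = (unsigned char)((offset >> 16) & 0xff);
    insn[4] = (unsigned char)((offset >> 24) & 0xff);
}
```

For example, a call at address 0x1000 to a function at 0x1020 gets the offset 0x1020 - 0x1005 = 0x1b, stored as 1b 00 00 00.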
    

.o files are binary object files (ELF format on Linux) made up of sections such as code (.text), data (.data), and a symbol table, not just a plain sequence of machine instructions; the executable has a similar structure. You can learn more about this as you go; beginners need not get bogged down in it.

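To run the executable, execute the following shell command. It prints nothing: main ignores the value returned by add and simply returns 0, which becomes the program's exit status.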

    ./main
    
