Understanding the Four Stages of C Language Compilation and Linking: Preprocessing, Compilation, Assembly, and Linking Revealed!

Hello everyone, I am Xiaokang.

Do you remember the first line of code you typed?

printf("Hello, World!\n");

You clicked “Run”, and then magically “Hello, World!” appeared on the screen.

But have you ever wondered what happens at that moment when you click “Run”? How do those characters you typed turn into instructions that the computer can execute?

Today, let’s unveil this mystery and see the thrilling journey of C language code from “source file” to “executable file”!

Introduction: The Fantastic Drift of Code

Imagine your code as a traveler preparing for a journey, starting from your editor, going through layers of checkpoints, and finally transforming into machine instructions that can run on the CPU. This process is mainly divided into four stages:

  1. Preprocessing: “Packing the luggage” for the code
  2. Compilation: “Translating” the code into assembly language
  3. Assembly: Converting assembly language into machine code
  4. Linking: “Assembling” all parts together

These four stages are interlinked and indispensable. Next, let’s look at a real example to see this process.

First Stop: Preprocessing – The “Preparation” of Code

Suppose we have a simple C program:

// main.c
#include <stdio.h>
#define MAX_SIZE 100

int sum(int a, int b);

int main() {
    int a = 5;
    int b = MAX_SIZE;
    printf("Sum is: %d\n", sum(a, b));
    return 0;
}

And an auxiliary file:

// helper.c
int sum(int a, int b) {
    return a + b;
}

The job of preprocessing is to:

  1. Expand all #include directives (copy the contents of header files)
  2. Expand all macros (those defined with #define)
  3. Process conditional compilation directives (like #ifdef)
  4. Remove all comments
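
To make item 3 concrete, here is a minimal sketch of conditional compilation. The names buffer_kind, verbose_enabled, and DEMO_VERBOSE are made up for illustration and are not part of the example program. The preprocessor keeps exactly one branch of each #if/#ifdef, and the compiler never sees the other one.

```c
/* Sketch: the preprocessor decides which lines the compiler ever sees. */
#define MAX_SIZE 100

/* #if is evaluated by the preprocessor, before compilation starts. */
#if MAX_SIZE > 64
const char *buffer_kind(void) { return "large"; }
#else
const char *buffer_kind(void) { return "small"; }
#endif

/* DEMO_VERBOSE is a hypothetical switch. It stays undefined unless you
   compile with: gcc -DDEMO_VERBOSE ... */
#ifdef DEMO_VERBOSE
int verbose_enabled(void) { return 1; }
#else
int verbose_enabled(void) { return 0; }
#endif
```

If you run gcc -E on a file like this, only the selected function bodies show up in the output.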

How to see the result of preprocessing? It’s simple:

gcc -E main.c -o main.i

This command generates the main.i file, which is the result of preprocessing. When you open it, wow! It has grown from a few lines of code into hundreds or even thousands of lines! That is because the contents of stdio.h have all been copied in, and MAX_SIZE has been replaced with 100.

// Part of the preprocessed main.i content (simplified)
// All contents of stdio.h...
// ...a lot of code...

# 4 "main.c"
int sum(int a, int b);

int main() {
    int a = 5;
    int b = 100;  // MAX_SIZE replaced with 100
    printf("Sum is: %d\n", sum(a, b));
    return 0;
}

So preprocessing is essentially a “text replacement” job! It doesn’t care whether the syntax is correct; it just faithfully carries out replacements, expansions, and conditional checks, all of which are “text operations”. Like an assistant who doesn’t understand cooking, it only prepares the ingredients as you say, regardless of whether those ingredients can ultimately make a dish!
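
A classic way to see that macros are pasted in as text rather than evaluated is the sketch below. SQUARE_NAIVE, SQUARE_SAFE, and the two wrapper functions are hypothetical names invented for this illustration.

```c
/* Text substitution, not evaluation: the macro body is pasted in as-is. */
#define SQUARE_NAIVE(x) x * x
#define SQUARE_SAFE(x)  ((x) * (x))

/* Expands to: a + b * a + b  (multiplication binds tighter than +). */
int naive_square_of_sum(int a, int b) { return SQUARE_NAIVE(a + b); }

/* Expands to: ((a + b) * (a + b)). */
int safe_square_of_sum(int a, int b) { return SQUARE_SAFE(a + b); }
```

With a = 1 and b = 2, the naive version computes 1 + 2 * 1 + 2, which is 5, while the safe version computes 9. The preprocessor did exactly what it was told; it just has no idea what you meant.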

Second Stop: Compilation – Translating C Language into Assembly Language

After preprocessing is complete, the compiler starts working. It will convert C code into assembly code. Assembly language is closer to machine language but still human-readable.

gcc -S main.i -o main.s

Running this command generates the main.s file, which contains the assembly code. It might look like this:

    .file   "main.c"
    .section    .rodata
.LC0:
    .string "Sum is: %d\n"
    .text
    .globl  main
    .type   main, @function
main:
    pushq   %rbp
    movq    %rsp, %rbp
    subq    $16, %rsp
    movl    $5, -4(%rbp)
    movl    $100, -8(%rbp)
    movl    -8(%rbp), %edx
    movl    -4(%rbp), %eax
    movl    %edx, %esi
    movl    %eax, %edi
    call    sum
    movl    %eax, %esi
    leaq    .LC0(%rip), %rdi
    movl    $0, %eax
    call    printf@PLT
    movl    $0, %eax
    leave
    ret

Don’t understand? No worries! This is assembly language, which directly corresponds to CPU operations. Let me explain a bit:

  • movl $5, -4(%rbp) is equivalent to a = 5
  • movl $100, -8(%rbp) is equivalent to b = 100
  • call sum is equivalent to calling the sum function
  • call printf@PLT is equivalent to calling the printf function

This step is the real “translation” process, where the compiler needs to understand the meaning of your C code and then express it in assembly language. It’s like translating English into French—while the meaning is the same, the expression is completely different.
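
To line the two languages up side by side, here is a sketch that repeats the logic of main.c with each statement annotated by the instructions from the listing above. The function name annotated_call is made up, and sum is defined inline only so the snippet stands alone. The register roles in the comments assume the x86-64 Linux calling convention that the listing uses; other platforms and optimization levels will produce different assembly.

```c
/* Same logic as main.c, annotated with the matching instructions. */
int sum(int a, int b) { return a + b; }   /* stands in for helper.c */

int annotated_call(void) {
    int a = 5;      /* movl $5, -4(%rbp)    : store 5 in a's stack slot   */
    int b = 100;    /* movl $100, -8(%rbp)  : store 100 in b's stack slot */

    /* a is copied into %edi and b into %esi (the first two integer
       arguments), followed by: call sum */
    int result = sum(a, b);

    /* The return value of sum comes back in %eax. */
    return result;
}
```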

Third Stop: Assembly – Converting Assembly Code into Machine Code

Next, the assembler converts the assembly code into machine code, which is binary code made up of 0s and 1s. This process is relatively simple:

gcc -c main.s -o main.o
gcc -c helper.c -o helper.o  # Directly generate object file from helper.c

This will generate main.o and helper.o, which are the object files containing binary code that the machine can understand, but they cannot run directly yet.

If you open main.o with a hex editor, you will see a bunch of seemingly garbled content. On Linux, you can use the hexdump or xxd commands to view it:

# View using hexdump
hexdump -C main.o | head

# Or use xxd
xxd main.o | head

On Windows, you can use hex editors like HxD or 010 Editor, or use the Format-Hex command in PowerShell:

Format-Hex -Path main.o | Select-Object -First 10

No matter which tool you use, the content you see will look something like this:

7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00
...

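These first bytes are actually the header of the object file (on Linux, the ELF format: 7f 45 4c 46 is the byte 0x7f followed by the letters "ELF"). The machine language itself, the instructions that the CPU executes directly, is stored further inside the file in the same binary form.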

Imagine if assembly language is like sheet music, then this step is like turning the sheet music into an MP3 file that a music player can play directly. Humans find it hard to “read” it directly, but computers can immediately understand the meaning of these instructions.
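
The first bytes of the dump above can be decoded by hand. Here is a minimal sketch, assuming a little-endian ELF object file like the one shown; the function names are made up for illustration, and a real tool would use the structures from the system's elf.h header instead of raw offsets.

```c
#include <stddef.h>

/* Returns 1 if buf starts with the ELF magic number: 0x7f 'E' 'L' 'F'. */
int has_elf_magic(const unsigned char *buf, size_t len) {
    return len >= 4 && buf[0] == 0x7f && buf[1] == 'E' &&
           buf[2] == 'L' && buf[3] == 'F';
}

/* Byte 4 of the header is the class: 1 = 32-bit, 2 = 64-bit. */
int elf_is_64bit(const unsigned char *buf, size_t len) {
    return has_elf_magic(buf, len) && len > 4 && buf[4] == 2;
}

/* Bytes 16-17 hold the file type (assumed little-endian here):
   1 = relocatable object file, 2 = executable,
   3 = shared object (also used for position-independent executables). */
int elf_file_type(const unsigned char *buf, size_t len) {
    if (!has_elf_magic(buf, len) || len < 18) return -1;
    return buf[16] | (buf[17] << 8);
}
```

Fed the bytes from the dump, these helpers report a 64-bit file of type 1, a relocatable object file, which is exactly what a .o file is: code that still needs the linker before it can run.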

Fourth Stop: Linking – Putting All Parts Together

Now we have the main.o and helper.o object files, but they do not know about each other’s existence. The job of the linker is to connect them, resolve their mutual references, and add some necessary system libraries (like the printf function from the standard library).

gcc main.o helper.o -o my_program

After executing this command, the final executable file my_program will be generated. On Windows, it is usually a .exe file.

During the linking process, the linker will:

  1. Merge all object files into one
  2. Resolve all symbol references (like the calls to sum and printf in main.o)
  3. Determine the final memory addresses for each function and variable
  4. Add startup code (to initialize the environment before executing the main function)

This stage is like the last step of a puzzle game, putting all the scattered pieces together to form a complete image. Your code, your friend’s code, and the system library’s code are all combined at this moment to form a program that can run independently.
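
Which names can the linker actually see? Here is a small sketch of a helper-style file. clamp_to_max and sum_clamped are hypothetical additions for illustration; only sum comes from the example program.

```c
/* External linkage: visible to the linker, so the reference to sum
   in main.o can be resolved against this definition. */
int sum(int a, int b) { return a + b; }

/* Internal linkage: 'static' hides the name from other object files.
   clamp_to_max is a hypothetical helper, not part of the original. */
static int clamp_to_max(int value, int max) {
    return value > max ? max : value;
}

/* Uses both functions. From another .c file, only sum and
   sum_clamped could be called. */
int sum_clamped(int a, int b, int max) {
    return clamp_to_max(sum(a, b), max);
}
```

If you compile a file like this without optimization and run nm on the object file, sum and sum_clamped typically appear with an uppercase T (global), while clamp_to_max appears with a lowercase t (local). In main.o, sum shows up as U, meaning undefined: that is the gap the linker fills in.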

The Complete Process Revealed: From Source Code to Executable File

Let’s summarize the complete process:

  1. You write code: Create main.c and helper.c
  2. Preprocessing: Expand header files and macro definitions, generating main.i and helper.i
  3. Compilation: Convert the preprocessed files into assembly code, generating main.s and helper.s
  4. Assembly: Convert assembly code into machine code, generating main.o and helper.o
  5. Linking: Link object files and necessary library files into an executable file my_program

In practical use, usually one command completes all steps:

gcc main.c helper.c -o my_program

But behind the scenes, gcc still executes all the above steps.

Hands-On Experiment

Want to see this process with your own eyes? Try the following experiment:

  1. Create main.c and helper.c files with the content as in the example above
  2. Execute the following commands and observe the output at each step:

# Preprocessing
gcc -E main.c -o main.i

# Compile to assembly
gcc -S main.i -o main.s

# Assemble to object file
gcc -c main.s -o main.o
gcc -c helper.c -o helper.o

# Link to executable file
gcc main.o helper.o -o my_program

# Run
./my_program  # Linux/Mac
my_program.exe  # Windows

Common Errors During Compilation

Understanding the compilation and linking process will help you better understand compilation errors:

1. Preprocessing Errors: Usually due to missing header files

fatal error: stdio.h: No such file or directory

2. Compilation Errors: Syntax errors, the most common type of error

error: expected ';' before '}' token

3. Linking Errors: Unable to find the definition of a function or variable

undefined reference to 'sum'

When you see these errors, you can quickly locate the problem based on which stage it occurs!

Optimization: Making Programs Run Faster

The compiler can not only convert your code into an executable file but also help you optimize the code to make the program run faster. For example:

gcc -O3 main.c helper.c -o my_program_optimized

The -O3 option tells gcc to optimize aggressively (it is the highest of the numbered levels -O1, -O2, and -O3). The compiler will try to:

  • Inline small functions (replace function calls with function bodies)
  • Unroll loops (reduce the number of loop checks)
  • Fold constants (calculate constant expressions at compile time)
  • Eliminate dead code (remove code that will never execute)
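
Here is one small candidate for each of the four techniques above. This is only a sketch: the function names are made up, and whether gcc actually applies a given optimization depends on the gcc version, the target, and the flags.

```c
/* Inlining candidate: small enough that a call can be replaced by x * 2. */
static inline int twice(int x) { return x * 2; }

/* Constant folding: 24 * 60 * 60 can be computed at compile time. */
int seconds_per_day(void) { return 24 * 60 * 60; }

/* Loop unrolling candidate: the trip count (4) is known in advance. */
int sum4_doubled(const int *v) {
    int total = 0;
    for (int i = 0; i < 4; i++) {
        total += twice(v[i]);
    }
    return total;
}

/* Dead code elimination: the branch can never run, so it can be dropped. */
int answer(void) {
    if (0) {
        return -1;
    }
    return 42;
}
```

Compiling this with gcc -O3 -S and reading the .s file is a nice way to check which of these the compiler really did.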

Interesting Experiment: Peeking into the Compiler’s “Thoughts”

Try this interesting experiment to see how the compiler optimizes your code:

// test.c
#include <stdio.h>

int main() {
    int result = 0;
    for (int i = 0; i < 10; i++) {
        result += i * 2;
    }
    printf("Result: %d\n", result);
    return 0;
}

Compile and view the assembly code:

# Without optimization
gcc -S test.c -o test_no_opt.s

# With optimization
gcc -O3 -S test.c -o test_opt.s

By comparing the two files, you will find that the optimized assembly code may contain no loop at all: because the compiler discovered that the result of the entire loop is fixed (it is 90), it replaced the whole calculation with that constant!
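
Where does 90 come from? The loop adds up 2 * i for i from 0 to 9, which is 2 * (0 + 1 + ... + 9) = 2 * 45 = 90. The sketch below checks the same arithmetic in code; loop_result and closed_form are hypothetical names, with the loop bound turned into a parameter.

```c
/* The loop from test.c, with the bound as a parameter. */
int loop_result(int n) {
    int result = 0;
    for (int i = 0; i < n; i++) {
        result += i * 2;
    }
    return result;
}

/* Closed form: 2 * (0 + 1 + ... + (n - 1)) = n * (n - 1). */
int closed_form(int n) { return n * (n - 1); }
```

For n = 10 both functions return 90, the constant you may find in test_opt.s.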

Final Thoughts: Why Understand This Process?

You might ask, “Isn’t it enough to just write code and click the Run button?”

Understanding the compilation and linking process has these benefits:

  1. Better understanding of error messages, quickly locating problems
  2. Writing more efficient code, knowing what writing styles can lead to performance issues
  3. Solving complex dependency issues, especially in large projects
  4. Understanding platform differences, writing cross-platform code

Summary: The Four Key Stops on the Journey of Code

  1. Preprocessing Station: Organizing luggage, preparing to depart
  2. Compilation Station: Translating into assembly language
  3. Assembly Station: Converting into machine-understandable language
  4. Linking Station: Assembling into a complete program

Next time you click the “Run” button, think about the wonderful journey your code is undergoing!

Thought Questions

  1. If you modify helper.c but do not modify main.c, which steps in the complete compilation process are necessary, and which can be skipped?
  2. What is the difference between macro definitions and ordinary functions? How are they processed during the compilation process?

Feel free to share your answers in the comments!

For the Curious You

If you are interested in further exploring the mysteries of the compilation process, try the following “magic spells”:

# View the symbol table of the object file
nm main.o

# View the section headers of the executable file
objdump -h my_program

# View dynamic library dependencies
ldd my_program  # Linux
otool -L my_program  # Mac

Each command allows you to see different aspects of the compilation and linking process, like unraveling different layers of a Rubik’s cube!

Compilation and Linking: The First Step in Exploring Code Transformation

// The evolution of a programmer
typedef enum {
    BEGINNER,      // Can write code
    INTERMEDIATE,  // Understands compilation and linking process
    ADVANCED,      // Can solve complex problems
    EXPERT         // Simplifies complex problems
} ProgrammerLevel;

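// Level up function
ProgrammerLevel levelUp(ProgrammerLevel current) {
    // This requires a lot of learning and practice
    // Stop at EXPERT so the result is always a valid level
    return current < EXPERT ? (ProgrammerLevel)(current + 1) : EXPERT;
}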

The journey from source code to executable file is just the beginning of a program’s existence. There is more exciting content waiting for you, like the stories behind dynamic linking and static linking, which we will unlock together step by step!
