1. Conclusion
Turning C source files into an executable is divided into the following four steps.
- Preprocessing: This step handles all directives that start with #.
  - File Inclusion: Inserts the contents of the header files specified by the #include directive into the source code.
  - Macro Expansion: Replaces the macros defined by #define with their corresponding values.
  - Conditional Compilation: Decides whether to keep certain parts of the code based on conditions such as #if and #ifdef.
- Compilation: The compiler converts the preprocessed code into assembly code, saved as a .s file.
- Assembly: The assembler converts the assembly code (.s) into machine code, saved as an object file (.o). Machine code is binary, made of 0s and 1s, that the processor can execute directly.
- Linking: The linker combines the object files and any library files into a single executable file.
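To see the three preprocessing jobs in one place, here is a minimal C sketch. The VERBOSE flag and the function names are made up for illustration. Running gcc -E on a file like this prints the preprocessed text, in which NUM has been replaced by 10 and only one of the two return statements in verbose_enabled remains.

```c
#include <stdbool.h>  /* file inclusion: the text of stdbool.h is pasted in here */

#define NUM 10        /* macro definition: every later NUM becomes 10 */
#define VERBOSE       /* hypothetical flag, defined only for this illustration */

/* After macro expansion, the compiler sees "return 10;". */
int num_value(void)
{
    return NUM;
}

/* Conditional compilation: only one of the two return statements survives. */
bool verbose_enabled(void)
{
#ifdef VERBOSE
    return true;   /* kept, because VERBOSE is defined above */
#else
    return false;  /* removed by the preprocessor in this configuration */
#endif
}
```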
It's normal not to understand this the first time; I didn't either. Working through the example below will make things clearer.
2. Example
Suppose I have two files: main.c and add.c.
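// main.c
#define NUM 10

int add(int a, int b);  // declaration: add is defined in add.c

int main()
{
    add(5, NUM);
    return 0;
}
// add.c
int add(int a, int b)
{
    return a + b;
}
This example works well for explanation: it strips out everything non-essential and shows the whole process with the simplest case and minimal code. The declaration of add in main.c is needed because, since C99, calling a function that has not been declared is not allowed, and recent versions of gcc reject it by default.
2.1. Preprocessing
main.c contains a #define, so the preprocessor performs text replacement, resulting in the following code:
int add(int a, int b);
int main()
{
    add(5, 10); // NUM is expanded to 10
    return 0;
}
add.c contains no directives, so it remains unchanged.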
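The NUM-to-10 replacement can be mimicked with a toy function. This is only a sketch: expand_macro is a hypothetical name, and real preprocessors work on tokens rather than raw text and also handle string literals, comments, function-like macros and rescanning, none of which this does. It does respect word boundaries, so NUM is replaced but NUMBER is not.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Nonzero if c can appear in a C identifier. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Toy object-like macro expansion: copies src into out, replacing each
   whole-word occurrence of name with value (NUM -> 10, NUMBER is kept).
   Returns 0 on success, -1 on bad input or if out is too small. */
int expand_macro(const char *src, const char *name, const char *value,
                 char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    size_t value_len = strlen(value);
    size_t o = 0;

    if (name_len == 0)
        return -1;

    for (size_t i = 0; src[i] != '\0';) {
        int at_word_start = (i == 0) || !is_ident_char(src[i - 1]);
        if (at_word_start && strncmp(src + i, name, name_len) == 0 &&
            !is_ident_char(src[i + name_len])) {
            if (o + value_len >= out_size)
                return -1;
            memcpy(out + o, value, value_len);  /* emit the replacement */
            o += value_len;
            i += name_len;                      /* skip the macro name */
        } else {
            if (o + 1 >= out_size)
                return -1;
            out[o++] = src[i++];                /* copy other text unchanged */
        }
    }
    if (o >= out_size)
        return -1;
    out[o] = '\0';
    return 0;
}
```

For example, expanding "add(5, NUM);" with name "NUM" and value "10" yields "add(5, 10);".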
2.2. Compilation
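gcc is the tool typically used for compilation. If you often program on Linux, you should be familiar with this shell command. If you work with microcontrollers, it may be unfamiliar; just treat it as the compile command without going into the details. The -S option tells gcc to stop after the compilation step and write assembly instead of producing an object file (gcc runs the preprocessor automatically first).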
gcc -S main.c # Generates main.s assembly file
gcc -S add.c # Generates add.s assembly file
This produces assembly along the lines of the following, simplified from real x86-64 gcc output (which contains more directives and stack handling):
# main.s
    .file "main.c"
    .text
    .globl main
    .type main, @function
main:
    push %rbp
    mov $10, %esi    # NUM was replaced with 10; second argument b goes in esi
    mov $5, %edi     # 5 is the first argument a, passed in edi
    call add         # Call add function
    mov $0, %eax     # return 0
    pop %rbp
    ret
# add.s
    .file "add.c"
    .text
    .globl add
    .type add, @function
add:
    push %rbp
    mov %edi, %eax   # Copy a into eax
    add %esi, %eax   # Add b to eax; eax holds the return value
    pop %rbp
    ret
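On x86-64 Linux (System V ABI), the first two int arguments arrive in the edi and esi registers and an int result is returned in eax, which is what add.s relies on. The toy model below spells out that register flow in C; the struct and function names are hypothetical, and this is an illustration, not an emulator.

```c
/* Toy model of the registers used by the simplified listings. */
struct toy_regs {
    int edi;  /* first argument (a) */
    int esi;  /* second argument (b) */
    int eax;  /* return value */
};

/* Mirrors add.s: copy a into eax, then add b to eax. */
static void toy_add(struct toy_regs *r)
{
    r->eax = r->edi;
    r->eax += r->esi;
}

/* Mirrors the call site: the caller loads the argument registers, calls add,
   and then reads the result from eax. */
int toy_call_add(int a, int b)
{
    struct toy_regs r = {0, 0, 0};
    r.edi = a;
    r.esi = b;
    toy_add(&r);
    return r.eax;
}
```

With the values from main.c, toy_call_add(5, 10) returns 15, the value main receives in eax (and then ignores).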
2.3. Assembly
Execute the following:
gcc -c main.s -o main.o # Assemble main.s into main.o
gcc -c add.s -o add.o # Assemble add.s into add.o
This assembles the .s files from the previous step into .o object files, whose contents are mostly machine code. The binary below is made up for illustration, so don't scrutinize it; just get the idea.
# main.o
00000000 00000000 00000001 00000010 00000000 00000000 00000000 00000000 # load 5 (argument a)
00000000 00000000 00000001 00000010 00000000 00000000 00000000 00001010 # load 10 (argument b)
00000000 00000000 00000001 00000010 00000000 00000000 00000000 10000000 # call add (add's address is not known yet)
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 # return
...
# add.o
00000000 00000000 00000001 00000010 00000000 00000000 00000000 01000000 # copy a into the result register
00000000 00000000 00000001 00000010 00000000 00000000 00000000 01100000 # add b to it
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 # return
...
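Unlike the made-up listing above, the bytes in this sketch are real x86-64 encodings of the instructions in the simplified add.s (push, mov, add, pop, ret), written out by hand in the forms GNU as typically emits. They are not read from add.o; on Linux, objdump -d add.o shows the actual bytes. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hand-written encodings of the simplified add.s instructions. */
const unsigned char add_bytes[] = {
    0x55,        /* push %rbp       */
    0x89, 0xf8,  /* mov  %edi, %eax */
    0x01, 0xf0,  /* add  %esi, %eax */
    0x5d,        /* pop  %rbp       */
    0xc3,        /* ret             */
};

/* Writes byte as 8 binary digits, most significant bit first, plus '\0'. */
void byte_to_binary(unsigned char byte, char out[9])
{
    for (int bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((byte >> bit) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* Prints one byte per line in binary, followed by the same byte in hex. */
void print_binary_listing(const unsigned char *code, size_t n)
{
    char digits[9];
    for (size_t i = 0; i < n; i++) {
        byte_to_binary(code[i], digits);
        printf("%s  # 0x%02x\n", digits, (unsigned)code[i]);
    }
}
```

Calling print_binary_listing(add_bytes, sizeof add_bytes) prints seven lines, starting with "01010101  # 0x55" (push %rbp) and ending with "11000011  # 0xc3" (ret).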
2.4. Linking
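This is the final step: linking these pieces of machine code together.
Why do we need to link main.o and add.o? If there were another file, abcd.o, should it be linked too?
main.o calls add, but the code for add lives in add.o, so the linker needs both files to resolve that call. abcd.o isn't used, so we simply don't pass it to the linker. (Any .o file listed on the command line is linked in whether it is used or not; only members of static libraries (.a) are pulled in selectively, when they resolve a missing symbol.)
gcc main.o add.o -o main # Link to generate the executable file main
Below is the linked executable: a world of just the two of them, with no interference from abcd.o.
In reality, the linker performs symbol resolution (matching main.o's reference to add with its definition in add.o) and relocation (patching the call instruction with add's final address); it does not simply concatenate files. For simplicity, you can think of it as appending the files and filling in the address; we're not specialists in this area 😋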
00000000 00000000 00000001 00000010 00000000 00000000 00000000 00000000 # load 5 (argument a)
00000000 00000000 00000001 00000010 00000000 00000000 00000000 00001010 # load 10 (argument b)
00000000 00000000 00000001 00000010 00000000 00000000 00000000 00100000 # call add (address filled in by the linker)
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 # return
00000000 00000000 00000001 00000010 00000000 00000000 00000000 01000000 # copy a into the result register
00000000 00000000 00000001 00000010 00000000 00000000 00000000 01100000 # add b to it
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 # return
...
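Symbol resolution and relocation can be sketched with a toy linker. Everything here is a simplifying assumption: the struct, array and function names are hypothetical, the layout starts at address 0, and the position of the call operand is hard-coded, whereas real linkers read relocation entries from the ELF file. The patch itself follows the x86-64 rule for a call instruction (opcode 0xe8): its 4-byte operand is the distance from the end of the operand, i.e. the next instruction, to the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy "object file": code bytes plus at most one fix-up. */
struct toy_object {
    const unsigned char *code;
    size_t size;
    long reloc_offset;  /* where a 4-byte call operand must be patched, or -1 */
};

/* A main with the call operand left as 00 00 00 00, as it would be in main.o:
   push %rbp; mov $10,%esi; mov $5,%edi; call add; mov $0,%eax; pop %rbp; ret */
const unsigned char toy_main_code[] = {
    0x55, 0xbe, 0x0a, 0x00, 0x00, 0x00, 0xbf, 0x05, 0x00, 0x00, 0x00,
    0xe8, 0x00, 0x00, 0x00, 0x00, 0xb8, 0x00, 0x00, 0x00, 0x00, 0x5d, 0xc3,
};

/* add: push %rbp; mov %edi,%eax; add %esi,%eax; pop %rbp; ret */
const unsigned char toy_add_code[] = { 0x55, 0x89, 0xf8, 0x01, 0xf0, 0x5d, 0xc3 };

/* Lays out main then add starting at address 0, then patches main's call so
   it reaches add. Returns the total size, or 0 on error. */
size_t toy_link(const struct toy_object *m, const struct toy_object *a,
                unsigned char *out, size_t out_size)
{
    size_t total = m->size + a->size;
    if (total > out_size || m->reloc_offset < 0 ||
        (size_t)m->reloc_offset + 4 > m->size)
        return 0;

    memcpy(out, m->code, m->size);            /* "append" main.o ... */
    memcpy(out + m->size, a->code, a->size);  /* ... then add.o */

    /* Symbol resolution: in this layout, add begins right after main. */
    int64_t add_addr = (int64_t)m->size;

    /* Relocation: the operand is measured from the next instruction. */
    int64_t next_insn = (int64_t)m->reloc_offset + 4;
    uint32_t rel = (uint32_t)(int32_t)(add_addr - next_insn);

    for (int i = 0; i < 4; i++)               /* store little-endian */
        out[m->reloc_offset + i] = (unsigned char)(rel >> (8 * i));
    return total;
}
```

Linking toy_main_code (operand at offset 12) with toy_add_code gives 30 bytes; add starts at address 23, the call ends at 16, so the operand becomes 07 00 00 00.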
Also, real .o files are not just a flat sequence of machine instructions: they use a structured binary format (ELF on Linux) containing code sections, data sections, a symbol table, and other structures. You can learn more about this on the job; beginners don't need to get bogged down in it.
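As a small peek at that structure, every ELF file starts with the bytes 0x7f 'E' 'L' 'F', and the 2-byte e_type field right after the 16-byte identification says whether the file is relocatable (an object file, 1), an executable (2) or a shared object (3). The sketch below reads that field; the function names and TOY_ constants are hypothetical (on Linux, <elf.h> defines the real ET_REL, ET_EXEC and ET_DYN).

```c
#include <stddef.h>
#include <stdio.h>

/* e_type values from the ELF specification (toy names). */
enum { TOY_ET_REL = 1, TOY_ET_EXEC = 2, TOY_ET_DYN = 3 };

/* Returns the e_type field of the ELF header in buf, or -1 if buf does not
   start with a valid ELF identification. */
int elf_type(const unsigned char *buf, size_t n)
{
    /* e_ident is 16 bytes; e_type is the 2-byte field right after it. */
    if (n < 18 || buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[5] == 1)  /* EI_DATA == ELFDATA2LSB: little-endian */
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)  /* EI_DATA == ELFDATA2MSB: big-endian */
        return (buf[16] << 8) | buf[17];
    return -1;
}

/* Reads the start of the file at path and returns its ELF type, or -1. */
int file_elf_type(const char *path)
{
    unsigned char header[18];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(header, 1, sizeof header, f);
    fclose(f);
    return elf_type(header, got);
}
```

On Linux, after the steps above, file_elf_type("main.o") should return 1, and file_elf_type("main") should return 2, or 3 if gcc builds position-independent executables by default, as many current distributions do. readelf -h main.o shows the same field as "Type".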
To run the executable file, execute the following shell command. It prints nothing, because main ignores the value returned by add and simply returns 0:
./main