Source: Ruan Yifeng’s Blog
http://www.ruanyifeng.com/blog/2018/01/assembly-language-primer.html
Learning programming is essentially learning high-level languages, which are computer languages designed for humans.
However, computers do not understand high-level languages; they must be converted into binary code by a compiler in order to run. Knowing a high-level language does not equate to understanding the actual steps a computer takes to execute code.
What computers truly understand are low-level languages, which are specifically designed to control hardware. Assembly language is a low-level language that directly describes/controls the operation of the CPU. If you want to understand what the CPU is doing and how code executes, you must learn assembly language.
Assembly language is not easy to learn; even concise introductions are hard to find. Below, I will attempt to write the most understandable assembly language tutorial, explaining how the CPU executes code.
1. What is Assembly Language?
We know that the CPU only performs computation; it has no intelligence of its own. Give it an instruction and it executes that instruction once, then stops and waits for the next one.
These instructions are binary numbers called operation codes (opcodes). For example, 00000011 is one of the x86 opcodes for addition. The job of a compiler is to translate a program written in a high-level language into a series of opcodes.
Binary programs are unreadable to humans, so there is no way to see what the machine is actually doing. Assembly language was created to solve this readability problem and to allow programs to be edited by hand when needed.
Assembly language is the textual form of binary instructions, and its instructions correspond one-to-one with machine instructions. For example, the addition instruction 00000011 is written as ADD. Once assembled back into binary, assembly code can be executed directly by the CPU, which makes it the lowest-level programming language.
2. Origins
In the early days, programming involved manually writing binary instructions and inputting them into the computer through various switches. For instance, to perform addition, one would press the addition switch. Later, paper tape punch machines were invented, allowing binary instructions to be automatically input into the computer by punching holes in a tape.
To make binary instructions more readable, engineers began writing them in octal. Converting binary to octal is straightforward, but octal is still hard to read. Eventually, engineers naturally turned to text, writing the addition instruction as ADD. Memory addresses were no longer written out directly; labels stood in for them instead.
This added an extra step, where these text instructions needed to be translated into binary; this step is called assembling, and the program that completes this step is called an assembler. The text it processes is naturally called assembly code. After standardization, it was referred to as assembly language, abbreviated as asm, and translated into Chinese as 汇编语言.
Different CPUs have different machine instructions, so their assembly languages differ as well. This article introduces the most common one, x86 assembly language, used by Intel's CPUs and compatible processors such as AMD's.
3. Registers
To learn assembly language, you must first understand two concepts: registers and memory models.
Let’s start with registers. The CPU is responsible solely for computations, not for storing data. Data is generally stored in memory, and when the CPU needs it, it reads and writes data from memory. However, the CPU’s computation speed is much higher than the read/write speed of memory; to avoid being slowed down, CPUs come with Level 1 and Level 2 caches. Essentially, CPU caches can be seen as faster memory.
However, CPU caches are still not fast enough. Data in a cache also has no fixed address, so the CPU has to look up the address on every read or write, which slows it down as well. Therefore, in addition to caches, CPUs also have registers to store the most frequently used data. The data that is read and written most often (such as loop variables) is kept in registers, so the CPU works with registers first and exchanges data with memory only when needed.
Registers are identified by name rather than by address. Each register has its own name, and an instruction tells the CPU exactly which register to read from or write to, which is the fastest way to access data. Some people compare registers to a Level 0 cache of the CPU.
4. Types of Registers
Early x86 CPUs had only 8 registers, each with a different purpose. Modern CPUs have more than 100 registers. The early ones have become general-purpose registers with no particular designated use, but their original names have been preserved.
- EAX
- EBX
- ECX
- EDX
- EDI
- ESI
- EBP
- ESP
Of the eight registers listed above, the first seven are general-purpose. ESP has a dedicated purpose: it holds the address of the top of the current Stack (see section 6).
The terms 32-bit CPU and 64-bit CPU refer to the size of the registers. A 32-bit CPU has 4-byte registers, and a 64-bit CPU has 8-byte registers (on 64-bit x86, EAX is extended to RAX, EBX to RBX, and so on).
5. Memory Model: Heap
Registers can hold only a small amount of data, so most of the time the CPU has to move data between registers and memory. Therefore, besides registers, it is essential to understand how memory stores data.
When a program runs, the operating system allocates a segment of memory to store the program and the data generated during execution. This memory has a starting address and an ending address, for example, from 0x1000 to 0x8000, where the starting address is the smaller one, and the ending address is the larger one.
While the program runs, requests for dynamic memory (such as creating new objects or calling malloc) are served from this pre-allocated region, starting from the start address (in reality, some static data sits at the start address, which we ignore here). For example, if the user requests 10 bytes, they are allocated from address 0x1000 up to, but not including, 0x100A. If the user then requests another 22 bytes, the allocation extends up to 0x1020.
This memory area, allocated in response to user requests, is called the Heap. It grows upward from the start address, toward higher addresses. An important characteristic of the Heap is that it does not disappear on its own; it must be released manually or reclaimed by a garbage collector.
6. Memory Model: Stack
Apart from the Heap, the rest of the memory a program uses is called the Stack. Simply put, the Stack is the memory temporarily occupied while functions execute.
Consider the following example.
int main() {
    int a = 2;
    int b = 3;
}
In the code above, when the system starts executing the main function, it creates a frame in memory for it, and all internal variables of main (like a and b) are stored in this frame. Once the main function finishes executing, this frame will be reclaimed, releasing all internal variables and no longer occupying space.
If a function calls another function, what happens?
int add_a_and_b(int a, int b);

int main() {
    int a = 2;
    int b = 3;
    return add_a_and_b(a, b);
}
In the code above, main calls add_a_and_b. When this line executes, the system creates a new frame for add_a_and_b to hold its local variables. Two frames therefore exist at the same time, one for main and one for add_a_and_b. In general, there are as many frames as there are levels in the call stack.
When add_a_and_b finishes executing, its frame will be reclaimed, and the system will return to the point in the main function where execution was interrupted, continuing from there. This mechanism allows for layered function calls, with each layer able to use its own local variables.
All frames are stored in the Stack, and because frames are piled on top of one another, this area is called a stack. Creating a new frame is called pushing it onto the stack (push), and reclaiming a frame is called popping it off the stack (pop). The last frame pushed is the first one popped, because the innermost function call finishes first. This is known as a last-in, first-out (LIFO) data structure. Each time a function finishes, its frame is released automatically, and when all functions have finished, the entire Stack is released.
The Stack is allocated from the end address of the memory region and grows downward, toward lower addresses. For example, suppose the region ends at 0x8000 and the first frame is 16 bytes; that frame then starts at 0x7FF0. If the second frame needs 64 bytes, the stack moves down to 0x7FB0.
7. CPU Instructions
7.1 An Example
After understanding registers and memory models, we can look at what assembly language really is. Below is a simple program example.c.
int add_a_and_b(int a, int b) {
    return a + b;
}

int main() {
    return add_a_and_b(2, 3);
}
gcc can convert this program into assembly language.
$ gcc -S example.c
This command generates a text file, example.s, containing dozens of lines of assembly instructions. In other words, a single simple operation in a high-level language may correspond to several, or even dozens of, CPU instructions. The CPU executes them one after another to complete the operation.
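After simplification, example.s looks roughly like this. The listing is 32-bit x86 code written in a simplified hybrid notation. Registers carry the % prefix of AT&T syntax. Operands, however, are written destination first, with square brackets for memory, as in Intel syntax.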
_add_a_and_b:
push %ebx
mov %eax, [%esp+8]
mov %ebx, [%esp+12]
add %eax, %ebx
pop %ebx
ret
_main:
push 3
push 2
call _add_a_and_b
add %esp, 8
ret
We can see that the two functions of the original program, add_a_and_b and main, correspond to two labels, _add_a_and_b and _main. Each label marks the sequence of instructions the CPU executes for that function.
Each line is an operation executed by the CPU. It can be divided into two parts; let’s take one line as an example.
push %ebx
In this line, push is the CPU instruction, and %ebx is the operand it operates on. A CPU instruction can have zero or more operands.
Next, I will explain this assembly program line by line. It is recommended that readers copy this program into another window to avoid scrolling back up while reading.
7.2 Push Instruction
By convention, the program starts executing at the _main label. At this point a frame for main is created on the Stack, and the address of the top of the Stack is stored in the ESP register. Any data later written into the main frame goes to the address held in ESP.
Then, it starts executing the first line of code.
push 3
The push instruction places an operand on the Stack; here it writes 3 into the main frame.
Although it looks simple, the push instruction actually performs a preparatory step first. It takes the address in the ESP register, subtracts 4 bytes, and writes the new address back into ESP. The subtraction is because the Stack grows downward. The amount is 4 bytes because in 32-bit mode push stores a 4-byte value, which is also the size of an int such as 3. With the new address in ESP, 3 is written into the four bytes starting at that address.
push 2
The second line works the same way: push writes 2 into the main frame, right below the previously written 3, and ESP decreases by another 4 bytes (8 in total). The arguments are pushed in reverse order, 3 before 2, so that the first argument of add_a_and_b ends up at the lower address.
7.3 Call Instruction
The call instruction on the third line calls a function.
call _add_a_and_b
The code above calls the add_a_and_b function. Before jumping, call pushes the return address onto the Stack. This is the address of the next instruction in main, add %esp, 8. As a result, ESP decreases by another 4 bytes (12 in total). The program then jumps to the _add_a_and_b label, where a new frame for that function begins.
It then starts executing the code of _add_a_and_b.
push %ebx
This line writes the value of the EBX register into the _add_a_and_b frame. The function is about to use EBX, and in the usual 32-bit calling convention a function must preserve EBX for its caller. It therefore saves the old value first and writes it back after use.
This push again subtracts 4 bytes from ESP (16 in total: two arguments, the return address, and the saved EBX).
7.4 Mov Instruction
The mov instruction copies a value; here it copies data from the Stack into a register.
mov %eax, [%esp+8]
This line adds 8 bytes to the address in ESP to get a new address, then reads data from the Stack at that address. At this point ESP points to the saved EBX value, ESP+4 holds the return address, ESP+8 holds 2, and ESP+12 holds 3. The value read here is therefore 2, which is written into the EAX register.
The next line of code does the same thing.
mov %ebx, [%esp+12]
The code above reads from the Stack at the address in ESP plus 12 bytes, retrieving 3 this time, and writes it into the EBX register.
7.5 Add Instruction
The add instruction adds two operands and writes the result into the first operand.
add %eax, %ebx
The above code adds the value in the EAX register (which is 2) to the value in the EBX register (which is 3), getting the result 5, which is then written back into the first operand EAX register.
7.6 Pop Instruction
The pop instruction takes the most recently written value off the Stack and writes it into the specified operand. That value sits at the address in ESP, which is the lowest address in use.
pop %ebx
The above code indicates retrieving the most recently written value from the Stack (i.e., the original value of the EBX register) and writing this value back into the EBX register (since the addition has been completed and the EBX register is no longer needed).
Note that pop also adds 4 bytes to the address in ESP, reclaiming those 4 bytes.
7.7 Ret Instruction
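The ret instruction ends the current function and returns control to its caller. It pops the return address that call pushed onto the Stack (adding 4 bytes to ESP) and resumes execution at that address, which completes the reclaiming of the current function's frame. The result of the addition is already in EAX, the register the calling convention uses for return values, so main receives 5 without any extra step.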
ret
It can be seen that this instruction has no operands.
Once add_a_and_b finishes, execution returns to the point in main where it was interrupted and continues from there.
add %esp, 8
The code above adds 8 bytes to the address in ESP and writes the result back into ESP. ESP points to the top of the Stack. The pop and ret instructions have already reclaimed 8 bytes (the saved EBX and the return address). This line reclaims the remaining 8 bytes occupied by the two arguments, so all 16 bytes are released. In this convention it is the caller, not the called function, that removes the arguments it pushed.
ret
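Finally, main finishes, and its ret instruction returns control to the code that called main (the C runtime startup code), which then ends the program. The value left in EAX, 5, becomes main's return value and therefore the program's exit status.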