Understanding Programs Through Assembly Language

1. C Language, Assembly Language, Machine Language

1.1 C Language Code

C is a high-level language. The computer cannot execute it directly; it must first be translated into machine language, which the CPU can recognize and run.

Machine language, however, consists of binary numbers that are hard to read, so each instruction is given a short symbolic name (a mnemonic) such as mov or add. This symbolic form is known as assembly language, and each assembly instruction corresponds to one machine instruction.

Below is a simple C program, test.c, that calls the add function to add 10 and 20.

#include <stdio.h>

int add(int x,int y){
    return x + y;
}

int main()
{
    int res = add(10,20);
    return 0;
}

1.2 Assembly Code

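The following command compiles the C program to assembly language. The listing below uses Intel syntax, so the -masm=intel option is included (by default, gcc on x86 emits AT&T syntax):

gcc -S -masm=intel test.c -o test.s

The functions in the resulting test.s look like this: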
add:
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     edx, DWORD PTR [rbp-4]
        mov     eax, DWORD PTR [rbp-8]
        add     eax, edx
        pop     rbp
        ret
main:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     esi, 20
        mov     edi, 10
        call    add
        mov     DWORD PTR [rbp-4], eax
        mov     eax, 0
        leave
        ret
  • For ease of reading, assembler directives and comments that do not affect the logic have been removed here.

1.3 Machine Language

The machine language corresponding to the assembly code is shown below, one instruction per line. The first group is add and the second group is main:

55
48 89 e5
89 7d fc
89 75 f8
8b 55 fc
8b 45 f8
01 d0
5d
c3

55
48 89 e5
48 83 ec 10
be 14 00 00 00
bf 0a 00 00 00
e8 d5 ff ff ff
89 45 fc
b8 00 00 00 00
c9
c3
  • Each byte is represented by two hexadecimal digits.
  • Machine language corresponds one-to-one with assembly language.

2. Assembly Language Opcodes

Common assembly language opcodes:

Opcode  Operands  Function
mov     A, B      Copy the value of B into A
add     A, B      Add the values of A and B, and store the result in A
push    A         Push the value of A onto the top of the stack
pop     A         Pop the value at the top of the stack into A
call    A         Push the return address onto the stack and jump to function A
ret     (none)    Pop the return address from the stack and return control to the caller

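A program is a collection of instructions and data. It is loaded into memory, and the CPU fetches and executes its instructions one at a time.

Most instructions operate on registers, which are small storage locations inside the CPU. Data in memory is usually copied into a register before it is used in a computation.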

3. Types of Registers

Register  Name                    Function
eax       Accumulator             Computation; also holds a function’s return value
ebx       Base                    Store a memory address
ecx       Counter                 Count loop iterations
esi       Source Index            Store the memory address of the data source
edi       Destination Index       Store the memory address of the data destination
rbp       Base Pointer            Store the memory address of the base of the current stack frame
rsp       Stack Pointer           Store the memory address of the top of the stack
ebp       Extended Base Pointer   32-bit form of rbp
esp       Extended Stack Pointer  32-bit form of rsp

Names that start with e are the 32-bit forms of the registers, and names that start with r are the 64-bit forms; for example, ebp is the lower 32 bits of rbp. In the code in this article, edi and esi do not hold addresses: they carry the first and second integer parameters of a function call.

4. Analysis of Assembly Code

4.1 One Line of C Code Corresponds to Multiple Lines of Assembly Code

[Figure: the C source code beside its assembly code, with matching colors marking which assembly lines were generated from each line of C.]

First, take a look at the assembly code as a whole to get a general impression.

add:                                     ; Label marking the start of the add function
        push    rbp                      ; Save the caller's rbp on the stack
        mov     rbp, rsp                 ; Set rbp to the current stack top pointer rsp
        mov     DWORD PTR [rbp-4], edi   ; Store the first parameter (edi) in the 4 bytes at rbp-4
        mov     DWORD PTR [rbp-8], esi   ; Store the second parameter (esi) in the 4 bytes at rbp-8
        mov     edx, DWORD PTR [rbp-4]   ; Load the 4 bytes at rbp-4 into edx
        mov     eax, DWORD PTR [rbp-8]   ; Load the 4 bytes at rbp-8 into eax
        add     eax, edx                 ; Add edx to eax; the result stays in eax
        pop     rbp                      ; Restore the caller's rbp
        ret                              ; Pop the return address and jump to it; eax holds the return value
main:                                    ; Label marking the start of the main function
        push    rbp                      ; Save the caller's rbp on the stack
        mov     rbp, rsp                 ; Set rbp to the current stack top pointer rsp
        sub     rsp, 16                  ; Reserve 16 bytes by moving rsp down
        mov     esi, 20                  ; Put 20 in esi as the second parameter
        mov     edi, 10                  ; Put 10 in edi as the first parameter
        call    add                      ; Push the return address and jump to add
        mov     DWORD PTR [rbp-4], eax   ; Store the value returned by add in the 4 bytes at rbp-4 (res)
        mov     eax, 0                   ; Put 0 in eax as main's return value
        leave                            ; Release the frame; equivalent to mov rsp, rbp; pop rbp
        ret                              ; Pop the return address and return to main's caller; eax holds 0
  • Note that every instruction operates on registers, on stack memory addressed through rbp and rsp, or on both.

4.2 Stack Memory

When a program runs, a region of memory known as the stack is set aside for it. The stack follows the Last In First Out (LIFO) principle: the item stored most recently is the first one removed.

The stack is like a pile of plates: you add plates to the top and take them from the top, one by one.


4.3 Stack Frame Allocation

During a function call, a stack frame is created in stack memory. It holds the information the call needs: the return address, the saved base pointer, local variables, and any parameters that are passed on the stack. When the function returns, the frame is released and its space in stack memory becomes available again.

Both main and add set up a stack frame. Let’s calculate the size of each:

In the main function, the stack frame is set up as follows:

  1. push rbp pushes the caller’s base pointer rbp onto the stack, occupying 8 bytes.
  2. mov rbp, rsp copies the stack top pointer rsp into rbp, so that rbp points to the base of the current stack frame. This does not occupy additional stack space.
  3. sub rsp, 16 reserves 16 bytes for local variables. Only 4 of them are used, to hold res (the return value of add, copied from eax to [rbp-4]). The remaining 12 bytes are padding that keeps rsp aligned to a multiple of 16 when call add executes.
  4. The other instructions in main do not reserve stack space. call add does push an 8-byte return address, but that space is released again when add returns.

In summary, the stack frame of the main function occupies 8 bytes (push rbp) + 16 bytes (sub rsp, 16) = 24 bytes, not counting the 8-byte return address pushed by main’s own caller.

In the add function, a stack frame is also set up, as follows:

  1. push rbp pushes main’s base pointer rbp onto the stack, occupying 8 bytes.
  2. mov rbp, rsp copies the stack top pointer rsp into rbp, so that rbp points to the base of the current stack frame. This does not occupy additional stack space.
  3. mov DWORD PTR [rbp-4], edi stores the first parameter, passed by main in edi, at [rbp-4], occupying 4 bytes.
  4. mov DWORD PTR [rbp-8], esi stores the second parameter, passed by main in esi, at [rbp-8], occupying 4 bytes.
  5. add never executes a sub rsp instruction. Because it calls no other function, it may use the memory just below rsp without reserving it; the System V x86-64 ABI guarantees 128 bytes there, known as the red zone.

In summary, the add function uses 8 bytes (push rbp) + 4 bytes (mov DWORD PTR [rbp-4], edi) + 4 bytes (mov DWORD PTR [rbp-8], esi) = 16 bytes, of which only the 8 bytes from push rbp are reserved by moving rsp.

4.4 Analysis of Assembly Code

  1. The main function first saves the caller’s base pointer on the stack by executing push rbp, then makes rbp point to the new stack top by executing mov rbp, rsp.

  2. Next, the main function reserves a 16-byte area below rbp for its local variables by executing sub rsp, 16.

  3. Then, the main function places the second parameter, 20, in the esi register and the first parameter, 10, in the edi register.

  4. The main function then calls the add function by executing call add. The call instruction decreases rsp by 8, stores the return address there (the address of the instruction that follows call, here mov DWORD PTR [rbp-4], eax), and jumps to the entry address of the add function.

  5. The add function begins by setting up its own stack frame just below the return address. It saves main’s base pointer with push rbp and makes rbp point to the new stack top with mov rbp, rsp.

  6. Next, the add function stores the first parameter edi and the second parameter esi in [rbp-4] and [rbp-8], executing mov DWORD PTR [rbp-4], edi and mov DWORD PTR [rbp-8], esi.

  7. The add function then loads the first parameter from [rbp-4] into the edx register and the second parameter from [rbp-8] into the eax register.

  8. Then, the add function executes add eax, edx, adding these two numbers and storing the result in the eax register.

  9. Next, the add function executes pop rbp, which restores main’s base pointer from the stack. It then executes ret, which pops the return address and jumps to it, so execution continues at the instruction following call. The result is still in the eax register.

  10. Control returns to the main function, where the return value from the add function is copied from the eax register to [rbp-4] by executing mov DWORD PTR [rbp-4], eax.

  11. Then, the main function sets the eax register to 0 by executing mov eax, 0, making 0 the return value of main.

  12. The main function then executes leave, which is equivalent to mov rsp, rbp followed by pop rbp. This releases main’s stack frame and restores the caller’s base pointer.

  13. Finally, the main function executes ret, returning to the place that called it, with the return value stored in the eax register.

4.5 Passing Parameters via Registers and Stack

Let’s look at the assembly code above once more, this time following the parameters. They are passed in registers, and the called function then keeps its own copy of them on the stack. Specifically:

  1. In the main function, the parameter values are placed in the esi and edi registers. mov esi, 20 stores the integer value 20 in the esi register, and mov edi, 10 stores the integer value 10 in the edi register.

  2. Next, the main function executes call add, which transfers control to the add function.

  3. In the add function, push rbp first saves the caller’s base pointer rbp on the stack, and mov rbp, rsp sets up the base pointer for add. Then, mov DWORD PTR [rbp-4], edi stores the first parameter 10 at position [rbp-4], and mov DWORD PTR [rbp-8], esi stores the second parameter 20 at position [rbp-8]. At this point the parameters that arrived in registers have been copied into the current function’s stack frame. This copy is characteristic of unoptimized compiler output.

  4. In the add function, the values of the two parameters stored in the stack frame are read back into the edx and eax registers using mov edx, DWORD PTR [rbp-4] and mov eax, DWORD PTR [rbp-8].

  5. Next, the instruction add eax, edx adds the values in eax and edx, storing the result in the eax register, which serves as the return value of the function.

  6. Finally, in the add function, pop rbp pops the previously pushed base pointer rbp from the stack to restore the caller’s stack frame. The function then returns to the caller’s position using the ret instruction.

  7. After control returns to the main function, the return value from the add function is stored at position [rbp-4] using mov DWORD PTR [rbp-4], eax. The eax register is then reset to 0 using mov eax, 0, and the current stack frame is cleared with the leave instruction, restoring the caller’s stack frame. Finally, control returns to the caller using ret.

In summary, main passes the parameters in the registers edi and esi. The add function then saves them in its own stack frame and reads them back from there when it needs them.

5. Conclusion

This article used a simple C program to walk through the assembly code that the compiler generates from it.

By reading that assembly code line by line, we covered a series of key concepts: registers, assembly instruction opcodes, and memory addresses.

We also looked at how function calls work, including stack memory, stack frames, and the data kept in a stack frame (such as return addresses, function parameters, and local variables). With this background, readers can better understand how a program stores and processes data during execution and how the overall program flow is built up from successive function calls.
