The Correspondence Between Assembly Language and C Language

Why Assembly Language Is Hard to Understand

Programmers who are “native” to C/C++ often run into the following difficulties when reading assembly code:

  1. Poor Readability: Assembly instructions have a low level of abstraction and lack the expressiveness of high-level languages.
  2. Lack of Context: Low-level details such as register operations and memory accesses obscure the high-level intentions of the program.
  3. Differences in Thinking Style: A structured-programming mindset (if/else, loops, functions) has trouble following a flat sequence of instructions connected only by jumps.

The Mapping Relationship Between Assembly and C

1. Function Calls and Stack Frames

C Language Example:

int add(int a, int b) {
    return a + b;
}

Corresponding Assembly (32-bit x86, cdecl calling convention, Intel syntax):

_add:
    push    ebp           ; Save the caller's base pointer
    mov     ebp, esp      ; Create a new stack frame
    mov     eax, [ebp+8]  ; Load first parameter a ([ebp+4] is the return address)
    add     eax, [ebp+12] ; Add second parameter b
    pop     ebp           ; Restore the caller's base pointer
    ret                   ; Return; the result is in eax
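To see why the parameters sit at [ebp+8] and [ebp+12], it can help to replay the caller's pushes and the function prologue on a toy stack. The sketch below is a simulation written in C, not real machine code: it assumes the 32-bit cdecl layout described above (4-byte slots, stack growing toward lower addresses), and the names sim_push, sim_pop, simulate_add_call and the fake return address are hypothetical.

```c
#include <stdint.h>

/* Toy model of a 32-bit x86 stack (hypothetical helper names).
   Each slot is 4 bytes; the stack grows toward index 0, so an
   index here stands in for an address divided by 4. */
#define STACK_SLOTS 64

static uint32_t stack[STACK_SLOTS];
static int esp;   /* index of the current top-of-stack slot */
static int ebp;   /* index the frame pointer refers to */

static void sim_push(uint32_t value) {
    esp -= 1;            /* push: move esp down one 4-byte slot... */
    stack[esp] = value;  /* ...then store at the new top */
}

static uint32_t sim_pop(void) {
    return stack[esp++]; /* pop: read the top, then move esp up */
}

/* Replays add(a, b): the caller pushes b then a, "call" pushes a
   return address, and the callee runs its prologue and body.
   Returns the value the real function would leave in eax. */
int simulate_add_call(int a, int b) {
    esp = STACK_SLOTS;   /* start with an empty stack */
    ebp = 0;

    sim_push((uint32_t)b);         /* push b (rightmost argument first) */
    sim_push((uint32_t)a);         /* push a */
    sim_push(0xDEADBEEFu);         /* call: push a fake return address */

    sim_push((uint32_t)ebp);       /* push ebp */
    ebp = esp;                     /* mov ebp, esp */

    /* Byte offsets 8 and 12 are 2 and 3 slots above ebp. */
    int eax = (int)stack[ebp + 8 / 4];   /* mov eax, [ebp+8]  -> a */
    eax += (int)stack[ebp + 12 / 4];     /* add eax, [ebp+12] -> b */

    ebp = (int)sim_pop();          /* pop ebp */
    (void)sim_pop();               /* ret: pop the return address */
    esp += 8 / 4;                  /* caller: add esp, 8 (cdecl cleanup) */
    return eax;
}
```

Calling simulate_add_call(2, 3) walks through exactly the slots the listing touches; the only state that survives is eax, just as in the real function.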

2. Control Structures

C Language if Statement:

if (x > 0) {
    // Code block 1
} else {
    // Code block 2
}

Corresponding Assembly (x is assumed to be in eax):

    cmp     eax, 0     ; Compare x with 0
    jle     ELSE_BLOCK ; Jump to else block if x <= 0
    ; Instructions for code block 1
    jmp     END_IF     ; Skip else block
ELSE_BLOCK:
    ; Instructions for code block 2
END_IF:
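The same shape can be written in C with goto, which makes the one-to-one mapping with the labels explicit. This is a minimal sketch: the function names classify_structured and classify_goto, and the return values 1 and 2 standing in for code blocks 1 and 2, are hypothetical. The x <= 0 test mirrors jle, which is a signed comparison.

```c
/* Structured version: returns 1 for "code block 1", 2 for "code block 2". */
int classify_structured(int x) {
    if (x > 0) {
        return 1;
    } else {
        return 2;
    }
}

/* goto version that mirrors the assembly line by line. */
int classify_goto(int x) {
    int result;
    if (x <= 0) goto ELSE_BLOCK;   /* cmp eax, 0 / jle ELSE_BLOCK */
    result = 1;                    /* code block 1 */
    goto END_IF;                   /* jmp END_IF */
ELSE_BLOCK:
    result = 2;                    /* code block 2 */
END_IF:
    return result;
}
```

Note that the branch tests the inverted condition (x <= 0): the jump skips code block 1 rather than entering it, which is why compiled if statements so often appear with the opposite comparison.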

3. Loop Structures

C Language for Loop:

for (int i = 0; i < 10; i++) {
    // Loop body
}

Corresponding Assembly:

    mov     ecx, 0     ; i = 0
FOR_LOOP:
    cmp     ecx, 10    ; Compare i with 10
    jge     END_FOR    ; If i >= 10, exit loop
    ; Instructions for loop body
    inc     ecx        ; i++
    jmp     FOR_LOOP   ; Continue loop
END_FOR:
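Lowering the for loop by hand into C with goto gives the same skeleton: the test at the top, the increment at the bottom, and an unconditional jump closing the loop. A sketch, assuming a hypothetical loop body that sums i so the result can be checked; optimizing compilers often move the test to the bottom of the loop instead, so real output may look different.

```c
/* Sums 0..9 with a structured for loop. */
int sum_for(void) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += i;
    }
    return sum;
}

/* The same loop lowered to goto form, mirroring the assembly labels. */
int sum_goto(void) {
    int sum = 0;
    int i = 0;                     /* mov ecx, 0 */
FOR_LOOP:
    if (i >= 10) goto END_FOR;     /* cmp ecx, 10 / jge END_FOR */
    sum += i;                      /* loop body */
    i++;                           /* inc ecx */
    goto FOR_LOOP;                 /* jmp FOR_LOOP */
END_FOR:
    return sum;
}
```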

Reverse Engineering Techniques

1. Identifying Function Prototypes

By analyzing parameter passing and return values in assembly, one can infer the C function prototype:

; Calling convention is cdecl (arguments pushed right to left, caller cleans up)
push    3       ; Third parameter
push    2       ; Second parameter
push    1       ; First parameter
call    _func   ; Call function
add     esp, 12 ; Caller cleans up the stack (3 parameters × 4 bytes)

; Can infer C prototype (three 4-byte arguments; the return type is
; likely void unless the caller reads eax after the call):
; void func(int a, int b, int c);
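Who removes the arguments is what separates the two common 32-bit conventions: under cdecl the caller does it with add esp, 12 after the call, while under stdcall the callee does it with ret 12 and no add esp follows. The toy simulation below uses the same 4-byte, downward-growing stack model; the names push32, sim_ret and call_func are hypothetical. It reports how far esp drifts, showing that exactly one side must clean up.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 32

static uint32_t mem[SLOTS];  /* toy 32-bit stack, grows toward index 0 */
static int sp;               /* toy esp, measured in 4-byte slots */

static void push32(uint32_t v) { mem[--sp] = v; }

/* "ret n": pop the return address, then release n_bytes of arguments. */
static void sim_ret(int n_bytes) {
    sp += 1;
    sp += n_bytes / 4;
}

/* Replays "push 3 / push 2 / push 1 / call _func" plus cleanup.
   callee_cleans: callee ends with "ret 12" (stdcall) instead of "ret".
   caller_cleans: caller executes "add esp, 12" after the call (cdecl).
   Stores the sum of the arguments in *eax and returns the esp drift
   in bytes; 0 means the stack is balanced. */
int call_func(bool callee_cleans, bool caller_cleans, uint32_t *eax) {
    sp = SLOTS;
    const int start = sp;

    push32(3);               /* third parameter, pushed first */
    push32(2);               /* second parameter */
    push32(1);               /* first parameter ends at the lowest address */
    push32(0xCAFEu);         /* call: push a fake return address */

    /* Inside the callee: [esp] = return address, [esp+4] = first argument. */
    *eax = mem[sp + 1] + mem[sp + 2] + mem[sp + 3];
    sim_ret(callee_cleans ? 12 : 0);

    if (caller_cleans) {
        sp += 12 / 4;        /* add esp, 12 */
    }
    return (sp - start) * 4;
}
```

Cleaning up twice leaves esp 12 bytes too high and not cleaning up at all leaves it 12 bytes too low, so an add esp, 12 right after a call is good evidence that the callee is cdecl.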

2. Struct Access Patterns

Struct access typically involves a base address plus a fixed offset:

mov     eax, [ebx+8]  ; eax = struct_ptr->field2 (ebx = struct_ptr, field2 at offset 8)
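In C terms, the fixed offset in the instruction is what offsetof reports for the member. A minimal sketch, assuming a hypothetical struct example with three int members so that field2 sits at offset 2 * sizeof(int), which is 8 when int is 4 bytes as on x86; the function load_field2_by_offset is also hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: three int members, so field2 lives at
   offset 2 * sizeof(int), i.e. 8 on platforms with a 4-byte int. */
struct example {
    int field0;
    int field1;
    int field2;
};

/* What "mov eax, [ebx+8]" does, written in C: take the base address
   (ebx = struct_ptr), add a fixed byte offset, and load an int. */
int load_field2_by_offset(const struct example *struct_ptr) {
    int value;
    const char *base = (const char *)struct_ptr;
    memcpy(&value, base + offsetof(struct example, field2), sizeof value);
    return value;
}
```

When reversing, the direction is the opposite: collect every offset used with the same base register and lay out a struct whose members land on those offsets.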

3. Array Access Patterns

Array access usually combines a base address, an index, and a scale factor equal to the element size:

mov     eax, [esi+edi*4] ; eax = array[index] (esi = array, edi = index)
                         ; Here assuming element size is 4 bytes (int)
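The address arithmetic behind [esi+edi*4] is the same one C performs implicitly for array[index]: base plus index times sizeof the element. A minimal sketch with a hypothetical function load_element; it assumes int elements, which are 4 bytes on typical x86 targets (x86 addressing accepts scale factors of 1, 2, 4 and 8).

```c
#include <stddef.h>

/* Spells out what "mov eax, [esi+edi*4]" computes. */
int load_element(const int *array, size_t index) {
    const char *base = (const char *)array;         /* esi: base address */
    const char *addr = base + index * sizeof(int);  /* esi + edi*4 */
    return *(const int *)addr;                       /* mov eax, [addr] */
}
```

A scale of 2 or 8 in the listing therefore hints at short or 8-byte (double, long long) elements, while a scale of 1 often means char data or an index that was already multiplied earlier.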

Practical Recommendations

  1. Start with Small Functions: Begin by analyzing simple function calls and gradually transition to more complex logic.
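  2. Annotate as You Read: Beside each group of instructions, write the C statement it implements.
  3. Bidirectional Comparison: Compile small C programs and study the generated assembly to build intuition (for example, gcc -S -m32 -masm=intel -O0 file.c writes 32-bit Intel-syntax assembly to file.s).
  4. Use a Debugger: Single-step through the code and watch the registers change to follow the data flow.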

Complete Example of Conversion

C Code:

int factorial(int n) {
    if (n <= 1)
        return 1;
    else
        return n * factorial(n-1);
}

Corresponding Assembly:

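_factorial:
    push    ebp            ; Save the caller's base pointer
    mov     ebp, esp       ; Set up the stack frame
    mov     eax, [ebp+8]   ; eax = n
    cmp     eax, 1         ; Compare n with 1
    jg      RECURSIVE_CASE ; if n > 1
    mov     eax, 1         ; return 1
    jmp     END_FACT
RECURSIVE_CASE:
    dec     eax            ; eax = n-1
    push    eax            ; Pass n-1 as the argument
    call    _factorial     ; Recursive call; result in eax
    add     esp, 4         ; Caller cleans up the argument (cdecl)
    imul    eax, [ebp+8]   ; Multiply the result by n (reloaded from the stack)
END_FACT:
    pop     ebp            ; Restore the caller's base pointer
    ret                    ; Return; the result is in eax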
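Read back into C with goto, the listing maps onto the original function one label at a time. A minimal sketch: the name factorial_asm_style and the local variable eax, standing in for the register, are hypothetical. Note that n is re-read for the multiply, just as the listing reloads [ebp+8], because eax was overwritten by the recursive call.

```c
/* Mirror of the _factorial listing; "eax" models the register. */
int factorial_asm_style(int n) {
    int eax = n;                          /* mov eax, [ebp+8] */
    if (eax > 1) goto RECURSIVE_CASE;     /* cmp eax, 1 / jg RECURSIVE_CASE */
    eax = 1;                              /* mov eax, 1 */
    goto END_FACT;                        /* jmp END_FACT */
RECURSIVE_CASE:
    eax = eax - 1;                        /* dec eax */
    eax = factorial_asm_style(eax);       /* push eax / call / add esp, 4 */
    eax = eax * n;                        /* imul eax, [ebp+8] */
END_FACT:
    return eax;                           /* pop ebp / ret */
}
```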

By systematically mapping assembly patterns back to C constructs (stack frames, conditional jumps, loops, fixed offsets, and scaled indexes), assembly code stops being a pile of incomprehensible instructions and can be read and analyzed much like C code.
