The Root of the Dilemma in Understanding Assembly Language

Programmers who are “native” in C/C++ often run into the following difficulties when reading assembly code:

Poor Readability: Assembly instructions have a low level of abstraction and lack the expressiveness of high-level languages.
Lack of Context: Low-level details such as register operations and memory accesses obscure the high-level intentions of the program.
Differences in Thinking Style: Structured constructs such as if, for, and while dissolve into compares, jumps, and labels, so a structured-programming mindset struggles to follow the flat flow of instructions.

The Mapping Relationship Between Assembly and C

1. Function Calls and Stack Frames

C Language Example:

int add(int a, int b) {
    return a + b;
}

Corresponding Assembly (32-bit x86, Intel syntax, unoptimized, arguments passed on the stack):

_add:
    push    ebp        ; Save old base pointer
    mov     ebp, esp   ; Create new stack frame
    mov     eax, [ebp+8] ; Get first parameter a
    add     eax, [ebp+12] ; Add second parameter b
    pop     ebp        ; Restore old base pointer
    ret               ; Return
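
To make the [ebp+8] and [ebp+12] offsets concrete, the following is a minimal sketch that simulates the stack in C. The names SimStack, sim_push, and sim_call_add are hypothetical and exist only for this illustration; the return address is a placeholder value, and each array slot stands for one 4-byte stack word.

```c
#include <stdint.h>

/* Simulated 32-bit stack: one slot = one 4-byte word, growing toward index 0. */
enum { STACK_WORDS = 16 };

typedef struct {
    int32_t words[STACK_WORDS];
    int esp;  /* index of the current top-of-stack slot */
    int ebp;  /* index of the saved-ebp slot of the current frame */
} SimStack;

static void sim_push(SimStack *s, int32_t value) {
    s->esp -= 1;               /* push: esp -= 4 bytes, i.e. one slot */
    s->words[s->esp] = value;
}

/* Mirrors the caller's pushes plus the body of _add (illustration only). */
int32_t sim_call_add(int32_t a, int32_t b) {
    SimStack s = { {0}, STACK_WORDS, 0 };
    sim_push(&s, b);           /* caller: push b (rightmost argument first) */
    sim_push(&s, a);           /* caller: push a */
    sim_push(&s, 0);           /* call: pushes the return address (placeholder) */
    sim_push(&s, s.ebp);       /* push ebp */
    s.ebp = s.esp;             /* mov ebp, esp */
    /* [ebp+8] is two slots above ebp, [ebp+12] is three slots above. */
    int32_t eax = s.words[s.ebp + 2];  /* mov eax, [ebp+8]  */
    eax += s.words[s.ebp + 3];         /* add eax, [ebp+12] */
    return eax;                        /* result is returned in eax */
}
```

The saved ebp sits at [ebp+0] and the return address at [ebp+4], which is why the first parameter starts at offset 8.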

2. Control Structures

C Language if Statement:

if (x > 0) {
    // Code block 1
} else {
    // Code block 2
}

Corresponding Assembly:

    cmp     eax, 0     ; Compare x (assumed to be in eax) with 0
    jle     ELSE_BLOCK ; Jump to else block if x <= 0
    ; Instructions for code block 1
    jmp     END_IF     ; Skip else block
ELSE_BLOCK:
    ; Instructions for code block 2
END_IF:
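
A useful intermediate step is to rewrite the if statement in C using only goto, so that each line corresponds to one or two assembly instructions. The sketch below does this; the function name branch_taken and its return values are hypothetical and only make the taken branch observable.

```c
/* Returns 1 if "code block 1" ran, 2 if "code block 2" ran. */
int branch_taken(int x) {
    int result;
    if (x <= 0) goto else_block;  /* cmp eax, 0 / jle ELSE_BLOCK */
    result = 1;                   /* code block 1 */
    goto end_if;                  /* jmp END_IF */
else_block:
    result = 2;                   /* code block 2 */
end_if:
    return result;
}
```

Note that the condition is inverted relative to the source: the compiler jumps over block 1 when x > 0 is false.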

3. Loop Structures

C Language for Loop:

for (int i = 0; i < 10; i++) {
    // Loop body
}

Corresponding Assembly:

    mov     ecx, 0     ; i = 0 (i is kept in ecx)
FOR_LOOP:
    cmp     ecx, 10    ; Compare i with 10
    jge     END_FOR    ; If i >= 10, exit loop
    ; Instructions for loop body
    inc     ecx        ; i++
    jmp     FOR_LOOP   ; Continue loop
END_FOR:
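
The same loop can be written in goto-style C to show how the initialization, test, body, and increment line up with the labels. This is a sketch: count_iterations and its limit parameter are hypothetical, with limit standing in for the constant 10.

```c
/* Counts how many times the loop body runs for a given upper bound. */
int count_iterations(int limit) {
    int body_runs = 0;
    int i = 0;                        /* mov ecx, 0 */
for_loop:
    if (i >= limit) goto end_for;     /* cmp ecx, 10 / jge END_FOR */
    body_runs++;                      /* loop body */
    i++;                              /* inc ecx */
    goto for_loop;                    /* jmp FOR_LOOP */
end_for:
    return body_runs;
}
```

Because the test sits at the top of the loop, a bound of 0 skips the body entirely, just as in the C for loop.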

Reverse Engineering Techniques

1. Identifying Function Prototypes

By analyzing parameter passing and return values in assembly, one can infer the C function prototype:

; Calling convention is cdecl: parameters are pushed from right to left and the
; caller cleans up the stack (under stdcall the callee would clean up with ret 12)
push    3       ; Third parameter
push    2       ; Second parameter
push    1       ; First parameter
call    _func   ; Call function
add     esp, 12 ; Clean up stack (3 parameters × 4 bytes)

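; Can infer a C prototype with three 4-byte parameters, for example:
; void func(int a, int b, int c);
; (the return type cannot be determined from this snippet alone)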
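
The inference can be sketched in C as follows. Everything here is hypothetical: func stands for the unknown callee, seen only records what it received, and inferred_arg_count assumes every argument occupies one 4-byte stack slot, which holds for int and pointer arguments on 32-bit x86 but not for larger types such as double.

```c
#include <stdint.h>

/* Records the arguments of the most recent call, for inspection only. */
static int32_t seen[3];

/* Hypothetical callee matching the inferred prototype. */
void func(int32_t a, int32_t b, int32_t c) {
    seen[0] = a;
    seen[1] = b;
    seen[2] = c;
}

/* "add esp, N" after a call suggests N / 4 stack slots were passed. */
unsigned inferred_arg_count(unsigned cleanup_bytes) {
    return cleanup_bytes / 4u;
}

/* The C call site that would produce the three pushes and the call. */
void call_site(void) {
    func(1, 2, 3);
}
```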

2. Struct Access Patterns

Struct access typically involves a base address plus a fixed offset:

mov     eax, [ebx+8]  ; Equivalent to C's struct_ptr->field2, where ebx holds struct_ptr
                      ; and field2 lies 8 bytes into the struct
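
The relationship between the offset and the field can be checked with offsetof. The sketch below assumes a hypothetical struct Record with three 4-byte fields, so that field2 lands at offset 8; a real struct's layout depends on its field types and padding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: three 4-byte fields, no padding between them. */
typedef struct {
    int32_t field0;  /* offset 0 */
    int32_t field1;  /* offset 4 */
    int32_t field2;  /* offset 8 */
} Record;

/* Mirrors "mov eax, [ebx+8]": load 4 bytes at base address + 8. */
int32_t load_at_offset_8(const Record *struct_ptr) {
    int32_t eax;
    memcpy(&eax, (const unsigned char *)struct_ptr + 8, sizeof eax);
    return eax;
}
```

When reversing, the process runs the other way: the set of offsets used against one base register hints at how many fields the struct has and how large each one is.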

3. Array Access Patterns

Array access usually combines a base address with an index scaled by the element size:

mov     eax, [esi+edi*4] ; Equivalent to C's array[index]
                         ; Here assuming element size is 4 bytes (int)
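
The scaled-index addressing can be reproduced in C with explicit byte arithmetic. In this sketch load_scaled is a hypothetical helper, and the local names esi and edi are chosen only to match the registers in the instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors "mov eax, [esi+edi*4]" for an array of 4-byte elements. */
int32_t load_scaled(const int32_t *array, size_t index) {
    const unsigned char *esi = (const unsigned char *)array;  /* base address */
    size_t edi = index;                                       /* element index */
    int32_t eax;
    memcpy(&eax, esi + edi * 4, sizeof eax);  /* byte address = base + index * 4 */
    return eax;
}
```

The scale factor is the main clue to the element type: 1, 2, 4, and 8 are the factors the x86 addressing mode supports directly.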

Practical Recommendations

Start with Small Functions: Begin by analyzing simple function calls and gradually transition to more complex logic.
Comment Conversion: Annotate each group of assembly instructions with a comment giving the equivalent C statement.
Bidirectional Comparison: Compile simple C code and observe the generated assembly to build an intuitive understanding.
Use a Debugger: Dynamically trace register changes to understand data flow.
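
For the bidirectional comparison, a small file like the one below is enough to start with. This is a sketch that assumes GCC with 32-bit support installed; the file name compare.c and the function clamp_to_zero are hypothetical.

```c
/*
 * Save as compare.c and generate Intel-syntax assembly without optimization:
 *
 *     gcc -m32 -O0 -S -masm=intel compare.c -o compare.s
 *
 * Then look in compare.s for the cmp and conditional jump that
 * implement the comparison below.
 */
int clamp_to_zero(int x) {
    return x > 0 ? x : 0;   /* negative values and zero both map to 0 */
}
```

Repeating the experiment with -O2 instead of -O0 shows how much the optimizer reshapes the code, for example by dropping the frame pointer setup or by replacing the branch with other instructions.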

Complete Example of Conversion

C Code:

int factorial(int n) {
    if (n <= 1)
        return 1;
    else
        return n * factorial(n-1);
}

Corresponding Assembly:

_factorial:
    push    ebp
    mov     ebp, esp
    mov     eax, [ebp+8]   ; eax = n
    cmp     eax, 1         ; Compare n with 1
    jg      RECURSIVE_CASE ; if n > 1
    mov     eax, 1         ; return 1
    jmp     END_FACT
RECURSIVE_CASE:
    dec     eax            ; eax = n-1
    push    eax            ; Prepare parameter
    call    _factorial     ; Recursive call
    add     esp, 4         ; Clean up stack
    imul    eax, [ebp+8]   ; Return value multiplied by n
END_FACT:
    pop     ebp
    ret
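
Reading the listing back into C is easiest in two steps: first a literal goto-style transcription, then the structured version shown at the start of this section. The sketch below is the first step; factorial_lowered is a hypothetical name, and the local variable eax stands in for the register.

```c
/* Line-by-line transcription of the _factorial listing into goto-style C. */
int factorial_lowered(int n) {
    int eax = n;                        /* mov eax, [ebp+8] */
    if (eax > 1) goto recursive_case;   /* cmp eax, 1 / jg RECURSIVE_CASE */
    eax = 1;                            /* mov eax, 1 */
    goto end_fact;                      /* jmp END_FACT */
recursive_case:
    eax = eax - 1;                      /* dec eax */
    eax = factorial_lowered(eax);       /* push eax / call _factorial / add esp, 4 */
    eax = eax * n;                      /* imul eax, [ebp+8] */
end_fact:
    return eax;                         /* pop ebp / ret */
}
```

The multiplication reloads n from [ebp+8] rather than from a register because eax is overwritten by the recursive call's return value.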

With this kind of systematic correspondence analysis, assembly code stops being a pile of incomprehensible instructions and can be read and analyzed much like C code.
