In-Depth Comparison of MCU Assembly Language and C Language: From Principles to Practice

1. Essential Differences: Low-Level Control vs High-Level Abstraction

1.1 Assembly Language: Direct Mapping to Hardware

Assembly language is a mnemonic representation of machine instructions and is tightly coupled to the MCU architecture. Taking ARM Cortex-M assembly as an example:

; Add registers R1 and R2, store the result in R0
ADD R0, R1, R2

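; Load the value at memory address 0x20000000 into R3
LDR R3, =0x20000000   ; Put the address in R3 (the assembler places it in a literal pool)
LDR R3, [R3]          ; Read the 32-bit value stored at that address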

; Conditional branch instruction
CMP R0, #10
BLE label_less

Key Features:

  • Direct manipulation of registers (R0-R15)
  • Explicit handling of memory access (LDR/STR)
  • Manual management of program flow (B/BX/BL instructions)
  • No concept of variables, only registers and memory locations
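To make the CMP/BLE pair above concrete, the following C sketch models how CMP sets the N, Z, C and V condition flags from a 32-bit subtraction, and how BLE decides whether to branch (taken when Z is set or N differs from V). The Flags type and the cmp_flags and ble_taken functions are hypothetical names used only for illustration; this is a simplified model, not a description of any particular core's internals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the condition flags written by CMP. */
typedef struct {
    bool n; /* Negative: bit 31 of the result */
    bool z; /* Zero: result == 0 */
    bool c; /* Carry: no borrow, i.e. rn >= op as unsigned values */
    bool v; /* Overflow: the signed subtraction overflowed */
} Flags;

/* CMP rn, op computes rn - op, discards the result and keeps only the flags. */
Flags cmp_flags(uint32_t rn, uint32_t op) {
    uint32_t result = rn - op; /* unsigned wrap-around matches 32-bit hardware */
    Flags f;
    f.n = (result >> 31) & 1u;
    f.z = (result == 0);
    f.c = (rn >= op);
    /* Signed overflow: operands differ in sign and the result's sign differs from rn. */
    f.v = (((rn ^ op) & (rn ^ result)) >> 31) & 1u;
    return f;
}

/* BLE (signed less than or equal) is taken when Z == 1 or N != V. */
bool ble_taken(int32_t rn, int32_t op) {
    Flags f = cmp_flags((uint32_t)rn, (uint32_t)op);
    return f.z || (f.n != f.v);
}
```

Checking N != V rather than N alone is what keeps the signed comparison correct when the subtraction overflows, for example when comparing INT32_MIN with 1.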

1.2 C Language: An Interface for Hardware Abstraction

A compiler translates C source into machine code. A typical C fragment:

// Variable operation
int sum = a + b;  

// Pointer access
uint32_t *ptr = (uint32_t*)0x20000000;
uint32_t value = *ptr;

// Control flow
if(count <= 10) {
    do_something();
}

The compiled ARM assembly might look like this (the exact register assignment is up to the compiler):

ADD R0, R1, R2      ; sum = a + b
LDR R3, =0x20000000 ; Assign ptr
LDR R4, [R3]        ; value = *ptr
CMP R5, #10         ; if(count <= 10), assuming count is held in R5
BGT skip_call       ; Inverted test: skip the call when count > 10
BL do_something
skip_call:
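The inverted branch (BGT skip_call for a source-level count <= 10) can be mirrored in C with an explicit goto. This is an illustrative sketch: if_as_branch is a made-up name, and do_something here is a hypothetical stub that only counts how often it is called.

```c
/* Hypothetical stand-in for do_something(); it just counts its calls. */
int call_count = 0;

void do_something(void) {
    call_count++;
}

/* Same behavior as `if (count <= 10) do_something();`, written the way the
   assembly expresses it: test the opposite condition and jump over the call. */
void if_as_branch(int count) {
    if (count > 10) {
        goto skip_call;   /* CMP + BGT skip_call */
    }
    do_something();       /* BL do_something */
skip_call:
    return;
}
```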

2. Comparison of Execution Mechanisms

2.1 Differences in Register Usage

Assembly Example (Explicit Register Allocation):

MOV R0, #5       ; int a = 5;
MOV R1, #10      ; int b = 10;
ADD R2, R0, R1   ; int c = a + b;

Equivalent C Code Implementation:

register int a = 5;   // 'register' is only a hint to keep the variable in a register; modern compilers largely ignore it
register int b = 10;
register int c = a + b;

Key Differences:

  • Assembly must explicitly specify registers
  • The C compiler allocates registers automatically (typically with graph-coloring or linear-scan algorithms)

2.2 Implementation of Function Calls

Function Calls in ARM Assembly:

; Caller
caller:
    PUSH {R4, LR}      ; Save the return address (R4 keeps the stack 8-byte aligned)
    MOV R0, #5         ; First argument in R0
    MOV R1, #10        ; Second argument in R1
    BL my_function     ; Branch with link: return address is saved in LR
                       ; The result is now in R0
    POP {R4, PC}       ; Restore R4 and return by popping the saved LR into PC

my_function:
    ADD R0, R0, R1     ; Function body: result returned in R0
    BX LR              ; Return

Equivalent C Code:

int my_function(int a, int b) {
    return a + b;
}

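int main(void) {
    int result = my_function(5, 10);  // Arguments in R0/R1, result returned in R0
    (void)result;                     // Result unused in this example
    return 0;
}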

Stack Frame Comparison:

  • Assembly must manually manage the stack (PUSH/POP)
  • C compiler automatically generates prologue and epilogue code
  • Parameter passing conventions (the ARM AAPCS passes the first four integer arguments in R0-R3, further arguments on the stack, and returns the result in R0)
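A minimal sketch of why the convention matters: sum5 is a hypothetical function with five parameters, one more than fits in R0-R3. On an AAPCS target the fifth argument is passed on the stack, while a host machine applies its own ABI; in both cases the C source stays the same and only the compiler-generated argument moves, prologue and epilogue change.

```c
#include <stdint.h>

/* Hypothetical five-argument function. On an AAPCS (32-bit ARM) target:
   a, b, c, d arrive in R0-R3, e is read from the caller's stack frame,
   and the 32-bit result is returned in R0. */
int32_t sum5(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e) {
    return a + b + c + d + e;
}
```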

3. Memory Access Patterns

3.1 Direct Memory Control in Assembly

LDR R0, =0x20000000  ; Load memory address
LDR R1, [R0]         ; Read memory value
STR R2, [R0, #4]     ; Write to memory (with offset)

3.2 Indirect Memory Access in C Language

volatile uint32_t *reg = (volatile uint32_t *)0x20000000;
uint32_t value = *reg;  // Read
*(reg + 1) = 0x55AA;    // Write

Key Differences:

  • Assembly must precisely specify addresses and offsets
  • C pointer arithmetic scales by the element size automatically: reg + 1 advances the address by sizeof(uint32_t) = 4 bytes, matching the #4 offset in the assembly
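The scaling rule can be checked on a host machine. In this sketch a RAM array, fake_mem (a hypothetical stand-in for the memory at 0x20000000), replaces the absolute address so the code runs anywhere; stride_in_bytes and write_second_word are likewise illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for two consecutive 32-bit words of target memory. */
uint32_t fake_mem[2];

/* Byte distance between reg and reg + 1 for a uint32_t pointer. */
ptrdiff_t stride_in_bytes(void) {
    volatile uint32_t *reg = fake_mem;
    volatile uint32_t *next = reg + 1;   /* advances by sizeof(uint32_t) bytes */
    return (const volatile char *)next - (const volatile char *)reg;
}

/* Mirrors `*(reg + 1) = 0x55AA;`, i.e. a store at byte offset 4 like STR R2, [R0, #4]. */
void write_second_word(volatile uint32_t *reg, uint32_t value) {
    *(reg + 1) = value;                  /* equivalent to reg[1] = value */
}
```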

4. Analysis of Optimization Potential

4.1 Manual Optimization Assembly Example

Loop unrolling optimization:

; Traditional loop
MOV R0, #0        ; i = 0
MOV R1, #0        ; sum = 0
loop:
CMP R0, #100
BGE done
ADD R1, R1, R0    ; sum += i
ADD R0, R0, #1    ; i++
B loop
done:

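; Optimized version unrolled 4 times (R2 is a scratch register)
MOV R0, #0         ; i = 0
MOV R1, #0         ; sum = 0
loop4:
ADD R1, R1, R0     ; sum += i
ADD R2, R0, #1
ADD R1, R1, R2     ; sum += i+1
ADD R2, R0, #2
ADD R1, R1, R2     ; sum += i+2
ADD R2, R0, #3
ADD R1, R1, R2     ; sum += i+3
ADD R0, R0, #4     ; i += 4
CMP R0, #100
BLT loop4          ; 100 is a multiple of 4, so no clean-up code is needed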
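For comparison, the same 4x unrolling can be written by hand in C. In this sketch (sum_simple and sum_unrolled4 are hypothetical names) a clean-up loop keeps the result correct when n is not a multiple of 4, a case the assembly avoids by fixing the count at 100. Compilers can also apply this transformation themselves, for example with GCC's -funroll-loops.

```c
#include <stdint.h>

/* Straightforward loop: sum of 0..n-1. */
uint32_t sum_simple(uint32_t n) {
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* The same loop unrolled four times: one compare/branch per four additions. */
uint32_t sum_unrolled4(uint32_t n) {
    uint32_t sum = 0;
    uint32_t i = 0;
    for (; i + 4 <= n; i += 4) {  /* main body: four iterations at a time */
        sum += i;
        sum += i + 1;
        sum += i + 2;
        sum += i + 3;
    }
    for (; i < n; i++) {          /* clean-up for the remaining 0-3 iterations */
        sum += i;
    }
    return sum;
}
```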

4.2 C Compiler Optimization

C code with the same functionality:

int sum = 0;
for(int i=0; i<100; i++) {
    sum += i;
}

With -O3, GCC may evaluate the entire loop at compile time and emit only:

MOV R0, #4950    ; Directly calculate result (99*100/2)
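The constant comes from the arithmetic-series identity sum of i for i = 0..n-1 equals n(n-1)/2; with n = 100 that gives 100 * 99 / 2 = 4950. A sketch of the closed form in C (sum_closed_form is a hypothetical name), computed in 64 bits so the intermediate product cannot overflow for any 32-bit n:

```c
#include <stdint.h>

/* Closed form of 0 + 1 + ... + (n-1) = n(n-1)/2. */
uint64_t sum_closed_form(uint32_t n) {
    uint64_t m = n;
    /* For n = 0, m - 1 wraps around, but the product with m = 0 is still 0. */
    return (m * (m - 1u)) / 2u;
}
```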

Comparison of Optimization Levels:

| Optimization Type      | Assembly Implementation | C Compiler Implementation |
|------------------------|-------------------------|---------------------------|
| Loop Unrolling         | Manual control          | -funroll-loops            |
| Constant Propagation   | Must be manual          | Automatically recognized  |
| Dead Code Elimination  | Must be manual          | Automatically detected    |
| Instruction Scheduling | Must be manual          | -fschedule-insns          |

5. Comparison of Development Efficiency

5.1 Code Density Example

Implementing 32-bit Multiplication:

Assembly version (shift-and-add for a core without a hardware multiplier; every Cortex-M core actually implements MUL, so this is illustrative):

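; R0 * R1 -> R2 (shift-and-add; R0 and R1 are destroyed)
MOV R2, #0
mult_loop:
    TST R1, #1           ; Is the lowest multiplier bit set?
    IT NE                ; Thumb-2: make the next instruction conditional
    ADDNE R2, R2, R0     ; If so, add the shifted multiplicand
    LSL R0, R0, #1       ; Multiplicand <<= 1
    LSRS R1, R1, #1      ; Multiplier >>= 1 and update the Z flag
    BNE mult_loop        ; Repeat while the multiplier is non-zero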

C version:

int product = a * b;
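The C one-liner hides the algorithm. Written out in C, the shift-and-add loop from the assembly version looks roughly like this (mul_shift_add is a hypothetical name); it returns the low 32 bits of the product, the same result a 32-bit multiply produces.

```c
#include <stdint.h>

/* Shift-and-add multiplication: for each set bit k of b, add (a << k). */
uint32_t mul_shift_add(uint32_t a, uint32_t b) {
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u) {      /* TST R1, #1 / ADDNE: add when the low bit is set */
            product += a;
        }
        a <<= 1;           /* shift the multiplicand left */
        b >>= 1;           /* shift the multiplier right; loop ends at 0 */
    }
    return product;
}
```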

5.2 Maintainability Metrics

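| Metric                          | Assembly Code    | C Code         |
|---------------------------------|------------------|----------------|
| Modify Multiplication Algorithm | High risk        | Low risk       |
| Port to New Architecture        | Complete rewrite | Recompile      |
| Team Collaboration              | Difficult        | Easy           |
| Debugging Convenience           | Basic            | Advanced tools |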

6. Mixed Programming Practices

6.1 C Inline Assembly

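#include <stdint.h>

void delay(uint32_t cycles) {        // Loop iterations, not exact CPU cycles
    if (cycles == 0) return;         // SUBS on 0 would wrap around and loop 2^32 times
    __asm volatile (
        "1: SUBS %0, %0, #1 \n"      // Decrement and update the flags
        "   BNE 1b \n"               // Branch back to label 1 while not zero
        : "+r" (cycles)              // Input-output operand
        :                            // No input-only operands
        : "cc"                       // Condition flags are clobbered
    );
}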
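When the same source must also build on a non-ARM host (for example for unit tests), one option is to guard the inline assembly with a preprocessor check and fall back to a volatile C loop. This is a sketch under stated assumptions: delay_loops is a hypothetical name, __arm__ is the macro GCC and Clang define for 32-bit ARM targets, and the timing of the C fallback depends entirely on the compiler and optimization level.

```c
#include <stdint.h>

/* Hypothetical wrapper: inline-assembly loop on 32-bit ARM, plain C elsewhere.
   Returns the number of iterations requested so callers can check it. */
uint32_t delay_loops(uint32_t loops) {
    if (loops == 0) {
        return 0;   /* zero would make the SUBS loop wrap around */
    }
#if defined(__arm__) && defined(__GNUC__)
    uint32_t n = loops;
    __asm volatile (
        "1: SUBS %0, %0, #1 \n"   /* decrement and update the flags */
        "   BNE 1b \n"            /* loop while not zero */
        : "+r" (n)
        :
        : "cc");
#else
    for (volatile uint32_t n = loops; n != 0; n--) {
        /* volatile keeps the compiler from deleting this empty loop */
    }
#endif
    return loops;
}
```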

6.2 Function-Level Mixed Calls

C calling assembly function:

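// C declaration
extern int asm_add(int a, int b);

@ Assembly implementation (GNU as, Thumb-2 for Cortex-M; '@' starts a comment)
    .syntax unified
    .thumb
    .text
    .global asm_add
    .thumb_func          @ Mark as a Thumb function so calls from C return correctly
asm_add:
    ADD R0, R0, R1       @ AAPCS: a arrives in R0, b in R1; the result is returned in R0
    BX LR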

7. Recommendations for Selection

Scenarios for Using Assembly:

  1. Startup code (e.g., Reset_Handler)
  2. Extremely performance-sensitive code segments (DSP algorithms)
  3. Need for precise timing control (μs level delays)
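  4. Special-register and system instructions (e.g., CPSID/CPSIE, or MSR/MRS on PRIMASK and CONTROL; Cortex-M has an xPSR rather than the CPSR of classic ARM cores)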

Scenarios for Using C Language:

  1. Application logic
  2. Protocol stack implementation
  3. Operating system development
  4. Rapid prototyping

8. Advances in Modern Compilers

Taking the STM32 HAL library as an example, compare the C implementation of HAL_GPIO_WritePin with hand-written assembly:

C Source Code:

void HAL_GPIO_WritePin(GPIO_TypeDef* GPIOx, uint16_t GPIO_Pin, GPIO_PinState PinState) {
    if(PinState != GPIO_PIN_RESET) {
        GPIOx->BSRR = GPIO_Pin;
    } else {
        GPIOx->BSRR = (uint32_t)GPIO_Pin << 16;
    }
}
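The #24 offset used in the assembly listings is the position of BSRR within the GPIO register block. The following host-runnable sketch uses a simplified mock (MockGPIO and mock_write_pin are hypothetical, and the layout is assumed to follow STM32F4-style GPIO ports, where BSRR sits at 0x18) to check that offset and to show how the two halves of BSRR are used.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified mock of the first seven registers of an STM32F4-style GPIO port. */
typedef struct {
    volatile uint32_t MODER;    /* 0x00 */
    volatile uint32_t OTYPER;   /* 0x04 */
    volatile uint32_t OSPEEDR;  /* 0x08 */
    volatile uint32_t PUPDR;    /* 0x0C */
    volatile uint32_t IDR;      /* 0x10 */
    volatile uint32_t ODR;      /* 0x14 */
    volatile uint32_t BSRR;     /* 0x18 = 24, the offset in STR R1, [R0, #24] */
} MockGPIO;

/* Same logic as the HAL function: bits 0-15 of BSRR set pins,
   bits 16-31 reset them. */
void mock_write_pin(MockGPIO *gpio, uint16_t pin, int state) {
    if (state != 0) {
        gpio->BSRR = pin;                   /* set request */
    } else {
        gpio->BSRR = (uint32_t)pin << 16;   /* reset request */
    }
}
```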

Possible GCC -O2 output (Thumb-2; the exact code varies with GCC version and flags):

HAL_GPIO_WritePin:
    CMP R2, #0         ; Check PinState
    ITEE NE            ; Conditional execution
    STRNE R1, [R0, #24] ; BSRR register offset 24
    MOVEQ R1, R1, LSL #16
    STREQ R1, [R0, #24]
    BX LR

Manually Optimized Assembly:

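HAL_GPIO_WritePin:
    CMP R2, #0           ; PinState == GPIO_PIN_RESET?
    IT EQ                ; Thumb-2: make the next instruction conditional
    LSLEQ R1, R1, #16    ; Reset case: move the pin mask to BSRR bits 16-31
    STR R1, [R0, #24]    ; Single store to BSRR (offset 0x18) for both cases
    BX LR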

Conclusion: Modern compilers generate code close to hand-optimized assembly; here the manual version saves only an instruction or two. Assembly intervention is still warranted in extreme optimization scenarios.
