The Cortex-M3/M4 is among the most widely used ARM cores in embedded development and is found in microcontrollers such as the STM32. Its assembly language follows the RISC philosophy of being “reduced and efficient”. This article explains the core logic of ARM assembly step by step, from core registers through common instructions to a complete practical example.
1. Core Registers (“Operation Objects” in Assembly)
The registers of Cortex-M3/M4 are the direct operation objects of assembly instructions. There is no need to memorize all registers; mastering the following high-frequency core registers is sufficient to cover the vast majority of embedded development scenarios.
1. General Purpose Registers
| Register | Alias / Function | Core Usage |
|---|---|---|
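| R0-R7 | Low Registers | Temporary storage of operands and results, and holding memory addresses (used with LDR/STR to access RAM/ROM). Under the ARM procedure call standard (AAPCS), R0-R3 pass the first four function arguments and R0 holds the return value; R4-R7 must be preserved by the called function. |
| R8-R12 | High Registers | General purpose, but many 16-bit Thumb instructions can only encode R0-R7, so these are used less often. |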
| R13 | SP (Stack Pointer) | Points to the top of the stack, managing context saving and restoration during function calls. Cortex-M defaults to a full descending stack model. |
| R14 | LR (Link Register) | Holds the return address of a function call (written automatically by the `BL` instruction). |
| R15 | PC (Program Counter) | Tracks the instruction stream; writing to the PC causes a jump. Reading the PC inside an instruction returns the address of that instruction plus 4. Cortex-M always runs in Thumb state, so bit 0 of any address loaded into the PC by BX, BLX, or POP must be 1. |
2. Program Status Register
The Program Status Register (xPSR) is composed of three parts: APSR (Application Status Register), IPSR (Interrupt Program Status Register), and EPSR (Execution Program Status Register). Among them, APSR is the key focus, as its four flags are the core basis for “conditional instructions”:
- N (Negative Flag): N=1 when the operation result is negative (bit 31 is 1), otherwise N=0; used to test the sign of signed numbers.
- Z (Zero Flag): Z=1 when the operation result is 0, otherwise Z=0; commonly used for loop termination checks (e.g., counting down to 0) and equality comparisons (e.g., after `CMP R0, R1`, use `BEQ` to jump).
- C (Carry/Borrow Flag): C=1 when an addition produces a carry or a subtraction produces no borrow, otherwise C=0; used for multi-word operations (e.g., 64-bit addition).
- V (Overflow Flag): V=1 when a signed operation exceeds the 32-bit signed range, otherwise V=0; used to detect errors in signed arithmetic (e.g., `0x7FFFFFFF + 1` overflows).
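The four flags can be observed with a handful of flag-setting instructions. The following is a minimal sketch in the same assembler syntax as the rest of this article; the registers and input values are arbitrary choices, and the flag values in the comments follow from the definitions above.

```asm
    LDR  R0, =0x7FFFFFFF    ; Largest positive signed 32-bit value
    ADDS R1, R0, #1         ; R1 = 0x80000000: N=1, Z=0, C=0, V=1 (signed overflow)

    LDR  R0, =0xFFFFFFFF    ; Largest unsigned 32-bit value (-1 as signed)
    ADDS R1, R0, #1         ; R1 = 0x00000000: N=0, Z=1, C=1 (carry out), V=0

    MOV  R0, #3
    SUBS R1, R0, #5         ; R1 = 0xFFFFFFFE: N=1, Z=0, C=0 (a borrow occurred), V=0

    MOV  R0, #5
    SUBS R1, R0, #5         ; R1 = 0x00000000: N=0, Z=1, C=1 (no borrow), V=0
```

Note that C and V are independent: the second case carries without overflowing, while the first overflows without carrying.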
2. Common ARM Assembly Instructions
The Cortex-M3/M4 architecture follows the Load-Store principle, meaning that data processing instructions only operate on registers, and data exchange with memory must be completed through `LDR` (load data from memory to register) and `STR` (store register data to memory) instructions.
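As a minimal sketch of the Load-Store pattern, incrementing a 32-bit variable in RAM takes three steps: load, modify, store. The address 0x20000000 is only an assumed location for the variable.

```asm
    LDR  R0, =0x20000000    ; R0 = address of the variable (assumed location)
    LDR  R1, [R0]           ; Load: read the variable from memory into R1
    ADD  R1, R1, #1         ; Modify: the arithmetic happens in a register
    STR  R1, [R0]           ; Store: write the new value back to memory
```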
1. Data Processing Instructions (Operations on CPU Internal Registers)
These instructions only operate on general-purpose registers and do not directly access memory, serving as the core for implementing logical operations such as “addition, subtraction, comparison, and bit manipulation”.
(1) MOV: Data Transfer Instruction
Core Function: Implements data transfer between registers or loading an immediate value into a register.
Syntax: `MOV{S}{cond} Rd, Op2`
- `{S}`: Optional; updates the APSR flags after the instruction executes (e.g., `MOVS R0, #0` sets Z=1).
- `{cond}`: Optional conditional-execution suffix (e.g., `MOVNE R0, #0xFF` means “execute only if Z=0 (the previous result was not 0)”). On Cortex-M3/M4, conditional instructions other than branches must sit inside an IT (If-Then) block; some assemblers insert the IT instruction automatically.
- `Op2`: An immediate value or another register, optionally shifted (e.g., `R3, LSL #2`). Not every 32-bit immediate can be encoded in a single MOV.
Example:
    MOV  R0, #0x20000000    ; R0 = 0x20000000 (load RAM base address)
    MOVS R1, #0             ; R1 = 0, while updating APSR's Z flag (Z=1)
    MOV  R2, R3, LSL #2     ; R2 = R3 << 2 (store the value of R3 left-shifted by 2 into R2)
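Two details of MOV on Cortex-M3/M4 may be easier to see in code. First, a conditional MOV is placed inside an IT block. Second, when a constant does not fit a single MOV, the MOVW/MOVT pair (not covered above) builds it in two halves. The registers and constants below are illustrative only.

```asm
    CMP   R0, #0            ; Set the flags from R0 - 0
    IT    NE                ; The next instruction executes only if Z=0
    MOVNE R1, #0xFF         ; R1 = 0xFF when R0 is not 0; otherwise R1 is unchanged

    MOVW  R2, #0x5678       ; R2 = 0x00005678 (writes the low halfword, clears the high halfword)
    MOVT  R2, #0x1234       ; R2 = 0x12345678 (writes the high halfword, keeps the low halfword)
```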
(2) ADD/SUB: Addition and Subtraction Instructions
Core Function: Implements addition and subtraction operations on registers or immediate values.
Syntax:
- ADD (Addition): `ADD{S}{cond} Rd, Rn, Op2` (`Rd = Rn + Op2`)
- SUB (Subtraction): `SUB{S}{cond} Rd, Rn, Op2` (`Rd = Rn - Op2`)
Example:
; ADD Example
    ADD  R0, R1, #5         ; R0 = R1 + 5
    ADDS R2, R3, R4         ; R2 = R3 + R4, while updating APSR
; SUB Example
    SUB  R0, R1, #10        ; R0 = R1 - 10
    SUBS R5, R5, #1         ; Decrement R5 by 1 (loop counter) and update flags
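The flag-setting forms are what make multi-word arithmetic possible. The sketch below shows 64-bit addition and subtraction using ADC and SBC (add with carry and subtract with carry, which are not covered above); the convention that R1:R0 and R3:R2 each hold a 64-bit value, high word first, is an assumption made for this example.

```asm
; 64-bit addition: R1:R0 = R1:R0 + R3:R2
    ADDS R0, R0, R2         ; Add the low words and record the carry in C
    ADC  R1, R1, R3         ; Add the high words plus the carry

; 64-bit subtraction: R1:R0 = R1:R0 - R3:R2
    SUBS R0, R0, R2         ; Subtract the low words; C=0 if a borrow occurred
    SBC  R1, R1, R3         ; Subtract the high words and the borrow
```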
(3) CMP: Comparison Instruction
Core Function: Computes “Rn - Op2” internally, discards the result, and only updates the APSR flags, preparing for a subsequent conditional jump.
Syntax: `CMP{cond} Rn, Op2`
Example:
    CMP  R0, #100           ; Compare R0 with 100
    BEQ  LoopEnd            ; If Z=1 (R0=100), jump to LoopEnd
    BNE  LoopContinue       ; If Z=0 (R0≠100), jump to LoopContinue
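Beyond equality, the same CMP result can be read as either a signed or an unsigned comparison, depending on the condition code chosen. A minimal sketch, assuming hypothetical labels and using R2 only to record which branch was reached:

```asm
    LDR  R0, =0xFFFFFFFF    ; 4294967295 as unsigned, -1 as signed
    MOV  R1, #1
    CMP  R0, R1             ; R0 - R1 = 0xFFFFFFFE: N=1, Z=0, C=1, V=0
    BGT  SignedGreater      ; Not taken: signed "greater than" needs Z=0 and N=V
    BHI  UnsignedHigher     ; Taken: unsigned "higher" needs C=1 and Z=0
    MOV  R2, #0             ; Neither condition holds
    B    CompareDone
SignedGreater
    MOV  R2, #1
    B    CompareDone
UnsignedHigher
    MOV  R2, #2             ; Reached with the values above
CompareDone
```

Choosing GT/GE/LT/LE for signed data and HI/HS/LO/LS for unsigned data avoids a common class of comparison bugs.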
2. Memory Access Instructions
(1) LDR: Load Memory Instruction
Core Function: Reads data from memory into a register.
Syntax: `LDR{type}{cond} Rd, [Rn {, #offset}]`
- `{type}`: Optional; specifies the data type (`B` = unsigned byte, `H` = unsigned halfword, `SB`/`SH` = sign-extended byte/halfword, default = word).
- Addressing modes:
  - Offset addressing (`[Rn, #4]`): Address = Rn + 4; Rn remains unchanged.
  - Pre-indexing (`[Rn, #4]!`): Address = Rn + 4, then Rn is updated to Rn + 4.
  - Post-indexing (`[Rn], #4`): Address = Rn, then Rn is updated to Rn + 4.
Example:
    LDR  R0, [R1]           ; Read the 4-byte word pointed to by R1 into R0
    LDRB R2, [R1, #1]       ; Read 1 byte from address R1+1 into R2
    LDRH R3, [R1], #2       ; Read the 2-byte halfword pointed to by R1 into R3, then R1 = R1 + 2
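Post-indexing is a natural fit for walking through an array. The routine below is a minimal sketch that sums a block of words; the label names and the register assignments for input and output are assumptions made for this example.

```asm
; Input:  R0 = address of the first word, R1 = number of words (assumed to be at least 1)
; Output: R0 = sum of the words
SumWords
    MOV  R2, #0             ; Running total
SumLoop
    LDR  R3, [R0], #4       ; Load one word, then advance R0 to the next word
    ADD  R2, R2, R3         ; Accumulate
    SUBS R1, R1, #1         ; One fewer word remaining; sets Z when the count reaches 0
    BNE  SumLoop            ; Repeat until the count is 0
    MOV  R0, R2             ; Return the total in R0
    BX   LR
```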
(2) STR: Store Memory Instruction
Core Function: Writes data from a register into memory.
Syntax: Same as `LDR` (`Rd` is the source register).
Example:
    STR  R0, [R1]           ; Write the 4-byte word in R0 to the address pointed to by R1
    STRH R2, [R1, #4]!      ; Write the low 2 bytes of R2 to address R1+4, then R1 = R1 + 4
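Combining LDR and STR with post-indexing gives a simple word-by-word copy. This is a sketch under the assumption that both addresses are word-aligned and the regions do not overlap; the label and register assignments are hypothetical.

```asm
; Input: R0 = source address, R1 = destination address,
;        R2 = number of words (assumed to be at least 1)
CopyWords
    LDR  R3, [R0], #4       ; Read a word from the source, then R0 = R0 + 4
    STR  R3, [R1], #4       ; Write it to the destination, then R1 = R1 + 4
    SUBS R2, R2, #1         ; Count down the remaining words
    BNE  CopyWords          ; Repeat until the count is 0
    BX   LR
```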
(3) PUSH/POP: Stack Operation Instructions
Core Function: Save or restore a group of registers to or from the stack in a single instruction, which is the standard and recommended way to protect context during function calls. They are the stack-specific forms of `STMFD SP!` and `LDMFD SP!` (written `STMDB SP!` and `LDMIA SP!` in unified syntax), and are more concise and intuitive.
Syntax:
- Push to Stack (Save Registers): `PUSH {reglist}`
- Pop from Stack (Restore Registers): `POP {reglist}`
Example:
; Function Entry: Save R4-R6 (registers to be protected) and LR (return address)
    PUSH {R4-R6, LR}        ; Push to stack; SP decreases by 16 (4 registers x 4 bytes)
; Function Body ... (R4-R6 can be safely used)
; Function Exit: Restore registers and return
    POP  {R4-R6, PC}        ; Pop from stack: restore R4-R6 and load the saved LR value directly into PC (this performs the return)
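To make the full descending stack concrete, the sketch below shows where each register lands after a PUSH, followed by the same function written with the multiple-register forms. FuncA and FuncB are hypothetical labels used only for this illustration.

```asm
FuncA
    PUSH  {R4-R6, LR}       ; SP = SP - 16; lower-numbered registers go to lower addresses
    ; Stack layout now: [SP] = R4, [SP, #4] = R5, [SP, #8] = R6, [SP, #12] = LR
    LDR   R0, [SP, #12]     ; R0 = the saved LR value, read without changing SP
    POP   {R4-R6, PC}       ; SP = SP + 16; the saved LR value is loaded into PC

FuncB
    STMDB SP!, {R4-R6, LR}  ; Decrement SP before storing: same effect as the PUSH above
    LDMIA SP!, {R4-R6, PC}  ; Load, then increment SP: same effect as the POP above
```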
3. Jump and Function Call Instructions (Program Flow Control)
(1) B: Unconditional/Conditional Jump
Core Function: Directly modifies the PC value to jump to a specified label, suitable for “loops and branch judgments”.
Syntax: `B{cond} Label`
Example:
    B    MainLoop           ; Unconditional jump to MainLoop
    CMP  R0, #0
    BNE  ErrorHandler       ; If R0≠0, jump to ErrorHandler
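A conditional branch that skips over one or more instructions is the assembly equivalent of an `if` statement. A minimal sketch, assuming a hypothetical routine that returns the larger of two signed values:

```asm
; Input: R0, R1 = signed values. Output: R0 = the larger of the two.
MaxSigned
    CMP  R0, R1             ; Set the flags from R0 - R1
    BGE  MaxDone            ; R0 >= R1 (signed): keep R0 and skip the MOV
    MOV  R0, R1             ; Otherwise R1 is larger
MaxDone
    BX   LR
```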
(2) BL: Function Call Instruction
Core Function: Automatically stores the return address (address of the next instruction) into LR before jumping, used for function calls.
Syntax: `BL{cond} Label`
Example:
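    BL   Delay              ; Call Delay function, LR = return address
    MOV  R1, #1             ; After Delay returns, continue execution from here
Stop
    B    Stop               ; Park here so execution does not fall through into Delay
Delay
    LDR  R0, =100000        ; 100000 cannot be encoded as a MOV immediate, so use LDR =
DelayLoop
    SUBS R0, R0, #1
    BNE  DelayLoop
    BX   LR                 ; Use BX LR to return to the caller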
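Because every BL overwrites LR, a function that itself calls another function has to save LR first. The sketch below assumes the Delay routine from the example above and a hypothetical wrapper named DelayTwice.

```asm
; Calls Delay twice. Each BL overwrites LR, so LR is saved on entry.
DelayTwice
    PUSH {R4, LR}           ; Save LR (R4 is included to keep SP 8-byte aligned)
    BL   Delay              ; LR now holds the address of the next instruction
    BL   Delay              ; LR is overwritten again
    POP  {R4, PC}           ; Load the saved LR into PC: return to the caller of DelayTwice
```

A leaf function such as Delay, which calls nothing else, can skip the PUSH/POP and return with `BX LR` directly.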
(3) Pseudo Instructions
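`LDR Rd, =val`: Loads any 32-bit constant into a register. The assembler either converts it to a MOV or places the constant in a nearby literal pool and loads it with a PC-relative LDR.
    LDR  R0, =0x12345678    ; Load a value that cannot be encoded as a MOV immediate
    LDR  R1, =0x10          ; The assembler may convert this to MOV R1, #0x10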
ADR: Gets the PC-relative address of a label (the label must be close to the instruction).
    ADR  R0, DataBuf        ; Load the address of DataBuf into R0
DataBuf DCD 0x00, 0x01, 0x02 ; Three words of data
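For constants that do not fit an immediate, the `LDR =` form relies on a literal pool near the code. The sketch below uses the armasm `LTORG` directive to show where that pool can be placed; LoadConst is a hypothetical label, and the exact encoding chosen is up to the assembler.

```asm
LoadConst
    LDR  R0, =0x12345678    ; Assembled as a PC-relative load from the literal pool below
    BX   LR                 ; Return before the pool so the data is never executed
    LTORG                   ; Place the literal pool here (it holds 0x12345678)
```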
3. Complete Example Program
The following example covers four core scenarios: “stack operations, function calls, memory read/write, and data verification”, using the recommended `PUSH`/`POP` instructions.
; Program Description: Cortex-M3/M4 Assembly Practical Example
; Core Functions: 1. Save registers on the stack 2. Call delay function 3. Read/write RAM data 4. Verify data consistency 5. Loop execution (when Main is called repeatedly)
    AREA ARM_Demo, CODE, READONLY
    ENTRY
    THUMB                   ; Explicitly select the Thumb instruction set
    ALIGN 4
; --------------------------
; Main Function: Program Core Logic Entry
; --------------------------
Main
; 1. Stack Operations: Save the registers used below and the return address (LR) at function entry.
;    AAPCS only requires a callee to preserve R4-R11; R0-R2 are saved here for demonstration.
    PUSH {R0-R2, LR}        ; Use PUSH to save registers
; 2. Memory Read/Write: Write data to RAM address (0x20000000), then read and verify
    MOV  R0, #0x20000000    ; R0 = RAM base address
    LDR  R1, =0x12345678    ; R1 = Data to be written
    STR  R1, [R0]           ; Write operation: Write data to memory
    LDR  R2, [R0]           ; Read operation: Read data from memory
; 3. Data Verification: Compare "written value (R1)" with "read value (R2)"
    CMP  R1, R2
    BEQ  Data_OK            ; If data is consistent, jump
    MOV  R3, #0x00          ; Data inconsistent: R3 = 0x00 (error flag)
    B    Call_Delay
Data_OK
    MOV  R3, #0xFF          ; Data consistent: R3 = 0xFF (success flag)
; 4. Call Delay Function
Call_Delay
    BL   Delay_Func         ; Call delay function
; 5. Restore registers and return: restore R0-R2 and load the saved LR value into PC.
;    This assumes Main was called with BL; the loop comes from the caller invoking Main repeatedly.
    POP  {R0-R2, PC}        ; Use POP to restore registers and return
; --------------------------
; Delay Function: Simple decrement delay
; --------------------------
Delay_Func
    PUSH {R0, LR}           ; Delay function also protects the register it uses and LR
    LDR  R0, =500000
Delay_Loop
    SUBS R0, R0, #1
    BNE  Delay_Loop
    POP  {R0, PC}           ; Restore R0, and return by loading the saved LR value into PC
    ALIGN 4
    END
Execution Effects:
- After pushing to the stack: SP decreases by 16 (four registers).
- After writing to memory: address `0x20000000` holds `0x12345678`.
- After verification: APSR’s Z flag is 1 and R3 is set to `0xFF`.
- Running at full speed: provided the caller invokes `Main` again each time it returns, execution alternates between `Main` and `Delay_Func`, and R3 remains `0xFF`.
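The looping behavior depends on something calling Main again each time it returns, because Main ends by loading its saved LR value into PC. A minimal sketch of such a caller is shown below; the Start label is hypothetical and is not part of the program above, and a real project would reach it from its startup code.

```asm
; Hypothetical caller, not part of the program above
Start
    BL   Main               ; LR = address of the next instruction, so Main's final POP returns here
    B    Start              ; Call Main again, forever
```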
4. Conclusion
The core logic of Cortex-M3/M4 assembly can be distilled into three sentences:
- The operation objects are registers: you only need to master R0-R7 (data), SP/LR/PC (control), and APSR (condition flags).
- Memory access relies on Load/Store: following RISC principles, only load/store instructions such as `LDR`/`STR` interact with memory. Function context protection uses their stack forms, the recommended `PUSH`/`POP` instructions.
- Program flow is controlled by the PC: use `B` for jumps and `BL` for function calls (which depend on LR), and return with `BX LR` or `POP {PC}`.
Mastering the above content will lay a solid foundation for understanding and addressing scenarios in embedded development such as boot code/hardware initialization, writing interrupt service routines (ISRs), and optimizing performance-critical code segments.