Course 2: Stage 1 – Basic Preparation (Week 2)
Topic: Detailed Explanation of ARMv8 Registers and Instruction Set, Bare-Metal Programming Practice
2.1 In-Depth Analysis of Registers
Classification of ARMv8 Registers:
- General Purpose Registers (31):
  - 64-bit Names: X0-X30 (full 64-bit operations).
  - 32-bit Names: W0-W30 (the lower 32 bits of the same registers; writing a W register clears the upper 32 bits of the corresponding X register).
  - Special Purpose (calling and system-call conventions; mostly not enforced by hardware, but code should follow them):
    - X0: Function Argument 1 / Return Value.
    - X1-X7: Function Arguments 2-8.
    - X8: System Call Number (Linux convention).
    - X29: Frame Pointer (FP).
    - X30: Link Register (LR, holds the function return address; written by BL).
- Special Registers:
  - SP (Stack Pointer): Points to the top of the current stack.
  - PC (Program Counter): Holds the address of the instruction being executed. It is not a general-purpose register and cannot be written directly; it changes only through branch instructions and exception entry/return.
  - NZCV (Status Register): Holds the four condition flags, which are updated only by flag-setting instructions such as ADDS, SUBS, CMP, and TST:
    - N (Negative): Set to 1 when the operation result is negative.
    - Z (Zero): Set to 1 when the operation result is 0.
    - C (Carry): Set to 1 when an addition produces an unsigned carry, or when a subtraction does not borrow (a borrow clears it).
    - V (Overflow): Set to 1 on signed overflow.
Register Operation Example:
// Register assignment and operation
mov x0, #42 // x0 = 42
add x1, x0, x0 // x1 = x0 + x0 = 84
sub w2, w1, #10 // w2 = w1 - 10 (32-bit operation, result is 74)
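The two points that most often surprise newcomers, W-register writes and the meaning of the C flag after a subtraction, can be checked with a short fragment. This is a minimal sketch in GNU assembler syntax; the register choices are arbitrary and the expected values in the comments follow from the rules described above.

```asm
    mov  x0, #-1          // x0 = 0xFFFFFFFFFFFFFFFF
    mov  w0, #1           // writing w0 clears bits 63:32, so x0 = 0x0000000000000001

    mov  x1, #5
    subs x2, x1, #5       // x2 = 0:  Z = 1, C = 1 (no borrow), N = 0, V = 0
    subs x3, x1, #6       // x3 = -1: N = 1, C = 0 (borrow),    Z = 0, V = 0
    mrs  x4, nzcv         // copy the flags into x4 (N, Z, C, V are bits 31..28)
```

Plain `sub` leaves the flags untouched; only the `s`-suffixed form (or `cmp`, which is `subs` with the result discarded) updates NZCV.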
2.2 Detailed Explanation of Basic Instruction Set
Instruction Format:
- Basic Structure: Opcode Destination Register, Source Operand 1, Source Operand 2
- For example: ADD X0, X1, X2 → X0 = X1 + X2
Core Instruction Classification:
Data Processing Instructions:
- MOV: Register/Immediate Assignment.
    mov x3, #0x1000 // x3 = 0x1000
    mov x4, x3 // x4 = x3
- ADD/SUB: Addition and Subtraction.
    add x5, x4, #8 // x5 = x4 + 8
    sub x6, x5, x3 // x6 = x5 - x3
- AND/ORR/EOR: Logical Operations (AND, OR, XOR).
    and x7, x5, #0xFF // x7 = x5 & 0xFF
    orr x8, x7, #0x1 // x8 = x7 | 0x1
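As a slightly larger illustration of the data processing instructions, the sketch below builds a constant that a single MOV cannot encode and then combines it with logical and arithmetic operations. It uses `movz`/`movk` and a shifted register operand, which are standard A64 features not covered above; the values are arbitrary.

```asm
    movz x0, #0x5678              // x0 = 0x0000000000005678
    movk x0, #0x1234, lsl #16     // keep other bits, set bits 31:16: x0 = 0x12345678
    and  x1, x0, #0xFF            // x1 = 0x78 (lowest byte)
    eor  x2, x1, x1               // x2 = 0 (anything XOR itself is 0)
    orr  x3, x2, #0x80            // x3 = 0x80
    add  x4, x1, x3, lsl #1       // x4 = 0x78 + (0x80 << 1) = 0x178
```

The last line shows that the second source operand of ADD can be a shifted register, which often saves a separate shift instruction.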
Memory Operation Instructions:
- LDR (Load Data):
    ldr x9, [x0] // Load 8 bytes from memory address x0 to x9
    ldr w10, [x1, #4] // Load 4 bytes from memory address x1+4 to w10
- STR (Store Data):
    str x2, [x3] // Write the value of x2 to memory address x3
    str w11, [x4, #8]! // Write w11 to x4+8 and update x4 to x4+8 (pre-indexed)
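A typical use of LDR and STR together is a read-modify-write of a variable in memory. The following is a minimal sketch; the names `bump_counter` and `counter` are hypothetical, and it assumes the `.data` section is linked into writable RAM.

```asm
    .text
bump_counter:                 // hypothetical routine name
    ldr  x0, =counter         // x0 = address of counter (literal-pool load)
    ldr  w1, [x0]             // w1 = 41
    add  w1, w1, #1           // w1 = 42
    str  w1, [x0]             // counter = 42
    ldr  w2, [x0, #4]         // w2 = 0x11223344 (word at counter + 4)
    str  w1, [x0, #4]!        // store 42 at counter + 4 and update x0 = counter + 4
    ret

    .data
    .align 3
counter:
    .word 41
    .word 0x11223344
```

The register name selects the access width: `w1` transfers 4 bytes, `x1` would transfer 8.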
Control Flow Instructions:
- B (Unconditional Jump):
    b loop_start // Jump to label loop_start
- BL (Branch with Link, used for function calls):
    bl my_function // Call my_function, return address saved to LR (X30)
- RET (Function Return):
    ret // Return from function (branches to the address in X30, like br x30; AArch64 has no mov pc, lr)
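The three control flow instructions combine naturally into a counted loop that calls a small function. This is a sketch with hypothetical labels (`loop_demo`, `add_counter`, `done`); it also uses the conditional branch `b.ne`, which tests the Z flag set by `subs`.

```asm
    .text
loop_demo:
    mov  x0, #0           // running sum
    mov  x1, #5           // loop counter
loop_start:
    bl   add_counter      // call: x30 = address of the next instruction
    subs x1, x1, #1       // decrement the counter and update the flags
    b.ne loop_start       // loop again while x1 != 0
done:
    b    done             // x0 = 5 + 4 + 3 + 2 + 1 = 15

add_counter:
    add  x0, x0, x1       // x0 += x1
    ret                   // branch back to the address in x30
```

Because `bl` overwrites X30 on every call, a function that itself calls another function must save X30 first (typically on the stack).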
2.3 Addressing Modes
Common Addressing Methods:
- Immediate Addressing: Directly using constant values.
    add x0, x1, #0x20 // x0 = x1 + 32
- Register Indirect Addressing: Accessing data through memory addresses stored in registers.
    ldr x2, [x3] // Load data from the address pointed to by x3 into x2
- Base + Offset Addressing: The address is a base register plus a constant offset; the base register is not modified.
    str x4, [x5, #16] // Store the value of x4 to the address of x5+16
- Pre/Post Indexed Addressing: The base register is updated as part of the access.
    ldr x6, [x7], #8 // Load data from the address of x7 into x6, then x7 += 8 (post-indexed)
    str x8, [x9, #-4]! // Store x8 to the address of x9-4 and update x9 = x9-4 (pre-indexed)
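Two common uses of the indexed modes are walking an array (post-indexed) and pushing/popping the link register (pre-indexed store, post-indexed load). The sketch below uses hypothetical names (`sum_array`, `call_sum`, `values`) and assumes SP already points to a valid, 16-byte aligned stack.

```asm
    .text
sum_array:                        // x0 = pointer to 64-bit elements, x1 = count (> 0)
    mov  x2, #0                   // running sum
1:  ldr  x3, [x0], #8             // load element, then x0 += 8 (post-indexed)
    add  x2, x2, x3
    subs x1, x1, #1
    b.ne 1b                       // repeat until the count reaches 0
    mov  x0, x2                   // return the sum in x0
    ret

call_sum:
    str  x30, [sp, #-16]!         // push: sp -= 16, then store x30 at [sp]
    ldr  x0, =values
    mov  x1, #3
    bl   sum_array                // x0 = 10 + 20 + 30 = 60
    ldr  x30, [sp], #16           // pop: load x30 from [sp], then sp += 16
    ret

    .data
    .align 3
values:
    .quad 10, 20, 30
```

The push uses 16 bytes even though X30 is only 8 bytes wide, so that SP stays 16-byte aligned.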
2.4 Bare-Metal Programming Practice
Objective: Write a program to calculate 10 + 20 and output the result via UART (characters 3 and 0).
Code Example (add_uart.s):
.equ UART0_BASE, 0x9000000        // PL011 UART base address on the QEMU virt machine
.equ UARTFR, 0x18                 // Flag register offset
.equ UARTFR_TXFF, (1 << 5)        // Transmit FIFO full flag
.equ UARTDR, 0x0                  // Data register offset

    .section .text
    .global _start
_start:
    // Calculate 10 + 20
    mov x0, #10
    mov x1, #20
    add x2, x0, x1                // x2 = 30 (result)
    // Split the result into decimal digits and convert them to ASCII ('0' is 0x30)
    mov x3, x2                    // x3 = 30 (assuming result is less than 100)
    mov x4, #10
    udiv x5, x3, x4               // x5 = 30 / 10 = 3 (tens digit)
    mul x6, x5, x4                // x6 = 3 * 10 = 30
    sub x7, x3, x6                // x7 = 30 - 30 = 0 (units digit)
    add x5, x5, #0x30             // Tens digit to ASCII ('3')
    add x7, x7, #0x30             // Units digit to ASCII ('0')
    // Send tens digit
    mov x2, x5
    bl uart_putc
    // Send units digit
    mov x2, x7
    bl uart_putc
    // Send newline
    mov x2, #0x0A                 // '\n'
    bl uart_putc
halt:
    b halt

// UART send function (same as Course 1): character in w2, clobbers x3 and x4
uart_putc:
    ldr x3, =UART0_BASE
tx_wait:
    ldr w4, [x3, #UARTFR]         // Read the flag register
    tst w4, #UARTFR_TXFF          // Is the transmit FIFO full?
    b.ne tx_wait                  // Yes: keep polling
    str w2, [x3, #UARTDR]         // Write the character to the data register
    ret

    .section .data
    .align 12                     // 4 KiB alignment
stack_bottom:
    .space 1024                   // Reserved for a stack; this program never sets SP
stack_top:
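The listing reserves 1024 bytes between `stack_bottom` and `stack_top` but never loads SP. That works here only because `uart_putc` is a leaf function and nothing touches the stack. As a sketch of how the reserved area could be put to use, assuming the `.data` section ends up in writable RAM at the chosen link address, `_start` could begin by setting SP, after which non-leaf helpers can save X30. The helper name `print_pair` is hypothetical; `uart_putc` and `stack_top` are the labels from the listing.

```asm
_start:
    ldr  x0, =stack_top       // the stack grows downward, so start at the high end
    mov  sp, x0               // SP cannot be loaded by LDR directly; go through x0
    // ... rest of _start as in the listing ...

print_pair:                   // hypothetical non-leaf helper: sends w5, then w7
    str  x30, [sp, #-16]!     // save the return address; BL below overwrites x30
    mov  x2, x5
    bl   uart_putc
    mov  x2, x7
    bl   uart_putc
    ldr  x30, [sp], #16       // restore the return address
    ret
```

With this helper, the two "Send tens digit" / "Send units digit" blocks in `_start` reduce to a single `bl print_pair`.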
Compilation and Execution:
aarch64-linux-gnu-as add_uart.s -o add_uart.o
aarch64-linux-gnu-ld -nostdlib -o add_uart.elf add_uart.o -Ttext=0x80000
qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf
Expected Output:
30
2.5 Hands-On Experiment
- Modify Calculation Logic: Try calculating 15 + 25 and check that the output is 40.
- Extend Functionality: Support three-digit output (e.g., calculate 150 + 50, output 200).
- Debugging Exercise: Step through execution in QEMU using GDB and observe register changes. Start QEMU with the CPU halted (-S) and a GDB server listening on TCP port 1234 (-s):
    qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf -S -s
  Start GDB in another terminal:
    gdb-multiarch -ex "target remote localhost:1234" -ex "file add_uart.elf"
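For the Extend Functionality exercise, one possible approach is to repeat the divide/multiply/subtract pattern from add_uart.s with 100 and then 10. The sketch below is only one way to do it: `print_3digits` is a hypothetical name, it relies on `uart_putc` from add_uart.s clobbering only x3 and x4, and it keeps the return address in x9 instead of on the stack because the original program never sets SP.

```asm
// x0 = value in the range 0..999; clobbers x2-x10
print_3digits:
    mov  x9, x30              // BL below overwrites x30, so keep the return address
    mov  x10, #100
    udiv x5, x0, x10          // x5 = hundreds digit
    mul  x6, x5, x10
    sub  x6, x0, x6           // x6 = value % 100
    mov  x10, #10
    udiv x7, x6, x10          // x7 = tens digit
    mul  x8, x7, x10
    sub  x8, x6, x8           // x8 = units digit
    add  x2, x5, #0x30        // digit to ASCII
    bl   uart_putc
    add  x2, x7, #0x30
    bl   uart_putc
    add  x2, x8, #0x30
    bl   uart_putc
    mov  x30, x9              // restore the return address
    ret
```

Called with x0 = 150 + 50, this sends the characters 2, 0, 0. Smaller values are printed with leading zeros (30 becomes 030); suppressing them is a natural follow-up exercise.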