System Practice Learning ARMv8 Assembly - Course 2

Course 2: Stage 1 – Basic Preparation (Week 2)

Topic: Detailed Explanation of ARMv8 Registers and Instruction Set, Bare-Metal Programming Practice

2.1 In-Depth Analysis of Registers

Classification of ARMv8 Registers:

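General Purpose Registers (31):

64-bit Names: X0-X30 (full 64-bit operations).
32-bit Names: W0-W30 (the lower 32 bits of the matching X register; writing a W register clears the upper 32 bits of that X register).

Special Purpose (mostly a matter of calling convention rather than a hardware requirement, but the conventions should be followed):

X0: Function Argument 1 / Return Value.
X1-X7: Function Arguments 2-8.
X8: System Call Number (Linux convention; the procedure call standard itself uses X8 as the indirect result location register).
X29: Frame Pointer (FP).
X30: Link Register (LR, saves the function return address; BL writes it automatically).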

Special Registers:

SP (Stack Pointer): Points to the top of the current stack; it should be kept 16-byte aligned.
PC (Program Counter): Holds the address of the instruction being executed. It is not a general-purpose register and cannot be written directly; only branch instructions change it.
NZCV (Status Register):

N (Negative): Set to 1 when the operation result is negative.
Z (Zero): Set to 1 when the operation result is 0.
C (Carry): Set to 1 when an addition produces a carry out, or when a subtraction does not need a borrow.
V (Overflow): Set to 1 during signed overflow.
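
As a rough illustration of how the NZCV flags behave, the sketch below runs one subtraction that borrows and one that does not, then branches on the result. The label names (`flags_demo`, `was_higher`) are made up for this example, and `subs`, `cmp`, `b.hi` and `mrs` are not used elsewhere in this course so far.

```asm
    .text
    .global flags_demo
flags_demo:
    mov   x0, #5
    mov   x1, #7
    subs  x2, x0, x1      // 5 - 7 = -2: N=1, Z=0, C=0 (a borrow was needed), V=0
    cmp   x1, x0          // 7 - 5, result discarded: N=0, Z=0, C=1 (no borrow), V=0
    b.hi  was_higher      // taken: "unsigned higher" means C=1 and Z=0
    mov   x3, #0          // not reached in this example
    ret
was_higher:
    mov   x3, #1          // mov does not change the flags
    mrs   x4, nzcv        // copy the flags into x4 (bits 31..28 = N, Z, C, V)
    ret
```

Note that only the flag-setting forms (`subs`, `adds`, `cmp`, `tst` and similar) update NZCV; a plain `add` or `sub` leaves the flags untouched.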

Register Operation Example:

// Register assignment and operation
mov x0, #42        // x0 = 42
add x1, x0, x0     // x1 = x0 + x0 = 84
sub w2, w1, #10    // w2 = w1 - 10 (32-bit operation, result is 74)
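
The difference between the W and X views of a register is easiest to see at the 32-bit boundary. The following is a minimal sketch (the label `w_write_demo` is hypothetical) that adds 1 to the same value once as a 32-bit operation and once as a 64-bit operation.

```asm
    .text
    .global w_write_demo
w_write_demo:
    mov   x1, #0xFFFFFFFF   // x1 = 0x00000000FFFFFFFF
    add   w2, w1, #1        // 32-bit add wraps around: x2 = 0x0000000000000000
    add   x3, x1, #1        // 64-bit add carries into bit 32: x3 = 0x0000000100000000
    mov   x4, #-1           // x4 = 0xFFFFFFFFFFFFFFFF
    mov   w4, #1            // writing w4 clears the upper half: x4 = 0x0000000000000001
    ret
```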

2.2 Detailed Explanation of Basic Instruction Set

Instruction Format:

Basic Structure: Opcode Destination Register, Source Operand 1, Source Operand 2

For example: ADD X0, X1, X2 → X0 = X1 + X2

Core Instruction Classification:

Data Processing Instructions:

MOV: Register/Immediate Assignment.

mov x3, #0x1000      // x3 = 0x1000
mov x4, x3           // x4 = x3

ADD/SUB: Addition and Subtraction.

add x5, x4, #8       // x5 = x4 + 8
sub x6, x5, x3       // x6 = x5 - x3

AND/ORR/EOR: Logical Operations (AND, OR, XOR).

and x7, x5, #0xFF    // x7 = x5 & 0xFF
orr x8, x7, #0x1     // x8 = x7 | 0x1
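
EOR is listed above without an example. The sketch below, using a hypothetical label `bit_ops_demo`, shows the three logical operations in their usual roles of masking, setting and toggling bits. `tst` is an extra instruction not covered in the text; it performs an AND that only updates the flags.

```asm
    .text
    .global bit_ops_demo
bit_ops_demo:
    mov   x0, #0x12F0
    and   x1, x0, #0xFF       // x1 = 0xF0 (keep only the low byte)
    orr   x2, x1, #0x1        // x2 = 0xF1 (set bit 0)
    eor   x3, x2, #0xF0       // x3 = 0x01 (toggle bits 4..7)
    eor   x4, x3, x3          // x4 = 0    (any value XORed with itself is zero)
    tst   x2, #0x1            // AND without a destination: Z=0 because bit 0 is set
    ret
```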

Memory Operation Instructions:

LDR (Load Data):

ldr x9, [x0]         // Load 8 bytes from memory address x0 to x9
ldr w10, [x1, #4]    // Load 4 bytes from memory address x1+4 to w10

STR (Store Data):

str x2, [x3]         // Write the value of x2 to memory address x3
str w11, [x4, #8]!   // Write w11 to x4+8 and update x4 to x4+8 (pre-indexed)
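
To see loads and stores working together, here is a small sketch that reads a value from memory in different widths and writes a copy back. The buffer `scratch` and the label `mem_demo` are invented for this example, `ldrb` (load one byte) goes beyond the instructions listed above, and the expected values assume the default little-endian configuration and that `scratch` ends up in writable RAM.

```asm
    .data
    .balign 8
scratch:                      // hypothetical 16-byte buffer
    .quad 0x1122334455667788
    .quad 0

    .text
    .global mem_demo
mem_demo:
    ldr   x0, =scratch        // x0 = address of scratch (loaded from a literal pool)
    ldr   x1, [x0]            // x1 = 0x1122334455667788 (8 bytes)
    ldr   w2, [x0, #4]        // w2 = 0x11223344 (the 4 bytes at scratch+4)
    ldrb  w3, [x0]            // w3 = 0x88 (the byte at the lowest address)
    str   x1, [x0, #8]        // scratch+8 now holds a copy of the first value
    ret
```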

Control Flow Instructions:

B (Unconditional Jump):

b loop_start        // Jump to label loop_start

BL (Branch with Link, used for function calls):

bl my_function      // Call my_function, return address saved to LR (X30)

RET (Function Return):

ret                 // Return from function: branch to the address held in LR (X30)
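
Because BL overwrites X30, a function that calls another function has to save its own return address first. The sketch below shows one common way to do this; the function names `outer` and `add_two` are hypothetical, `stp`/`ldp` (store/load pair) are not covered in this course section, and the code assumes SP already points to valid, 16-byte aligned RAM.

```asm
    .text
    .global outer
outer:
    stp   x29, x30, [sp, #-16]!   // push FP and LR (pre-indexed, SP stays 16-byte aligned)
    mov   x29, sp                 // set up the frame pointer
    mov   x0, #3                  // argument 1
    mov   x1, #4                  // argument 2
    bl    add_two                 // X30 now holds the address of the next instruction
                                  // on return, x0 = 7
    ldp   x29, x30, [sp], #16     // pop FP and LR (post-indexed)
    ret                           // return to outer's caller

add_two:
    add   x0, x0, x1              // return value goes in x0
    ret                           // leaf function: X30 is still intact
```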

2.3 Addressing Modes

Common Addressing Methods:

Immediate Addressing: Directly using constant values.
```
add x0, x1, #0x20    // x0 = x1 + 32
```
Register Indirect Addressing: Accessing data through memory addresses stored in registers.
```
ldr x2, [x3]         // Load data from the address pointed to by x3 into x2
```

Base + Offset Addressing:

str x4, [x5, #16]    // Store the value of x4 to the address of x5+16

Pre/Post Indexed Addressing:

ldr x6, [x7], #8     // Load data from the address of x7 into x6, then x7 +=8 (post-indexed)
str x8, [x9, #-4]!   // Store x8 to the address of x9-4 and update x9 = x9-4 (pre-indexed)
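
Post-indexed addressing is a natural fit for walking through an array. The following is a minimal sketch under assumed names (`values`, `sum_values`, `sum_loop`) that sums four 64-bit elements.

```asm
    .data
    .balign 8
values:                        // hypothetical input array
    .quad 1, 2, 3, 4

    .text
    .global sum_values
sum_values:
    ldr   x0, =values          // x0 = address of the first element
    mov   x1, #4               // number of elements left
    mov   x2, #0               // running sum
sum_loop:
    ldr   x3, [x0], #8         // load one element, then x0 += 8 (post-indexed)
    add   x2, x2, x3
    subs  x1, x1, #1           // decrement the counter and set the flags
    b.ne  sum_loop             // repeat until the counter reaches 0
    mov   x0, x2               // return the sum (10) in x0
    ret
```

The pointer update is folded into the load, so the loop needs no separate `add x0, x0, #8`.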

2.4 Bare-Metal Programming Practice

Objective: Write a program to calculate `10 + 20` and output the result via UART (characters `3` and `0`).

Code Example (`add_uart.s`):

.equ UART0_BASE, 0x9000000
.equ UARTFR, 0x18
.equ UARTFR_TXFF, (1 << 5)
.equ UARTDR, 0x0
.section .text
.global _start
_start:    // Calculate 10 + 20
    mov x0, #10
    mov x1, #20
    add x2, x0, x1      // x2 = 30 (result)
    // Convert result to ASCII character ('0' ASCII code is 0x30)
    add x3, x2, #0      // x3 = 30 (assuming result is less than 100)
    mov x4, #10
    udiv x5, x3, x4     // x5 = 30 / 10 = 3 (tens digit)
    mul x6, x5, x4      // x6 = 3 * 10 = 30
    sub x7, x3, x6      // x7 = 30 - 30 = 0 (units digit)
    add x5, x5, #0x30   // Tens digit to ASCII ('3')
    add x7, x7, #0x30   // Units digit to ASCII ('0')
    // Send tens digit
    mov x2, x5
    bl uart_putc
    // Send units digit
    mov x2, x7
    bl uart_putc
    // Send newline
    mov x2, #'\n'
    bl uart_putc
halt:    b halt
// UART send function (same as Course 1)
uart_putc:    ldr x3, =UART0_BASE
tx_wait:    ldr w4, [x3, UARTFR]
    tst w4, UARTFR_TXFF
    b.ne tx_wait
    str w2, [x3, UARTDR]
    ret
.section .data
.align 12
stack_bottom:    .space 1024
stack_top:
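
The listing reserves a stack area but never loads SP, which is harmless here because `uart_putc` does not touch the stack. If later code needs a stack, the start of `_start` could be extended roughly as sketched below. This is an assumption about how the reserved area is meant to be used, and it only works if the `.data` section is placed in writable RAM (on the QEMU `virt` machine, RAM begins at 0x40000000, so the link address may need adjusting).

```asm
    .section .text
    .global _start
_start:
    ldr   x0, =stack_top      // address just past the reserved stack area
    mov   sp, x0              // the stack grows downward from here
    // ... the rest of _start (calculation and uart_putc calls) would follow here
halt:
    b     halt

    .section .data
    .align 12                 // 4096-byte alignment, so stack_top is also 16-byte aligned
stack_bottom:
    .space 1024
stack_top:
```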

Compilation and Execution:

aarch64-linux-gnu-as add_uart.s -o add_uart.o
aarch64-linux-gnu-ld -nostdlib -o add_uart.elf add_uart.o -Ttext=0x80000
qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf

Expected Output:

30

2.5 Hands-On Experiment

Modify Calculation Logic: Try calculating 15 + 25 and confirm that the output is 40.
Extend Functionality: Support three-digit output (e.g., calculate 150 + 50, output 200).

Debugging Exercise: Step through execution in QEMU using GDB, observe register changes:

qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf -S -s

Start GDB in another terminal:

gdb-multiarch -ex "target remote localhost:1234" -ex "file add_uart.elf"
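
For the three-digit extension exercise, one possible approach is to replace the fixed tens/units code with a loop that divides by 100, 10 and 1 in turn. The sketch below is only one way to do it, not the course's reference solution: the label `digit_loop` and the choice of registers are arbitrary, `msub` and `cbnz` are instructions not introduced in the text, and the UART constants and `uart_putc` are copied from the listing in 2.4.

```asm
    .equ UART0_BASE, 0x9000000
    .equ UARTFR, 0x18
    .equ UARTFR_TXFF, (1 << 5)
    .equ UARTDR, 0x0

    .section .text
    .global _start
_start:
    mov   x0, #150
    mov   x1, #50
    add   x8, x0, x1          // x8 = 200 (value still to be printed)
    mov   x9, #100            // current divisor: 100, then 10, then 1
    mov   x10, #10
digit_loop:
    udiv  x5, x8, x9          // x5 = current digit
    msub  x8, x5, x9, x8      // x8 = x8 - x5 * x9 (remainder)
    add   x2, x5, #0x30       // digit to ASCII
    bl    uart_putc           // clobbers only x3 and x4
    udiv  x9, x9, x10         // next divisor
    cbnz  x9, digit_loop      // 1 / 10 = 0 ends the loop
    mov   x2, #0x0A           // line feed
    bl    uart_putc
halt:
    b     halt

uart_putc:
    ldr   x3, =UART0_BASE
tx_wait:
    ldr   w4, [x3, UARTFR]
    tst   w4, UARTFR_TXFF     // wait while the transmit FIFO is full
    b.ne  tx_wait
    str   w2, [x3, UARTDR]
    ret
```

As written, this always prints exactly three digits, so a result such as 40 would appear as `040`; suppressing leading zeros is left as a further refinement.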
