System Practice Learning ARMv8 Assembly – Course 2

Course 2: Stage 1 – Basic Preparation (Week 2)

Topic: Detailed Explanation of ARMv8 Registers and Instruction Set, Bare-Metal Programming Practice

2.1 In-Depth Analysis of Registers

Classification of ARMv8 Registers:

  1. General Purpose Registers (31):

  • 64-bit Names: <span>X0-X30</span> (Full 64-bit operations).

  • 32-bit Names: <span>W0-W30</span> (Operate on the lower 32 bits of the corresponding X register; writing a W register zeroes the upper 32 bits).

  • Special Purpose (mostly by convention, but the conventions should be followed):

    • <span>X0</span>: Function Argument 1 / Return Value.

    • <span>X1-X7</span>: Function Arguments 2-8.

    • <span>X8</span>: System Call Number (Linux convention).

    • <span>X29</span>: Frame Pointer (FP).

    • <span>X30</span>: Link Register (LR, saves function return address).

  2. Special Registers:

    • SP (Stack Pointer): Points to the top of the current stack.

    • PC (Program Counter): Holds the address of the instruction being executed. It is not a general purpose register and cannot be written directly; it changes only through branch, call, and return instructions.

    • NZCV (Status Register): Holds the four condition flags:

      • N (Negative): Set to 1 when the operation result is negative.

      • Z (Zero): Set to 1 when the operation result is 0.

      • C (Carry): Set to 1 when an addition produces a carry out. For subtraction, set to 1 when no borrow occurs and cleared to 0 when a borrow occurs.

      • V (Overflow): Set to 1 during signed overflow.

    Register Operation Example:

    // Register assignment and operation
    mov x0, #42        // x0 = 42
    add x1, x0, x0     // x1 = x0 + x0 = 84
    sub w2, w1, #10    // w2 = w1 - 10 (32-bit operation, result is 74)
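
    A small sketch of two points from the register list, assuming GNU assembler syntax. It adds <span>cmp</span>, <span>b.eq</span> and <span>mrs</span>, which are not introduced above, and the label <span>values_equal</span> is purely illustrative: writing a W register zeroes the upper half of the X register, and a comparison updates NZCV.

      mov x0, #-1          // x0 = 0xFFFFFFFFFFFFFFFF
      mov w0, #1           // Writing w0 zeroes the upper 32 bits: x0 = 0x0000000000000001
      mov x1, #5
      cmp x1, #5           // Computes x1 - 5 and sets flags: Z = 1, C = 1 (no borrow), N = 0, V = 0
      mrs x2, nzcv         // Copy the flags into x2: bits 31..28 hold N, Z, C, V (here 0x60000000)
      b.eq values_equal    // Taken because Z = 1
      mov x3, #0           // Skipped
    values_equal:
      mov x3, #1           // x3 = 1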

2.2 Detailed Explanation of Basic Instruction Set

    Instruction Format:

    • Basic Structure: <span>Opcode Destination Register, Source Operand 1, Source Operand 2</span>

      • For example: <span>ADD X0, X1, X2</span> computes <span>X0 = X1 + X2</span>.

    Core Instruction Classification:

    1. Data Processing Instructions:

    • <span>MOV</span>: Register/Immediate Assignment.

      mov x3, #0x1000      // x3 = 0x1000
      mov x4, x3           // x4 = x3
    • <span>ADD/SUB</span>: Addition and Subtraction.

      add x5, x4, #8       // x5 = x4 + 8
      sub x6, x5, x3       // x6 = x5 - x3
    • <span>AND/ORR/EOR</span>: Logical Operations (AND, OR, XOR).

      and x7, x5, #0xFF    // x7 = x5 & 0xFF
      orr x8, x7, #0x1     // x8 = x7 | 0x1
    2. Memory Operation Instructions:

    • <span>LDR</span> (Load Data):

      ldr x9, [x0]         // Load 8 bytes from memory address x0 to x9
      ldr w10, [x1, #4]    // Load 4 bytes from memory address x1+4 to w10
    • <span>STR</span> (Store Data):

      str x2, [x3]         // Write the value of x2 to memory address x3
      str w11, [x4, #8]!   // Write w11 to x4+8 and update x4 to x4+8 (pre-indexed)
    3. Control Flow Instructions:

    • <span>B</span> (Unconditional Jump):

      b loop_start        // Jump to label loop_start
    • <span>BL</span> (Branch with Link, used for function calls):

      bl my_function      // Call my_function, return address saved to LR (X30)
    • <span>RET</span> (Function Return):

      ret                 // Return from function (branches to the address in X30; same as ret x30)
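
    The three instruction classes can be combined into a short call-and-return sketch. The routine <span>sum_to_n</span> and its labels are hypothetical names chosen for illustration, and <span>subs</span> (a <span>sub</span> that also updates the flags) is an addition not covered above. The routine assumes its argument is at least 1.

      _start:
          mov x0, #5          // Argument n = 5
          bl sum_to_n         // Call: LR (X30) = address of the next instruction
      done:
          b done              // x0 = 15 here; loop forever

      // Hypothetical helper: returns 1 + 2 + ... + n in x0 (assumes n >= 1)
      sum_to_n:
          mov x1, #0          // Running total
      sum_loop:
          add x1, x1, x0      // total += n
          subs x0, x0, #1     // n -= 1, and update the flags
          b.ne sum_loop       // Repeat until n reaches 0 (Z = 1)
          mov x0, x1          // Return value goes in x0
          ret                 // Branch to the address in X30

    Because <span>sum_to_n</span> calls no other function, it does not need to save X30 before returning.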

2.3 Addressing Modes

    Common Addressing Methods:

    1. Immediate Addressing: Directly using constant values.

      add x0, x1, #0x20    // x0 = x1 + 32
    2. Register Indirect Addressing: Accessing data through memory addresses stored in registers.

      ldr x2, [x3]         // Load data from the address pointed to by x3 into x2
    3. Base + Offset Addressing:

      str x4, [x5, #16]    // Store the value of x4 to the address of x5+16
    4. Pre/Post Indexed Addressing:

      ldr x6, [x7], #8     // Load data from the address of x7 into x6, then x7 +=8 (post-indexed)
      str x8, [x9, #-4]!   // Store x8 to the address of x9-4 and update x9 = x9-4 (pre-indexed)
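
    Post-indexed addressing is a natural fit for walking an array. The following is a minimal sketch, not part of the course program: the labels <span>next</span>, <span>halt</span> and <span>values</span> are illustrative, and <span>adr</span>, <span>subs</span> and <span>.balign</span> are additions not covered above. The array is kept 8-byte aligned so that the 64-bit loads are naturally aligned.

      _start:
          adr x0, values      // x0 = address of the array (PC-relative)
          mov x1, #4          // Number of elements
          mov x2, #0          // Running sum
      next:
          ldr x3, [x0], #8    // Post-indexed: load from [x0], then x0 += 8
          add x2, x2, x3      // sum += element
          subs x1, x1, #1     // One element fewer to go
          b.ne next           // Repeat until the count reaches 0
      halt:
          b halt              // x2 = 10 here
          .balign 8
      values:
          .quad 1, 2, 3, 4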

2.4 Bare-Metal Programming Practice

    Objective: Write a program that calculates <span>10 + 20</span> and outputs the result via UART (characters <span>3</span> and <span>0</span>).

    Code Example (<span>add_uart.s</span>):

    .equ UART0_BASE, 0x9000000      // PL011 UART base address on the QEMU virt machine
    .equ UARTFR, 0x18               // Flag register offset
    .equ UARTFR_TXFF, (1 << 5)      // Transmit FIFO full flag
    .equ UARTDR, 0x0                // Data register offset

    .section .text
    .global _start
    _start:
        // Calculate 10 + 20
        mov x0, #10
        mov x1, #20
        add x2, x0, x1      // x2 = 30 (result)
        // Convert result to ASCII characters ('0' ASCII code is 0x30)
        add x3, x2, #0      // x3 = 30 (assuming result is less than 100)
        mov x4, #10
        udiv x5, x3, x4     // x5 = 30 / 10 = 3 (tens digit)
        mul x6, x5, x4      // x6 = 3 * 10 = 30
        sub x7, x3, x6      // x7 = 30 - 30 = 0 (units digit)
        add x5, x5, #0x30   // Tens digit to ASCII ('3')
        add x7, x7, #0x30   // Units digit to ASCII ('0')
        // Send tens digit
        mov x2, x5
        bl uart_putc
        // Send units digit
        mov x2, x7
        bl uart_putc
        // Send newline
        mov x2, #0x0A       // '\n'
        bl uart_putc
    halt:
        b halt

    // UART send function (same as Course 1): the character to send is in w2
    uart_putc:
        ldr x3, =UART0_BASE
    tx_wait:
        ldr w4, [x3, #UARTFR]
        tst w4, #UARTFR_TXFF
        b.ne tx_wait        // Wait while the transmit FIFO is full
        str w2, [x3, #UARTDR]
        ret

    .section .data
    .align 12               // 4 KiB alignment
    stack_bottom:           // Reserved stack area (not used by this program)
        .space 1024
    stack_top:

    Compilation and Execution:

    aarch64-linux-gnu-as add_uart.s -o add_uart.o
    aarch64-linux-gnu-ld -nostdlib -o add_uart.elf add_uart.o -Ttext=0x80000
    qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf

    Expected Output:

    30
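
    The <span>mul</span>/<span>sub</span> pair that computes the units digit in the program can also be written with a single <span>msub</span> (multiply-subtract) instruction, which is not used in the original code. A minimal sketch with the same register roles as the program:

      mov x3, #30
      mov x4, #10
      udiv x5, x3, x4         // x5 = 30 / 10 = 3 (tens digit)
      msub x7, x5, x4, x3     // x7 = x3 - (x5 * x4) = 0 (units digit)

    The same divide-then-remainder step can be repeated on the quotient to peel off further digits.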

2.5 Hands-On Experiment

    1. Modify Calculation Logic: Try calculating <span>15 + 25</span>, observe if the output is <span>40</span>.

    2. Extend Functionality: Support three-digit output (e.g., calculate <span>150 + 50</span>, output <span>200</span>).

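    3. Debugging Exercise: Step through execution in QEMU using GDB and observe register changes. The <span>-S</span> option halts the CPU at startup, and <span>-s</span> opens a GDB server on TCP port 1234:

      qemu-system-aarch64 -M virt -cpu cortex-a53 -nographic -kernel add_uart.elf -S -s

      Start GDB in another terminal, loading the ELF file for symbols before connecting:

      gdb-multiarch add_uart.elf -ex "target remote localhost:1234"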
