System Practice Learning ARMv8 Assembly – Course 5

Course 5: Phase 2 – Core Instructions and Programming

Topic: Memory Operations and Complex Data Structures, Bare-Metal Programming Practice

5.1 Advanced Memory Operations

  • Multi-Byte Memory Operations

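  • LDP/STP: Load or store a pair of registers in one instruction (works with 64-bit X registers or 32-bit W registers).

    stp x0, x1, [sp, #-16]!  // Push x0 and x1: pre-index, SP -= 16 before the store
    ldp x2, x3, [sp], #16    // Pop into x2 and x3: post-index, SP += 16 after the load

  • LDUR/STUR: Load/store with an unscaled signed 9-bit byte offset (-256 to 255). Unlike the scaled unsigned-offset form of LDR/STR, the offset may be negative or not a multiple of the access size. Whether a resulting unaligned access succeeds depends on the memory type and on alignment checking (SCTLR_ELx.A), not on the instruction itself.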
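The difference between the scaled and unscaled offset forms can be sketched as follows. This is a minimal illustration, assuming x0 already holds the address of a writable buffer in Normal memory with at least 8 valid bytes on either side; the register choices are arbitrary.

```asm
    // Assumption: x0 = 8-byte-aligned address inside a writable buffer
    ldr  x1, [x0, #8]     // Scaled unsigned offset: must be a multiple of 8 for an X register
    ldur x2, [x0, #-8]    // Unscaled offset: negative values are allowed
    ldur x3, [x0, #3]     // Unscaled offset that is not a multiple of 8: the access is unaligned,
                          // which works on Normal memory only while alignment checking is off
    stur w4, [x0, #-4]    // 32-bit store with a negative unscaled offset
```

The third load is the case to watch on bare metal: on Device memory, or with alignment checking enabled, it raises an alignment fault.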

  • Memory Barrier Instructions

  • DMB: Data Memory Barrier (enforces the order in which memory accesses before and after the barrier are observed).

  • DSB: Data Synchronization Barrier (stalls later instructions until all earlier memory accesses have completed).

Example:

1. Data Memory Barrier

    // Example: a DMB between a store and a later load keeps them in program order
    STR X0, [X1]      // Store data to [X1]
    DMB SY            // The store is observed before any later access (orders accesses, does not wait for completion)
    LDR X2, [X3]      // Load [X3] into X2

2. Data Synchronization Barrier

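    // Example: use DSB around a page table update
    STR X4, [X5]      // Modify page table entry
    DSB SY            // Ensure the page table write has completed
    TLBI VAE1IS, X6   // Invalidate the TLB entry for the address encoded in X6
    DSB ISH           // Wait for the TLB invalidation to complete
    ISB               // Resynchronize the instruction stream
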
  • Memory Barrier Scope

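Memory barrier instructions take an option that selects the shareability domain they apply to:

  • SY (Full System): all observers in the system, including CPUs and devices.

  • ISH (Inner Shareable): the Inner Shareable domain, typically all cores running under the same OS or hypervisor.

  • NSH (Non-Shareable): only the executing CPU.

  • OSH (Outer Shareable): the Outer Shareable domain, which can include other observers such as GPUs or DMA-capable devices in addition to the CPUs.

Example code:

    DMB ISH  // Inner Shareable data memory barrier
    DSB NSH  // Non-Shareable data synchronization barrier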
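A typical use of DMB is publishing data from one core to another through a flag. The sketch below is illustrative only: the labels shared_data, shared_flag, producer, and consumer are hypothetical, and it uses the store-only and load-only variants of the Inner Shareable option (ISHST, ISHLD), which the list above does not cover.

```asm
// Hypothetical producer/consumer pair (labels are not part of the course code)
.section .bss
.balign 8
shared_data:  .skip 8            // Payload written by the producer
shared_flag:  .skip 4            // 0 = not ready, 1 = payload valid

.section .text
producer:                        // In: x0 = value to publish
    ldr x1, =shared_data
    ldr x2, =shared_flag
    str x0, [x1]                 // 1. Write the payload
    dmb ishst                    // 2. Order the payload store before the flag store
    mov w3, #1
    str w3, [x2]                 // 3. Publish by setting the flag
    ret

consumer:                        // Out: x0 = published value
    ldr x1, =shared_data
    ldr x2, =shared_flag
1:
    ldr w3, [x2]                 // Poll the flag
    cbz w3, 1b                   // Spin until the producer sets it
    dmb ishld                    // Order the flag load before the payload load
    ldr x0, [x1]                 // Safe to read the payload now
    ret
```

Without the two barriers, the hardware is free to reorder the accesses, so the consumer could see the flag set and still read a stale payload.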

5.2 Implementation of Complex Data Structures

Array Operations:

Define and traverse an array, calculate the sum of elements:

    .section .data
    array:
        .quad 10, 20, 30, 40, 50  // Define a 64-bit integer array
    array_end:

    .section .text
    sum_array:
        ldr x0, =array            // Starting address of the array
        ldr x1, =array_end        // Ending address of the array
        sub x1, x1, x0
        lsr x1, x1, #3            // Number of elements (each element is 8 bytes)
        mov x2, #0                // Accumulator
    loop:
        ldr x3, [x0], #8          // Read element, then advance the address by 8
        add x2, x2, x3
        subs x1, x1, #1
        b.ne loop
        mov x0, x2                // Return the sum in x0 (AAPCS64 return register)
        ret
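An alternative traversal keeps the base address fixed and indexes with a scaled register offset. This is a sketch under the assumption that the element count is known at assembly time; the names values, VALUES_COUNT, and sum_indexed are illustrative and not part of the course code.

```asm
.equ VALUES_COUNT, 5              // Hypothetical constant: number of elements below

.section .data
values:
    .quad 10, 20, 30, 40, 50

.section .text
sum_indexed:
    ldr x0, =values               // Base address (stays fixed)
    mov x1, #VALUES_COUNT         // Element count
    mov x2, #0                    // Accumulator
    mov x4, #0                    // Index
1:
    cmp x4, x1
    b.hs 2f                       // Stop when index >= count (also handles count == 0)
    ldr x3, [x0, x4, lsl #3]      // Load values[index]: address = base + index * 8
    add x2, x2, x3
    add x4, x4, #1
    b 1b
2:
    mov x0, x2                    // Return the sum in x0
    ret
```

Checking the index before each load costs one extra branch per iteration, but the loop is then safe for an empty array, which a count-down loop that loads first is not.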

Struct Operations:

Define a struct and access its members:

    // Define the struct (4-byte aligned)
    .section .data
    .balign 4
    struct_point:
        .word 0          // x coordinate (offset 0)
        .word 0          // y coordinate (offset 4)
        .asciz "point"   // Name (offset 8)

    // Access struct members
    .section .text
        ldr x0, =struct_point
        ldr w1, [x0, #0]      // Read x coordinate
        ldr w2, [x0, #4]      // Read y coordinate
        add w1, w1, #5        // x coordinate += 5
        str w1, [x0, #0]      // Write the updated x coordinate back
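Hard-coded offsets such as #0 and #4 become error-prone as a struct grows. One common approach, sketched here with hypothetical names (POINT_X, POINT_Y, POINT_NAME, pt, move_point), is to name each member offset with .equ so the layout is defined in one place.

```asm
// Member offsets for the point struct (hypothetical names)
.equ POINT_X,    0                // 32-bit x coordinate
.equ POINT_Y,    4                // 32-bit y coordinate
.equ POINT_NAME, 8                // NUL-terminated name string

.section .data
.balign 4
pt:
    .word 3                       // x
    .word 7                       // y
    .asciz "point"                // name

.section .text
move_point:                       // Hypothetical helper: x += 5, y -= 1
    ldr  x0, =pt
    ldr  w1, [x0, #POINT_X]
    ldr  w2, [x0, #POINT_Y]
    add  w1, w1, #5
    sub  w2, w2, #1
    str  w1, [x0, #POINT_X]
    str  w2, [x0, #POINT_Y]
    ldrb w3, [x0, #POINT_NAME]    // First byte of the name ('p')
    ret
```

If a member is later inserted or resized, only the .equ lines change and every access picks up the new layout.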

5.3 Practical: Dynamic Memory Allocator

Objective: Implement a simple heap memory manager that supports malloc and free. Code (malloc.s):

    .equ HEAP_START, 0x100000  // Heap start address (this region must be reserved in the linker script)
    .equ HEAP_SIZE,  1024      // Heap size (1KB), including block headers

    .section .text

    //--- Memory block header structure (16 bytes) ---
    // +----------------+
    // | size (8 bytes) |   payload size; bit 63 set = block in use
    // | next (8 bytes) |   address of the next block, 0 = end of list
    // +----------------+

    // malloc function: x0 = requested size in bytes, returns address in x0 or 0
    .global malloc
    malloc:
        stp x29, x30, [sp, #-16]!
        mov x29, sp
        mov x1, #HEAP_START       // x1 = current block header
    1:
        ldr x2, [x1]              // Read block size
        cbz x2, init_heap         // Size 0: heap not initialized (assumes the heap memory starts zeroed)
        tbz x2, #63, check_free   // Bit 63 clear means the block is free
    next_block:
        ldr x1, [x1, #8]          // Point to next block
        cbnz x1, 1b               // Keep searching while there is a next block
        mov x0, #0                // End of list: no block fits, return 0
        b malloc_done
    check_free:
        and x3, x2, #0x7FFFFFFFFFFFFFFF  // Clear the marker bit
        cmp x3, x0                // Compare block size with requested size
        b.hs split_block          // Unsigned compare: block is large enough
        b next_block
    init_heap:
        // Initialize heap: create a single free block
        mov x2, #(HEAP_SIZE - 16) // Payload size excludes the 16-byte header
        str x2, [x1]              // Block size (not marked as used)
        str xzr, [x1, #8]         // Next block address is 0
        b check_free
    split_block:
        // Split block logic (omitted, need to handle remaining space);
        // for now the whole block is handed out
        orr x2, x3, #0x8000000000000000  // Mark current block as used
        str x2, [x1]
        add x0, x1, #16           // Return user available address (skip header)
    malloc_done:
        ldp x29, x30, [sp], #16
        ret

    // free function: free memory (need to handle merging free blocks)
    .global free
    free:
        // Implementation omitted (recommended as a hands-on experiment)
        ret                       // Placeholder so the label does not fall through
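As a starting point for the omitted free, the sketch below only clears the "in use" marker and does not merge neighbouring blocks. It assumes the header layout shown above (size field at offset 0 with bit 63 as the marker, payload starting 16 bytes after the header). The name free_nomerge is hypothetical, chosen so it does not clash with the course's free label.

```asm
.section .text

// free_nomerge (hypothetical helper): mark a block as free again.
// In: x0 = pointer previously returned by malloc, or 0
.global free_nomerge
free_nomerge:
    cbz x0, 1f                         // Freeing a null pointer is a no-op
    sub x1, x0, #16                    // Step back from the payload to the header
    ldr x2, [x1]                       // Size field, including the marker bit
    and x2, x2, #0x7FFFFFFFFFFFFFFF    // Clear bit 63: the block is free again
    str x2, [x1]
1:
    ret
```

This version performs no validation, so passing a pointer that malloc did not return corrupts the heap. Merging adjacent free blocks remains the actual exercise.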

5.4 Compilation and Debugging

Linker Script (link.ld):

    SECTIONS {
        . = 0x80000;          /* Program load address */
        .text : { *(.text) }
        .data : { *(.data) }
        .bss : {
            . = ALIGN(16);
            heap_start = .;
            . += 1024;        /* Reserve 1KB space for heap */
            heap_end = .;
        }
    }
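Note that malloc.s hard-codes HEAP_START as 0x100000, while this script reserves the heap inside .bss and exports its bounds as heap_start and heap_end. One way to keep the two in sync, shown here as a sketch with a hypothetical helper name (heap_bounds), is to read the linker-defined symbols instead of a fixed address.

```asm
.section .text

// heap_bounds (hypothetical helper): fetch the heap region from the linker script.
// Out: x0 = heap start address, x1 = heap size in bytes
.global heap_bounds
heap_bounds:
    ldr x0, =heap_start           // Symbol defined in link.ld, resolved at link time
    ldr x1, =heap_end             // Symbol defined in link.ld
    sub x1, x1, x0                // Size = end - start
    ret
```

On bare metal nothing clears .bss automatically, so startup code would need to zero this region before the first allocation if the allocator treats a zero size field as "heap not initialized".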

Compilation Command:

    aarch64-linux-gnu-as malloc.s -o malloc.o
    aarch64-linux-gnu-ld -nostdlib -T link.ld -o malloc.elf malloc.o

5.5 Hands-On Experiment

  1. Implement the free function: Extend the above code to support freeing memory and merging adjacent free blocks.

  2. Test Memory Allocation: Write a program that calls malloc to allocate blocks of different sizes and prints the returned addresses via UART.

  3. Data Structure Practice: Define an array of structs (e.g., coordinate points), traverse it, and calculate the coordinates of the center point.
