Course 5: Phase 2 – Core Instructions and Programming
Topic: Memory Operations and Complex Data Structures, Bare-Metal Programming Practice
5.1 Advanced Memory Operations
- Multi-Byte Memory Operations
  - LDP/STP: Load/store a pair of registers in one instruction (available for both 64-bit X and 32-bit W registers).
    stp x0, x1, [sp, #-16]!   // Push x0 and x1 onto the stack (pre-index: SP -= 16, then store)
    ldp x2, x3, [sp], #16     // Pop into x2 and x3 (post-index: load, then SP += 16)
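  - LDUR/STUR: Load/store with an unscaled signed 9-bit offset (-256 to 255), so the offset need not be a multiple of the access size. They are often used to reach unaligned offsets; whether an unaligned access is actually permitted still depends on the memory type and the alignment-check setting (SCTLR_ELx.A).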
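To make the idea of an unaligned access concrete, here is a minimal C sketch, used only as a host-runnable model rather than as bare-metal code. The helper names `store_u32_unaligned` and `load_u32_unaligned` are hypothetical. Copying through `memcpy` leaves it to the compiler to choose an instruction sequence that is legal for the target, whatever the address alignment.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at any byte address, aligned or not. */
static void store_u32_unaligned(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Load a 32-bit value from any byte address, aligned or not. */
static uint32_t load_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, `store_u32_unaligned(buf + 3, value)` writes four bytes starting at an odd offset in a byte buffer without relying on a cast to `uint32_t *`.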
- Memory Barrier Instructions
  - DMB: Data Memory Barrier. Orders memory accesses: accesses before the barrier are observed before accesses after it.
  - DSB: Data Synchronization Barrier. Stronger than DMB: no later instruction executes until all memory accesses before the barrier have completed.
Example:
1. Data Memory Barrier
    // Example: insert DMB between a store and a later load to keep them in order
    STR X0, [X1]      // Store data to [X1]
    DMB SY            // The store is observed by all observers before the load below
    LDR X2, [X3]      // Load [X3] into X2
2. Data Synchronization Barrier
    // Example: use DSB after modifying a page table entry
    STR X4, [X5]      // Modify page table entry
    DSB SY            // Ensure the page table write has completed
    TLBI VAE1IS, X6   // Invalidate the TLB entry for the address encoded in X6
    DSB SY            // Wait for the invalidation to complete
    ISB               // Resynchronize the instruction stream
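The ordering role of a data memory barrier can also be sketched in C. The example below is a hedged illustration using C11 `<stdatomic.h>` fences, which compilers targeting AArch64 typically lower to DMB instructions. The names `payload`, `ready`, `publish`, and `consume` are hypothetical and chosen only for this sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t payload;      /* ordinary data written by the producer */
static atomic_bool ready;     /* flag that publishes the data */

/* Producer: write the data, then the flag, with a fence in between
 * so the data write is ordered before the flag write. */
static void publish(uint64_t value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: returns true and fills *out once the flag has been seen. */
static bool consume(uint64_t *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return true;
}
```

Without the two fences, another core could observe `ready` as true while still reading a stale `payload`. That is the reordering a DMB is meant to rule out.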
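- Memory Barrier Scope
Memory barrier instructions take an option that specifies the shareability domain they apply to:
  - SY (Full system): all observers in the system, including CPUs and devices
  - ISH (Inner Shareable): the Inner Shareable domain, typically all cores running under the same operating system
  - NSH (Non-Shareable): only the current CPU
  - OSH (Outer Shareable): the Outer Shareable domain, which typically also covers other bus masters such as GPUs or DMA-capable devices
Example code:
    DMB ISH   // Inner Shareable memory barrier
    DSB NSH   // Non-Shareable data synchronization barrier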
5.2 Implementation of Complex Data Structures
Array Operations:
Define and traverse an array, calculate the sum of elements:
    .section .data
array:
    .quad 10, 20, 30, 40, 50   // Define a 64-bit integer array
array_end:

    .section .text
sum_array:
    ldr x0, =array             // Starting address of the array
    ldr x1, =array_end         // Ending address of the array
    sub x1, x1, x0
    lsr x1, x1, #3             // Calculate the number of elements (each element is 8 bytes)
    mov x2, #0                 // Accumulator
loop:
    ldr x3, [x0], #8           // Read element and increment address
    add x2, x2, x3
    subs x1, x1, #1
    b.ne loop
    mov x0, x2                 // Return the sum in x0
    ret
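For comparison, the same traversal can be modeled in C. This is a minimal host-side sketch, not part of the original listing: the pointer `p` plays the role of x0 with its post-increment, and `sum` plays the role of the accumulator x2.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum `count` 64-bit elements starting at `array`. */
static uint64_t sum_array(const uint64_t *array, size_t count)
{
    const uint64_t *end = array + count;   /* one past the last element */
    uint64_t sum = 0;
    for (const uint64_t *p = array; p != end; ++p)
        sum += *p;                         /* read element, then advance */
    return sum;
}
```

Unlike the assembly loop, which assumes the array holds at least one element, this version also handles an empty array because the end check happens before the first read.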
Struct Operations:
Define a struct and access its members:
// Define struct (4-byte aligned)
    .section .data
    .balign 4
struct_point:
    .word 0            // x coordinate (offset 0)
    .word 0            // y coordinate (offset 4)
    .asciz "point"     // Name (offset 8)

// Access struct members
    .section .text
    ldr x0, =struct_point
    ldr w1, [x0, #0]   // Read x coordinate
    ldr w2, [x0, #4]   // Read y coordinate
    add w1, w1, #5     // x coordinate += 5
    str w1, [x0, #0]   // Write the updated x coordinate back
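The member offsets used above (0 for x, 4 for y, 8 for the name) correspond to the layout a C compiler would typically choose for an equivalent struct. The sketch below is an illustration only. The type `struct point` and the helper `move_x` are hypothetical names, and the 6-byte name field matches "point" plus its terminating NUL.

```c
#include <stddef.h>
#include <stdint.h>

struct point {
    int32_t x;       /* offset 0, matches the first .word */
    int32_t y;       /* offset 4, matches the second .word */
    char name[6];    /* offset 8, "point" plus the terminating NUL */
};

/* Add dx to the x coordinate, mirroring the ldr/add/str sequence. */
static void move_x(struct point *p, int32_t dx)
{
    p->x += dx;
}
```

Checking `offsetof(struct point, y)` on the host is a quick way to confirm that hand-written offsets in assembly agree with the C view of the same data.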
5.3 Practical: Dynamic Memory Allocator
Objective: Implement a simple heap memory manager that supports malloc and free.
Code (malloc.s):
.equ HEAP_SIZE,   1024              // Heap size (1KB); must match the space reserved in the linker script
.equ HEADER_SIZE, 16                // size (8 bytes) + next (8 bytes)

    .section .text
//--- Memory block header structure (16 bytes) ---
// +----------------+
// | size (8 bytes) |   bit 63 set = block in use; low 63 bits = payload size
// | next (8 bytes) |   address of the next block header, 0 = end of list
// +----------------+

// malloc function: allocate size bytes (x0) of memory, return address or 0 in x0
    .global malloc
malloc:
    stp x29, x30, [sp, #-16]!
    mov x29, sp
    ldr x1, =heap_start             // Heap start address (symbol reserved in the linker script)
    ldr x2, [x1]                    // Read the first block's size word
    cbz x2, init_heap               // Zero means the heap is not initialized (assumes .bss is zeroed at startup)
1:
    ldr x2, [x1]                    // Read block size
    tbz x2, #63, check_free         // Check if block is free (highest bit is 0 indicates free)
next_block:
    ldr x1, [x1, #8]                // Point to next block
    cbz x1, out_of_memory           // End of list: no block is large enough
    b 1b
check_free:
    and x3, x2, #0x7FFFFFFFFFFFFFFF // Clear the marker bit
    cmp x3, x0                      // Compare block size with requested size
    b.hs split_block                // Unsigned compare: block is large enough
    b next_block
init_heap:
    // Initialize heap: create a single free block
    mov x2, #(HEAP_SIZE - HEADER_SIZE)
    str x2, [x1]                    // Payload size (not marked as used)
    str xzr, [x1, #8]               // Next block address is 0
    b 1b
split_block:
    // Split block logic (omitted, need to handle remaining space)
    // Mark current block as used
    orr x2, x3, #0x8000000000000000
    str x2, [x1]
    add x0, x1, #HEADER_SIZE        // Return user available address (skip header)
    ldp x29, x30, [sp], #16
    ret
out_of_memory:
    mov x0, #0                      // No suitable block: return 0
    ldp x29, x30, [sp], #16
    ret

// free function: free memory (need to handle merging free blocks)
    .global free
free:
    // Implementation omitted (recommended as a hands-on experiment)
    ret                             // Placeholder so a call returns safely
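Before writing the split and merge logic in assembly, it can help to prototype it on a host. The following C sketch is one possible model under stated assumptions, not the course's reference solution. It uses the same 16-byte header (size with bit 63 as the in-use flag, plus a next pointer), keeps blocks in address order, rounds requests up to 16 bytes, and assumes a 64-bit host so that the header is 16 bytes. The names `model_malloc`, `model_free`, `heap_head`, and the static `heap` array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HEAP_SIZE 1024u
#define USED_BIT  (UINT64_C(1) << 63)

struct header {
    uint64_t size;          /* bit 63: in use; low 63 bits: payload bytes */
    struct header *next;    /* next block in address order, NULL at end */
};

/* Host-side stand-in for the region reserved by the linker script. */
static _Alignas(16) struct header heap[HEAP_SIZE / sizeof(struct header)];

/* Return the first block, creating one big free block on first use. */
static struct header *heap_head(void)
{
    if (heap[0].size == 0) {
        heap[0].size = HEAP_SIZE - sizeof(struct header);
        heap[0].next = NULL;
    }
    return &heap[0];
}

/* First-fit allocation; returns NULL when no block is large enough. */
void *model_malloc(size_t size)
{
    if (size == 0 || size > HEAP_SIZE)
        return NULL;
    size = (size + 15u) & ~(size_t)15u;    /* keep payloads 16-byte aligned */
    for (struct header *h = heap_head(); h != NULL; h = h->next) {
        if ((h->size & USED_BIT) || h->size < size)
            continue;
        /* Split when the remainder can hold a header plus 16 payload bytes. */
        if (h->size - size >= sizeof *h + 16u) {
            struct header *rest =
                (struct header *)((unsigned char *)(h + 1) + size);
            rest->size = h->size - size - sizeof *rest;
            rest->next = h->next;
            h->size = size;
            h->next = rest;
        }
        h->size |= USED_BIT;
        return h + 1;                      /* payload starts after the header */
    }
    return NULL;
}

/* Mark the block free, then merge runs of adjacent free blocks. */
void model_free(void *ptr)
{
    if (ptr == NULL)
        return;
    struct header *blk = (struct header *)ptr - 1;
    blk->size &= ~USED_BIT;
    for (struct header *h = heap_head(); h != NULL; h = h->next) {
        while (!(h->size & USED_BIT) && h->next != NULL &&
               !(h->next->size & USED_BIT)) {
            h->size += sizeof *h + h->next->size;   /* absorb the successor */
            h->next = h->next->next;
        }
    }
}
```

Because the list is kept in address order and blocks are contiguous, a block's list successor is also its physical neighbor, so merging only needs to look at `next`. Walking the whole list on every free is simple rather than fast, which seems a reasonable trade-off for a 1KB heap.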
5.4 Compilation and Debugging
Linker Script (link.ld):
SECTIONS
{
    . = 0x80000;              /* Program load address */
    .text : { *(.text) }
    .data : { *(.data) }
    .bss : {
        *(.bss)
        . = ALIGN(16);
        heap_start = .;
        . += 1024;            /* Reserve 1KB space for heap */
        heap_end = .;
    }
}
Compilation Command:
aarch64-linux-gnu-as malloc.s -o malloc.o
aarch64-linux-gnu-ld -nostdlib -T link.ld -o malloc.elf malloc.o
5.5 Hands-On Experiment
- Implement the free function: Improve the above code to support freeing memory and merging adjacent free blocks.
- Test Memory Allocation: Write a program that calls malloc to allocate blocks of different sizes and prints the addresses via UART.
- Data Structure Practice: Define an array of structs (e.g., coordinate points), traverse it, and calculate the center point coordinates.
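As a starting point for the data structure exercise, the center-point calculation can first be prototyped in C and then translated to assembly. This is a minimal sketch under assumptions: the type `struct coord` and the function `center_point` are hypothetical names, coordinates are 32-bit integers, and the center is taken as the integer mean of each coordinate.

```c
#include <stddef.h>
#include <stdint.h>

struct coord {
    int32_t x;
    int32_t y;
};

/* Compute the integer mean of `count` points.
 * Returns 0 on success, -1 if the array is empty. */
static int center_point(const struct coord *pts, size_t count,
                        struct coord *out)
{
    if (count == 0)
        return -1;
    int64_t sum_x = 0, sum_y = 0;      /* 64-bit sums avoid 32-bit overflow */
    for (size_t i = 0; i < count; ++i) {
        sum_x += pts[i].x;
        sum_y += pts[i].y;
    }
    out->x = (int32_t)(sum_x / (int64_t)count);
    out->y = (int32_t)(sum_y / (int64_t)count);
    return 0;
}
```

In the assembly version, each element would be 8 bytes, so the loop can advance the base pointer by 8 and load x and y from offsets 0 and 4, or fetch both at once with a 32-bit LDP.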