This article provides a line-by-line explanation of the Armv8-A assembly startup code sourced from Arm® Development Studio. It focuses on the boot sequence of an AArch64 MPCore processor. The code starts at Exception Level 3 (EL3, where the Secure Monitor runs) and drops to Exception Level 1 (EL1, where an OS kernel typically runs). Along the way it sets up the vector tables, the MMU, the caches, and the GICv3 interrupt controller. The startup.S file is broken down below into logical parts, each followed by an explanation.
System Initial Setup and Vector Table Configuration
```asm
    .global start64
    .type start64, "function"
start64:
    //
    // program the VBARs
    //
    ldr x1, =el1_vectors
    msr VBAR_EL1, x1

    ldr x1, =el2_vectors
    msr VBAR_EL2, x1

    ldr x1, =el3_vectors
    msr VBAR_EL3, x1
```
- start64: Defines the entry point for the 64-bit startup code.
- ldr x1, =el1_vectors: Loads the address of the EL1 vector table into register x1.
- msr VBAR_EL1, x1: Sets the Vector Base Address Register for EL1 (VBAR_EL1) to el1_vectors. This defines where the exception vectors for EL1 (e.g., interrupts, faults) are located.
- The same is done for VBAR_EL2 (Hypervisor level) and VBAR_EL3 (Secure Monitor level), each pointing to its own vector table.
GICv3 Mode Selection and Security Configuration
```asm
    msr SCR_EL3, xzr      // Ensure NS bit is initially clear, so secure copy of ICC_SRE_EL1 can be configured
    isb

    mov x0, #15
    msr ICC_SRE_EL3, x0
    isb
    msr ICC_SRE_EL1, x0   // Secure copy of ICC_SRE_EL1
```
- msr SCR_EL3, xzr: Clears the Secure Configuration Register (SCR_EL3). With the NS bit at 0, the lower Exception levels are Secure, and EL3 accesses to banked registers such as ICC_SRE_EL1 reach the Secure copy.
- isb: The Instruction Synchronization Barrier ensures that the effect of the preceding system register write is visible to all following instructions.
- mov x0, #15: Sets x0 to 15 (0b1111).
- msr ICC_SRE_EL3, x0: Writes 0b1111 to ICC_SRE_EL3. SRE (bit 0) enables the GICv3 system register interface at EL3. DFB (bit 1) and DIB (bit 2) disable FIQ and IRQ bypass. Enable (bit 3) allows EL2 and EL1 to access ICC_SRE_EL2 and ICC_SRE_EL1.
- msr ICC_SRE_EL1, x0: Configures the Secure copy of ICC_SRE_EL1 in the same way. Bit 3 is reserved in ICC_SRE_EL1.
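To make the 0b1111 encoding concrete, here is a small C sketch that names the four ICC_SRE_EL3 bits and checks which ones the value 15 sets. The macro names are made up for this illustration, since startup.S just uses the literal #15. The bit positions follow the GICv3 specification.

```c
#include <assert.h>
#include <stdint.h>

/* ICC_SRE_EL3 bits per the GICv3 architecture (macro names are illustrative). */
#define ICC_SRE_SRE    (1u << 0) /* use the system register interface */
#define ICC_SRE_DFB    (1u << 1) /* disable FIQ bypass */
#define ICC_SRE_DIB    (1u << 2) /* disable IRQ bypass */
#define ICC_SRE_ENABLE (1u << 3) /* EL3 only: lower ELs may access ICC_SRE_EL1/EL2 */

/* The value startup.S writes with "mov x0, #15". */
uint32_t icc_sre_el3_boot_value(void) {
    return ICC_SRE_SRE | ICC_SRE_DFB | ICC_SRE_DIB | ICC_SRE_ENABLE;
}

/* Nonzero if every bit in mask is set in value. */
int sre_has(uint32_t value, uint32_t mask) {
    return (value & mask) == mask;
}
```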
```asm
    mov x3, #(SCR_EL3_RW  | \
              SCR_EL3_SMD | \
              SCR_EL3_NS)
    msr SCR_EL3, x3
    isb

    mov x0, #15
    msr ICC_SRE_EL2, x0
    isb
    msr ICC_SRE_EL1, x0   // Non-secure copy of ICC_SRE_EL1
```
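- mov x3, #(SCR_EL3_RW | SCR_EL3_SMD | SCR_EL3_NS): Builds the new SCR_EL3 value:
  - SCR_EL3_RW: The next lower Exception level (EL2 here) executes in AArch64. HCR_EL2.RW, set below, does the same for EL1.
  - SCR_EL3_SMD: Disables the Secure Monitor Call instruction, so an SMC executed at EL1 or above is UNDEFINED.
  - SCR_EL3_NS: Makes the lower Exception levels Non-secure. From now on, EL3 accesses to banked registers reach the Non-secure copies.
- msr SCR_EL3, x3: Applies this configuration.
- msr ICC_SRE_EL2, x0: Enables GICv3 system register access at EL2. x0 is again 0b1111, and its Enable bit allows Non-secure EL1 to access ICC_SRE_EL1.
- msr ICC_SRE_EL1, x0: Updates the Non-secure copy of ICC_SRE_EL1.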
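You can check the SCR_EL3 value the same way. This C sketch assumes that the startup.S macros use the architectural bit positions (NS = bit 0, SMD = bit 7, RW = bit 10). The macro definitions themselves are not shown in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural SCR_EL3 bit positions (assumed to match the startup.S macros). */
#define SCR_EL3_NS  (1u << 0)  /* EL2, EL1 and EL0 are Non-secure */
#define SCR_EL3_SMD (1u << 7)  /* SMC is UNDEFINED at EL1 and above */
#define SCR_EL3_RW  (1u << 10) /* next lower EL is AArch64 */

uint32_t scr_el3_boot_value(void) {
    return SCR_EL3_RW | SCR_EL3_SMD | SCR_EL3_NS;
}
```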
Hypervisor and Virtualization Setup
```asm
    mov x2, #HCR_EL2_RW
    msr HCR_EL2, x2

    msr VTTBR_EL2, xzr

    mrs x0, MPIDR_EL1
    msr VMPIDR_EL2, x0
    mrs x0, MIDR_EL1
    msr VPIDR_EL2, x0
```
- mov x2, #HCR_EL2_RW: Sets only HCR_EL2.RW, so EL1 executes in AArch64. All other HCR_EL2 bits are 0, so stage 2 translation and the virtualization traps stay disabled.
- msr HCR_EL2, x2: Writes the Hypervisor Configuration Register.
- msr VTTBR_EL2, xzr: Clears the Virtualization Translation Table Base Register (no stage 2 translation tables).
- mrs x0, MPIDR_EL1: Reads the Multiprocessor Affinity Register (MPIDR_EL1) into x0.
- msr VMPIDR_EL2, x0: Sets the value that Non-secure EL1 reads of MPIDR_EL1 return, so EL1 sees the real CPU affinity.
- mrs x0, MIDR_EL1: Reads the Main ID Register (MIDR_EL1).
- msr VPIDR_EL2, x0: Sets the value that Non-secure EL1 reads of MIDR_EL1 return, which is again the real value.
```asm
    bl GetCPUID
    mov x19, x0
```
- bl GetCPUID: Calls the helper that derives the CPU ID from MPIDR_EL1.
- mov x19, x0: Stores the CPU ID in x19. Under the AAPCS64 calling convention, x19 is callee-saved. It therefore survives the later bl calls, and it is still valid after the eret to EL1.
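GetCPUID's body is not part of this excerpt. A common approach, shown here as a hypothetical C sketch, takes Aff0 (MPIDR_EL1 bits [7:0]) as the core number within a cluster. The helper names are illustrative. The packing function shows one way a helper such as GetAffinity might fold the four affinity fields into 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the real GetCPUID/GetAffinity are not shown. */

/* Aff0, MPIDR_EL1[7:0]: usually the core number within a cluster. */
uint32_t mpidr_aff0(uint64_t mpidr) { return (uint32_t)(mpidr & 0xFFu); }

/* Aff1, MPIDR_EL1[15:8]: usually the cluster number. */
uint32_t mpidr_aff1(uint64_t mpidr) { return (uint32_t)((mpidr >> 8) & 0xFFu); }

/* Pack Aff3.Aff2.Aff1.Aff0 into 32 bits; Aff3 lives at MPIDR_EL1[39:32]. */
uint32_t mpidr_affinity32(uint64_t mpidr) {
    uint32_t aff3 = (uint32_t)((mpidr >> 32) & 0xFFu);
    uint32_t aff210 = (uint32_t)(mpidr & 0x00FFFFFFu);
    return (aff3 << 24) | aff210;
}
```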
Floating Point and System Control Settings
```asm
    msr CPTR_EL3, xzr
    msr CPTR_EL2, xzr
```
- msr CPTR_EL3, xzr: Clears the Architectural Feature Trap Register for EL3. As a result, floating-point and SIMD accesses (among other features) are not trapped to EL3.
- msr CPTR_EL2, xzr: Does the same for traps to EL2.
```asm
#ifdef __ARM_BIG_ENDIAN
    mov x0, #(SCTLR_ELx_EE | SCTLR_EL1_E0E)
#else
    mov x0, #0
#endif
    msr SCTLR_EL3, x0
    msr SCTLR_EL2, x0
    msr SCTLR_EL1, x0
```
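Conditionally sets the byte order in the System Control Registers (SCTLR_ELx):
- If __ARM_BIG_ENDIAN is defined (the compiler predefines it when targeting big-endian), the code sets EE (data endianness at EL3/EL2/EL1) and E0E (data endianness at EL0). This selects big-endian.
- Otherwise, it writes 0, which selects little-endian.
- msr SCTLR_ELx, x0: Applies the value to EL3, EL2, and EL1. In both cases the M, C, and I bits are 0, so the MMU and caches start out disabled at all three levels.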
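Here is a short C sketch of the value that the #ifdef selects. It assumes that SCTLR_ELx_EE and SCTLR_EL1_E0E use the architectural bit positions 25 and 24. Their definitions are not shown in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural SCTLR bit positions (assumed to match the startup.S macros). */
#define SCTLR_EL1_E0E (1u << 24) /* EL0 data accesses are big-endian */
#define SCTLR_ELx_EE  (1u << 25) /* ELx data accesses and stage 1 table walks are big-endian */

/* Mirrors the #ifdef __ARM_BIG_ENDIAN choice in startup.S. */
uint32_t sctlr_boot_value(int big_endian) {
    return big_endian ? (SCTLR_ELx_EE | SCTLR_EL1_E0E) : 0u;
}
```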
Cortex-A Specific Configuration
```asm
#ifdef CORTEXA
    mov x0, #((1 << 0) | \
              (1 << 1) | \
              (1 << 4) | \
              (1 << 5) | \
              (1 << 6))
    msr ACTLR_EL3, x0
    msr ACTLR_EL2, x0
```
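For Cortex-A processors (when CORTEXA is defined), this configures the Auxiliary Control Registers ACTLR_EL3 and ACTLR_EL2:
- Bits 0, 1, 4, 5, and 6 grant lower Exception levels access to CPUACTLR, CPUECTLR, L2CTLR, L2ECTLR, and L2ACTLR respectively. ACTLR_EL3 controls access from EL2 and EL1, and ACTLR_EL2 controls access from EL1. ACTLR is implementation defined, so check the Technical Reference Manual of the specific core (for example Cortex-A53 or Cortex-A57) for the exact meaning.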
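For reference, the ACTLR value works out to 0x73. The C sketch below spells out the bit assignments listed above. The macro names are illustrative. Because ACTLR is implementation defined, the mapping holds only for cores whose TRM defines these bits in this way.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names for the ACTLR_EL3/ACTLR_EL2 access-control bits. */
#define ACTLR_CPUACTLR (1u << 0)
#define ACTLR_CPUECTLR (1u << 1)
#define ACTLR_L2CTLR   (1u << 4)
#define ACTLR_L2ECTLR  (1u << 5)
#define ACTLR_L2ACTLR  (1u << 6)

uint32_t actlr_boot_value(void) {
    return ACTLR_CPUACTLR | ACTLR_CPUECTLR | ACTLR_L2CTLR |
           ACTLR_L2ECTLR | ACTLR_L2ACTLR;
}
```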
```asm
    mrs x0, S3_1_c15_c2_1   // Read EL1 CPU Extended Control Register
    orr x0, x0, #(1 << 6)   // Set the SMPEN bit
    msr S3_1_c15_c2_1, x0   // Write EL1 CPU Extended Control Register
    isb
```
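- mrs x0, S3_1_c15_c2_1: Reads the CPU Extended Control Register (CPUECTLR_EL1). The code uses the generic S3_1_c15_c2_1 encoding because this implementation-defined register has no name that every assembler recognizes.
- orr x0, x0, #(1 << 6): Sets the SMPEN bit (bit 6). This bit lets the core take part in coherency with the other cores (data coherency and broadcast cache and TLB maintenance). It must be set before the caches and MMU are enabled.
- msr S3_1_c15_c2_1, x0: Writes back the modified value.
- isb: Ensures the change takes effect before the following instructions.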
Stack Setup and GIC Configuration
```asm
    ldr x0, =__el3_stack
    sub x0, x0, x19, lsl #12
    mov sp, x0
```
- ldr x0, =__el3_stack: Loads the top (highest address) of the EL3 stack area. Stacks grow toward lower addresses.
- sub x0, x0, x19, lsl #12: Subtracts CPU ID × 4 KB (2^12 bytes), so each CPU gets its own 4 KB stack directly below the previous CPU's.
- mov sp, x0: Sets the stack pointer for EL3.
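The per-CPU stack arithmetic is easy to check in C. This sketch mirrors sub x0, x0, x19, lsl #shift. The stack-top address in the test is a made-up example, not a value from the linker script. The EL1 stack setup later in the file uses the same formula with a shift of 14 (16 KB per CPU).

```c
#include <assert.h>
#include <stdint.h>

/* Top of the stack for cpu_id when every CPU gets (1 << shift) bytes
   below a shared top-of-stack symbol; stacks grow toward lower addresses. */
uint64_t cpu_stack_top(uint64_t stack_top, uint64_t cpu_id, unsigned shift) {
    assert(shift < 64);
    return stack_top - (cpu_id << shift);
}
```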
```asm
    mov x0, #(1 << 4) | (1 << 5)   // gicdctlr_ARE_S | gicdctlr_ARE_NS
    mov x1, x19
    bl SyncAREinGICD
```
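- mov x0, #(1 << 4) | (1 << 5): Selects the GICD_CTLR bits ARE_S (bit 4) and ARE_NS (bit 5). These enable Affinity Routing for Secure and Non-secure state.
- mov x1, x19: Passes the CPU ID as the second argument.
- bl SyncAREinGICD: Calls the helper that applies and synchronizes this setting in the GIC Distributor. Its implementation is not shown here.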
```asm
    bl GetAffinity
    bl GetGICR
    mov w20, w0            // Keep a copy for later
    bl WakeupGICR
```
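- bl GetAffinity: Returns this CPU's affinity value in w0, presumably derived from MPIDR_EL1.
- bl GetGICR: Uses that affinity to find this CPU's GIC Redistributor, and returns a 32-bit value identifying it in w0.
- mov w20, w0: Keeps a copy in w20. Like x19, x20 is callee-saved under AAPCS64, so the value survives the calls that follow. The later GIC helpers receive it as their first argument.
- bl WakeupGICR: Wakes up this CPU's Redistributor (the identifier is still in w0), which marks the CPU as awake.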
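WakeupGICR's body is not shown. The GICv3 specification does define the wake-up handshake it needs to perform: clear GICR_WAKER.ProcessorSleep (bit 1), then wait until GICR_WAKER.ChildrenAsleep (bit 2) reads as 0. Here is a C sketch of that sequence, not the actual helper. On real hardware, waker would point at GICR_WAKER (offset 0x14 in the Redistributor's RD_base frame). The test uses an ordinary variable instead.

```c
#include <assert.h>
#include <stdint.h>

#define GICR_WAKER_PROCESSOR_SLEEP (1u << 1)
#define GICR_WAKER_CHILDREN_ASLEEP (1u << 2)

/* Sketch of the Redistributor wake-up sequence (not the actual WakeupGICR). */
void wake_redistributor(volatile uint32_t *waker) {
    assert(waker != 0);
    /* Tell the Redistributor that this core is no longer asleep. */
    *waker &= ~GICR_WAKER_PROCESSOR_SLEEP;
    /* Wait until the Redistributor reports that it has woken up. */
    while (*waker & GICR_WAKER_CHILDREN_ASLEEP) {
    }
}
```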
```asm
    mov w0, w20
    mov w1, #1             // gicigroupr_G1NS
    bl SetPrivateIntSecurityBlock
```
Configures this CPU's private interrupts (SGIs and PPIs, INTIDs 0 to 31) as Non-secure Group 1 (G1NS):
- mov w0, w20: Passes the Redistributor identifier kept in w20.
- mov w1, #1: Selects Group 1 Non-secure.
- bl SetPrivateIntSecurityBlock: Applies the setting to the whole block of private interrupts.
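In GICv3, an interrupt's group is selected by two bits. For INTIDs 0 to 31, these are its GICR_IGROUPR0 bit and its GICR_IGRPMODR0 bit, and Non-secure Group 1 is encoded as IGROUPR = 1, IGRPMODR = 0. The C sketch below models just those two registers with a hypothetical struct, to show what setting all private interrupts to G1NS amounts to. It is not the implementation of SetPrivateIntSecurityBlock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one bit per INTID 0-31 in each register. */
struct gicr_group_regs {
    uint32_t igroupr0;  /* GICR_IGROUPR0 */
    uint32_t igrpmodr0; /* GICR_IGRPMODR0 */
};

enum gic_group { GROUP0_SECURE, GROUP1_NONSECURE, GROUP1_SECURE };

/* Decode the group of one private interrupt from its two register bits. */
enum gic_group private_int_group(const struct gicr_group_regs *r, unsigned intid) {
    assert(intid < 32);
    unsigned grp = (r->igroupr0 >> intid) & 1u;
    unsigned mod = (r->igrpmodr0 >> intid) & 1u;
    assert(!(grp && mod)); /* IGROUPR = 1 with IGRPMODR = 1 is reserved */
    if (mod)
        return GROUP1_SECURE;
    return grp ? GROUP1_NONSECURE : GROUP0_SECURE;
}

/* Put all SGIs and PPIs in Non-secure Group 1. */
void set_all_private_g1ns(struct gicr_group_regs *r) {
    r->igroupr0 = 0xFFFFFFFFu;
    r->igrpmodr0 = 0u;
}
```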
```asm
    mov x0, #0xFF          // for Non-Secure interrupts
    msr ICC_PMR_EL1, x0
```
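- msr ICC_PMR_EL1, x0: Sets the interrupt priority mask to 0xFF, the lowest priority level. The CPU interface only signals interrupts whose priority is higher (numerically lower) than the mask. Interrupts of any priority from 0x00 to 0xFE therefore get through.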
Main CPU GIC Setup and EL1 Drop
```asm
    cbnz x19, drop_to_el1
    mov w0, #1             // gicigroupr_G1NS
    bl SetSPISecurityAll
```
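- cbnz x19, drop_to_el1: If this is not CPU 0 (x19 != 0), jumps straight to drop_to_el1.
- Only CPU 0 runs the next two instructions:
  - mov w0, #1: Selects Group 1 Non-secure.
  - bl SetSPISecurityAll: Assigns all Shared Peripheral Interrupts (SPIs) to that group. SPIs are configured in the shared Distributor, so one CPU is enough.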
```asm
    .global drop_to_el1
drop_to_el1:
    adr x1, el1_entry_aarch64
    msr ELR_EL3, x1
    mov x1, #(AARCH64_SPSR_EL1h | \
              AARCH64_SPSR_F    | \
              AARCH64_SPSR_I    | \
              AARCH64_SPSR_A)
    msr SPSR_EL3, x1
    eret
```
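- adr x1, el1_entry_aarch64: Loads the address of the EL1 entry point (PC-relative).
- msr ELR_EL3, x1: Sets the Exception Link Register for EL3, so that eret returns to el1_entry_aarch64.
- mov x1, #(…): Builds the Saved Program Status Register value for SPSR_EL3:
  - AARCH64_SPSR_EL1h: Return to AArch64 EL1, using SP_EL1 as the stack pointer.
  - AARCH64_SPSR_F | AARCH64_SPSR_I | AARCH64_SPSR_A: Masks FIQ, IRQ, and SError (asynchronous abort) exceptions.
- msr SPSR_EL3, x1: Applies the configuration.
- eret: Performs the exception return. PSTATE is restored from SPSR_EL3, and execution continues at ELR_EL3, now at EL1. Because SCR_EL3.NS was set earlier, this EL1 is Non-secure.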
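You can work out the SPSR_EL3 value by hand. This C sketch assumes that the AARCH64_SPSR_* macros use the architectural encodings (M[3:0] = 0b0101 for EL1h, F = bit 6, I = bit 7, A = bit 8). Their definitions are not in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural SPSR_EL3 fields (assumed to match the startup.S macros). */
#define SPSR_M_EL1H (0x5u)     /* M[3:0] = 0b0101: AArch64 EL1 using SP_EL1 */
#define SPSR_F      (1u << 6)  /* FIQ masked */
#define SPSR_I      (1u << 7)  /* IRQ masked */
#define SPSR_A      (1u << 8)  /* SError masked */

uint32_t spsr_el3_for_el1h(void) {
    return SPSR_M_EL1H | SPSR_F | SPSR_I | SPSR_A;
}

/* Target Exception level, encoded in M[3:2]. */
unsigned spsr_target_el(uint32_t spsr) { return (spsr >> 2) & 3u; }
```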
EL1 General Boot
```asm
    .global el1_entry_aarch64
    .type el1_entry_aarch64, "function"
el1_entry_aarch64:
    ldr x0, =__stack
    sub x0, x0, x19, lsl #14
    mov sp, x0
```
- Sets up the EL1 application stack using the same scheme as the EL3 stack. Each CPU gets 16 KB (2^14 bytes), starting CPU ID × 16 KB below __stack.
```asm
    mov x0, #CPACR_EL1_FPEN
    msr CPACR_EL1, x0
```
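- Enables floating point and SIMD by setting the FPEN field (bits [21:20]) of CPACR_EL1, so FP/SIMD instructions at EL1 and EL0 are not trapped. This has to happen before any C code runs, because compiled code (including library routines such as memset) may use the FP/SIMD registers.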
```asm
    bl InvalidateUDCaches
    tlbi VMALLE1
```
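- bl InvalidateUDCaches: Invalidates the data and unified caches, so that no stale contents left over from reset are used once the caches are enabled.
- tlbi VMALLE1: Invalidates all stage 1 TLB entries for the EL1&0 translation regime on this CPU.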
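InvalidateUDCaches is not shown. Boot code commonly invalidates caches by set/way. For each cache level reported by CLIDR_EL1, it reads the geometry from CCSIDR_EL1 and issues DC ISW for every set and way. The C sketch below computes only the DC ISW operand for one line, given a line size and associativity. The geometry values in the test are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b with (1 << b) >= n, for 1 <= n <= 2^31. */
static unsigned ceil_log2(uint32_t n) {
    assert(n >= 1 && n <= 0x80000000u);
    unsigned b = 0;
    while ((1u << b) < n)
        b++;
    return b;
}

/* Operand for "DC ISW" (invalidate data cache line by set/way).
   level is 0 for L1, 1 for L2, and so on. */
uint64_t dc_isw_operand(unsigned level, uint32_t set, uint32_t way,
                        uint32_t line_bytes, uint32_t ways) {
    unsigned l = ceil_log2(line_bytes); /* Set field starts at bit log2(line size) */
    unsigned a = ceil_log2(ways);       /* Way field is the top a bits of [31:0] */
    assert(level < 7 && way < ways);
    uint64_t op = ((uint64_t)set << l) | ((uint64_t)level << 1);
    if (a > 0)
        op |= (uint64_t)way << (32 - a);
    return op;
}
```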
```asm
    ldr x1, =__ttb0_l1
    msr TTBR0_EL1, x1
```
- Sets TTBR0_EL1 to the base of the level 1 translation table.
```asm
    mov x1, #0xff44
    movk x1, #4, LSL #16
    msr MAIR_EL1, x1
```
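Configures the Memory Attribute Indirection Register (MAIR_EL1). After the two moves, x1 = 0x0004FF44, and each byte defines the attributes for one index:
- Index 0: 0x44 = Normal memory, Inner and Outer Non-cacheable.
- Index 1: 0xFF = Normal memory, Inner and Outer Write-back, Read/Write-allocate.
- Index 2: 0x04 = Device-nGnRE.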
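The two-instruction MAIR_EL1 load is easy to misread. Here is a C sketch that emulates mov/movk and then extracts each attribute byte. The movk helper is a model of the instruction's behavior, written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "movk reg, #imm, LSL #shift": replace one 16-bit field, keep the rest. */
uint64_t movk(uint64_t reg, uint16_t imm, unsigned shift) {
    assert(shift % 16 == 0 && shift < 64);
    uint64_t mask = 0xFFFFull << shift;
    return (reg & ~mask) | ((uint64_t)imm << shift);
}

/* Attribute byte for AttrIndx = index (MAIR_EL1 holds eight 8-bit entries). */
uint8_t mair_attr(uint64_t mair, unsigned index) {
    assert(index < 8);
    return (uint8_t)(mair >> (8 * index));
}
```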
```asm
    ldr x1, =0x0000000000802520
    msr TCR_EL1, x1
    isb
```
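Configures the Translation Control Register (TCR_EL1):
- T0SZ = 32: The TTBR0 region covers a 32-bit (4 GB) virtual address space. With a 4 KB granule, translation then starts at level 1, which has only four entries.
- TG0 = 00: 4 KB granule.
- SH0 = 10: Outer Shareable (for translation table walks).
- ORGN0 = IRGN0 = 01: Table walks use Write-back, Read-Allocate, Write-Allocate cacheable memory.
- EPD1 = 1: Table walks through TTBR1_EL1 are disabled.
- IPS = 000: 32-bit (4 GB) physical address size.
- isb: Ensures that the new MAIR_EL1, TCR_EL1, and TTBR0_EL1 values are in effect before the MMU is enabled.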
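You can verify the field breakdown above mechanically. This C sketch decodes the fields of 0x802520 that the list mentions, using the field positions from the Armv8-A TCR_EL1 definition. The struct is only for illustration. The sketch also computes how many level 1 entries a 4 KB granule needs for a given T0SZ.

```c
#include <assert.h>
#include <stdint.h>

struct tcr_fields {
    unsigned t0sz, epd0, irgn0, orgn0, sh0, tg0, epd1, ips;
};

struct tcr_fields tcr_decode(uint64_t tcr) {
    struct tcr_fields f;
    f.t0sz  = (unsigned)(tcr & 0x3F);        /* [5:0]   */
    f.epd0  = (unsigned)((tcr >> 7) & 1);    /* [7]     */
    f.irgn0 = (unsigned)((tcr >> 8) & 3);    /* [9:8]   */
    f.orgn0 = (unsigned)((tcr >> 10) & 3);   /* [11:10] */
    f.sh0   = (unsigned)((tcr >> 12) & 3);   /* [13:12] */
    f.tg0   = (unsigned)((tcr >> 14) & 3);   /* [15:14] */
    f.epd1  = (unsigned)((tcr >> 23) & 1);   /* [23]    */
    f.ips   = (unsigned)((tcr >> 32) & 7);   /* [34:32] */
    return f;
}

/* Entries in the first-level table for a 4 KB granule when the walk starts
   at level 1 (valid for T0SZ 25..33, i.e. 39- to 31-bit address spaces). */
unsigned l1_entries_4k(unsigned t0sz) {
    assert(t0sz >= 25 && t0sz <= 33);
    return 1u << (64 - t0sz - 30);
}
```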
```asm
    cbnz x19, el1_secondary
```
- If not CPU 0, branches to the secondary CPU code.
EL1 Primary CPU Initialization
```asm
    .global el1_primary
    .type el1_primary, "function"
el1_primary:
    mov w0, #(1 << 1)      // gicdctlr_EnableGrp1A
    bl EnableGICD
```
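- Enables Non-secure Group 1 interrupts in the GIC Distributor. Execution is now in Non-secure EL1 with affinity routing enabled, so the code sees the Non-secure view of GICD_CTLR. In that view, bit 1 is EnableGrp1A, where the "A" refers to affinity routing, not to active interrupts.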
```asm
    ldr x21, =__ttb0_l1
    mov x0, x21
    mov x1, #(4 << 3)
    bl ZeroBlock
```
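- Clears the level 1 translation table: 4 entries of 8 bytes each (4 << 3 = 32 bytes). Four entries are enough because T0SZ = 32 limits the TTBR0 range to 4 GB, and each level 1 entry covers 1 GB.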
RAM Mapping
```asm
    ldr x22, =__ttb0_l2_ram
    mov x1, #(512 << 3)
    mov x0, x22
    bl ZeroBlock

    ldr x4, =__code_start
    ubfx x23, x4, #30, #2
    orr x1, x22, #TT_S1_ATTR_PAGE
    str x1, [x21, x23, lsl #3]
```
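- Clears the level 2 table for RAM (512 entries of 8 bytes).
- ubfx x23, x4, #30, #2 extracts bits [31:30] of __code_start, which is the index of the 1 GB region that contains the code. The code then sets the L1 entry at that index to the L2 RAM table's address combined with TT_S1_ATTR_PAGE. These type bits mark the entry as a table descriptor that points to the next level.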
```asm
    ubfx x2, x4, #21, #9
    ldr x5, =__top_of_ram
    sub x3, x5, #1
    ubfx x3, x3, #21, #9
    add x3, x3, #1
    sub x3, x3, x2
```
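- Calculates the number of 2 MB blocks to map as RAM. x2 is the L2 index (VA bits [29:21]) of __code_start. x3 becomes the L2 index of the last RAM byte (__top_of_ram - 1), plus 1, minus x2. Only 9 index bits are used, so this assumes that all of RAM lies in the same 1 GB region as __code_start.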
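Here is the same index arithmetic in C. The addresses in the test are hypothetical examples, because the real __code_start and __top_of_ram come from the linker script. The assertion inside ram_block_count makes the code's single-1 GB-region assumption explicit.

```c
#include <assert.h>
#include <stdint.h>

/* ubfx x23, x4, #30, #2: level 1 index (VA bits [31:30]) for a 32-bit VA space. */
uint32_t l1_index(uint64_t va) { return (uint32_t)((va >> 30) & 0x3); }

/* ubfx x2, x4, #21, #9: level 2 index (VA bits [29:21]), one entry per 2 MB. */
uint32_t l2_index(uint64_t va) { return (uint32_t)((va >> 21) & 0x1FF); }

/* Number of 2 MB blocks from the block holding code_start up to and
   including the block holding the last RAM byte (top_of_ram - 1). */
uint32_t ram_block_count(uint64_t code_start, uint64_t top_of_ram) {
    assert(top_of_ram > code_start);
    assert((code_start >> 30) == ((top_of_ram - 1) >> 30)); /* same 1 GB region */
    return l2_index(top_of_ram - 1) + 1 - l2_index(code_start);
}
```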
```asm
    bic x4, x4, #((1 << 21) - 1)
    ldr x1, =(TT_S1_ATTR_BLOCK | \
              (1 << TT_S1_ATTR_MATTR_LSB) | \
              TT_S1_ATTR_NS | \
              TT_S1_ATTR_AP_RW_PL1 | \
              TT_S1_ATTR_SH_INNER | \
              TT_S1_ATTR_AF | \
              TT_S1_ATTR_nG)
    orr x1, x1, x4
    add x0, x22, x2, lsl #3
loop1:
    subs x3, x3, #1
    str x1, [x0], #8
    add x1, x1, #0x200, LSL #12
    bne loop1
```
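- Fills those L2 entries with 2 MB block descriptors. Each descriptor uses attribute index 1 (Normal Write-back, per MAIR_EL1) and is Inner Shareable, Non-secure, read/write at EL1 only, non-global, with the Access Flag set. Each loop iteration adds 0x200 << 12 = 2 MB to the output address, so the mapping is flat (VA = PA).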
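A C model of loop1 shows the descriptors it produces. The attribute macros below are assumptions, because the excerpt does not define the TT_S1_ATTR_* values. This sketch uses the architectural stage 1 block descriptor layout: type bits [1:0] = 0b01, AttrIndx [4:2], NS bit 5, AP[2:1] bits [7:6], SH [9:8], AF bit 10, nG bit 11.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural stage 1 block descriptor fields (names are illustrative). */
#define DESC_BLOCK        (1ull << 0)           /* bits[1:0] = 0b01 at level 2 */
#define DESC_ATTRINDX(n)  ((uint64_t)(n) << 2)  /* index into MAIR_EL1 */
#define DESC_NS           (1ull << 5)
#define DESC_AP_RW_EL1    (0ull << 6)           /* AP[2:1] = 0b00: RW at EL1, no EL0 */
#define DESC_SH_INNER     (3ull << 8)
#define DESC_AF           (1ull << 10)
#define DESC_NG           (1ull << 11)
#define BLOCK_2MB         (1ull << 21)

/* Fill count L2 entries starting at first with flat-mapped 2 MB Normal
   Write-back blocks, like loop1 in startup.S. */
void map_ram_blocks(uint64_t l2[512], uint32_t first, uint32_t count, uint64_t pa) {
    assert(first + count <= 512);
    uint64_t desc = DESC_BLOCK | DESC_ATTRINDX(1) | DESC_NS | DESC_AP_RW_EL1 |
                    DESC_SH_INNER | DESC_AF | DESC_NG;
    desc |= pa & ~(BLOCK_2MB - 1);   /* bic x4, x4, #((1 << 21) - 1) */
    for (uint32_t i = 0; i < count; i++) {
        l2[first + i] = desc;        /* str x1, [x0], #8 */
        desc += BLOCK_2MB;           /* add x1, x1, #0x200, LSL #12 */
    }
}
```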
Peripheral Mapping
```asm
    ldr x24, =__ttb0_l2_periph
    ldr x4, =gicd
    ubfx x25, x4, #30, #2
    cmp x25, x23
    csel x24, x22, x24, EQ
    b.eq nol2setup
```
- Checks whether the GIC Distributor (gicd) lies in the same 1 GB region as RAM. If so, the code reuses the RAM L2 table and skips creating a separate peripheral table.
```asm
    mov x0, x24
    mov x1, #512 << 3
    bl ZeroBlock
    orr x1, x24, #TT_S1_ATTR_PAGE
    str x1, [x21, x25, lsl #3]

nol2setup:
    ubfx x2, x4, #21, #9
    bic x4, x4, #((1 << 21) - 1)
    ldr x1, =(TT_S1_ATTR_BLOCK | \
              (2 << TT_S1_ATTR_MATTR_LSB) | \
              TT_S1_ATTR_NS | \
              TT_S1_ATTR_AP_RW_PL1 | \
              TT_S1_ATTR_AF | \
              TT_S1_ATTR_nG)
    orr x1, x1, x4
    str x1, [x24, x2, lsl #3]
```
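- If the regions differ, the code clears the peripheral L2 table and links it into the L1 table. In either case, it then maps the 2 MB block containing the GIC Distributor with attribute index 2 (Device-nGnRE), read/write at EL1 only.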
```asm
    ldr x4, =__cs3_peripherals
    ubfx x2, x4, #21, #9
    bic x4, x4, #((1 << 21) - 1)
    ldr x1, =(TT_S1_ATTR_BLOCK | \
              (2 << TT_S1_ATTR_MATTR_LSB) | \
              TT_S1_ATTR_NS | \
              TT_S1_ATTR_AP_RW_PL1 | \
              TT_S1_ATTR_AF | \
              TT_S1_ATTR_nG)
    orr x1, x1, x4
    str x1, [x24, x2, lsl #3]
```
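- Maps the 2 MB block containing the CS3 peripherals (e.g., UART) in the same way. The entry is written into the same L2 table as the GIC Distributor (x24), so the code assumes that both lie in the same 1 GB region.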
```asm
    dsb ish
    mrs x1, SCTLR_EL1
    orr x1, x1, #SCTLR_ELx_M
    bic x1, x1, #SCTLR_ELx_A
    msr SCTLR_EL1, x1
    isb
```
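- dsb ish ensures the translation table writes have completed before the MMU can walk the tables. The code then sets SCTLR_ELx_M to enable the MMU and clears SCTLR_ELx_A to disable alignment fault checking. The isb ensures that the new setting applies to the following instructions.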
```asm
    dsb ish
    ic ialluis
    dsb ish
    isb
```
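- Invalidates all instruction caches in the Inner Shareable domain (ic ialluis) before they are enabled, so no stale instructions are fetched afterwards. The surrounding dsb ish and isb ensure that the invalidation has completed and the instruction stream is resynchronized.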
```asm
    mrs x1, SCTLR_EL1
    orr x1, x1, #SCTLR_ELx_C
    orr x1, x1, #SCTLR_ELx_I
    msr SCTLR_EL1, x1
    isb
```
- Enables the data and instruction caches.
C Runtime Initialization
```asm
    ldr x0, =__bss_start__
    mov x1, #0
    ldr x2, =__bss_end__
    sub x2, x2, x0
    bl memset
```
- Zeros the BSS section by calling memset(__bss_start__, 0, __bss_end__ - __bss_start__).
```asm
    bl initialise_monitor_handles
    ldr x0, =__libc_fini_array
    bl atexit
    bl __libc_init_array
```
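- initialise_monitor_handles sets up the standard I/O handles (stdin, stdout, stderr) over semihosting. atexit(__libc_fini_array) registers the static destructors to run at exit. __libc_init_array runs the static constructors before main.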
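__libc_init_array is part of newlib. Roughly, it calls each function pointer in the linker-collected preinit and init arrays, in order. The C sketch below models only that array walk, using a local table. The demo constructors and their names are hypothetical stand-ins for C++ static constructors or __attribute__((constructor)) functions.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*init_fn)(void);

/* Call each function in [start, end) in order, as newlib does for the
   range bounded by __init_array_start and __init_array_end. */
void run_init_array(const init_fn *start, const init_fn *end) {
    assert(start <= end);
    for (const init_fn *p = start; p != end; ++p)
        (*p)();
}

/* Hypothetical constructors that record the order they ran in. */
int init_log[2];
size_t init_calls;
void ctor_first(void)  { init_log[init_calls++] = 1; }
void ctor_second(void) { init_log[init_calls++] = 2; }
```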
```asm
    mov x0, #1
    ldr x1, =argv
    bl main
    b exit
```
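- Calls main() with argc = 1 and a null-terminated argv (its definition is not shown here). When main returns, its return value is still in x0, so b exit passes it to exit() as the exit status.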
EL1 Secondary (Non-Primary Core) CPU Initialization
```asm
    .global el1_secondary
    .type el1_secondary, "function"
el1_secondary:
    mov w0, w20
    mov w1, #15
    mov w2, #14 << 4
    bl SetPrivateIntPriority
```
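- Sets the priority of SGI 15 to 14 << 4 (0xE0). GIC priority fields are left-aligned, and an implementation may ignore the low-order priority bits. The value is therefore placed in the upper four bits, which every GICv3 implementation provides.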
```asm
    mov w0, w20
    mov w1, #15
    bl EnablePrivateInt
    mov x0, #31 << 3
    msr ICC_PMR_EL1, x0
    mov x0, #1
    msr ICC_IGRPEN1_EL1, x0
    isb
```
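- Enables SGI 15 and sets the priority mask to 31 << 3 (0xF8). SGI 15 has priority 0xE0, which is numerically lower and therefore higher priority, so it can be signalled. Finally, the code enables Group 1 interrupts at the CPU interface by writing 1 to ICC_IGRPEN1_EL1.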
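The priority comparison is easy to get backwards. Lower numbers mean higher priority, and the CPU interface only signals an interrupt whose priority value is strictly below ICC_PMR_EL1. Here is a small C sketch of that check with the values used here. It ignores priority grouping and the running priority, which also affect preemption.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if an interrupt of this priority passes the priority mask.
   Lower values are higher priority; the comparison is strict. */
int passes_priority_mask(uint8_t priority, uint8_t pmr) {
    return priority < pmr;
}
```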
```asm
loop_wfi:
    dsb SY
    wfi
    mov w0, w20
    mov w1, #15
    bl GetPrivateIntPending
    cbz w0, loop_wfi
    mov w0, w20
    mov w1, #15
    bl ClearPrivateIntPending
```
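- Waits in a low-power loop for SGI 15. After each wake-up, the code checks whether SGI 15 is pending and returns to wfi if it is not. Once SGI 15 is pending, the code clears its pending state. IRQs are still masked in PSTATE (the I bit was set in SPSR_EL3), so the SGI is never taken as an exception. It only wakes the core from wfi, and the code polls the Redistributor state instead.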
```asm
    mrs x1, SCTLR_EL1
    orr x1, x1, #SCTLR_ELx_M
    orr x1, x1, #SCTLR_ELx_C
    orr x1, x1, #SCTLR_ELx_I
    bic x1, x1, #SCTLR_ELx_A
    msr SCTLR_EL1, x1
    isb
    B MainApp
```
- Enables the MMU and caches, using the translation tables that CPU 0 built, then branches to MainApp.
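The primary and secondary paths end up applying the same SCTLR_EL1 bit changes. The primary CPU applies them in two steps, invalidating the instruction cache in between. This C sketch assumes that the SCTLR_ELx_* macros match the architectural positions (M = bit 0, A = bit 1, C = bit 2, I = bit 12).

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_M (1ull << 0)   /* stage 1 MMU enable */
#define SCTLR_A (1ull << 1)   /* alignment fault checking */
#define SCTLR_C (1ull << 2)   /* data cache enable */
#define SCTLR_I (1ull << 12)  /* instruction cache enable */

/* orr M, C, I and bic A, as in the el1_secondary path. */
uint64_t sctlr_enable_mmu_and_caches(uint64_t sctlr) {
    return (sctlr | SCTLR_M | SCTLR_C | SCTLR_I) & ~SCTLR_A;
}
```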
Armv8 Assembly Reference Materials
- Arm Architecture Reference Manual for Armv8-A:
  - Official documentation detailing instructions, registers, and system architecture.
  - Available from Arm Developer: developer.arm.com/documentation/ddi0487/latest/
- Armv8-A Instruction Set Quick Reference:
  - A concise guide to AArch64 instructions.
  - Available in the Arm documentation or community resources like ARMv8-A Instruction Set.
- GICv3 Architecture Specification:
  - Details the Generic Interrupt Controller used here.
  - See Arm’s GIC documentation: developer.arm.com/documentation/ihi0069/latest/
- GNU Assembler (GAS) Manual:
  - Syntax reference for the assembler used in this code.
  - sourceware.org/binutils/docs/as/
- Cortex-A Series Programmer’s Guide:
  - Practical examples for Cortex-A processors. This edition targets Armv7-A cores, and Arm publishes a separate Programmer’s Guide for Armv8-A.
  - developer.arm.com/documentation/den0013/d/