Understanding ARMv8-A Alignment Support for Data Access

If an unaligned access fails, check the following:

  • Does the current architecture support unaligned data access?
  • Is the alignment check feature enabled in the system control register SCTLR.A?
  • Do the instructions used support unaligned access?
  • Do the objects being operated on (SP, PC, Normal memory, Device memory) support unaligned access?
  • Is the current system using big-endian or little-endian?

1. Aligned and Unaligned Transfers

See The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors, Section 6.6:

Since the memory system is 32-bit (at least from the programming model perspective), transfers of 32 bits (4 bytes, or 1 word) or 16 bits (2 bytes, or 1 half-word) can be either aligned or unaligned.

Aligned transfer means that the address value is a multiple of the size (in bytes). For example, addresses for word-aligned transfers can be 0x00000000, 0x00000004, …, 0x00001000, 0x00001004, etc.; similarly, for half-word aligned transfers, the addresses can be 0x00000000, 0x00000002, …, 0x00001000, 0x00001002, etc.

Examples of aligned and unaligned transfers are shown in the figure below.

[Figure: examples of aligned and unaligned transfers]

Generally, most classic ARM processors (such as ARM7 / ARM9 / ARM10) only allow aligned transfers. This means that for memory access, the address bits [1] and [0] for word transfers must be 0, while the address bit [0] for half-word transfers must be 0. For example, word data can be located at 0x1000 or 0x1004, but not at 0x1001, 0x1002, or 0x1003; for half-word data, the address can be 0x1000 or 0x1002, but not 0x1001. All byte transfers are aligned.
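The alignment rule above can be sketched as a small C helper. The function name is_aligned is hypothetical and only illustrates the definition; it assumes the transfer size is a power of two, as it is for byte, half-word, and word transfers.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when addr is a multiple of size (size in bytes).
 * Assumes size is a power of two (1, 2, 4, 8, ...), so the test is
 * equivalent to checking that the low address bits are all zero.
 */
bool is_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}
```

For a word transfer this tests address bits [1:0], and for a half-word transfer it tests bit [0], which matches the rule classic ARM processors enforce.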

Cortex-M3 and Cortex-M4 processors support unaligned data transfers for normal memory access (such as LDR, LDRH, STR, and STRH instructions).

There are also some restrictions:

  • Multiple load/store instructions do not support unaligned transfers.
  • Stack operation instructions (PUSH/POP) must be aligned.
  • Exclusive accesses (such as LDREX or STREX) must be aligned; otherwise, a fault exception (usage fault) is triggered.
  • Bit-band operations do not support unaligned transfers because their results are unpredictable.

When unaligned transfers are initiated by the processor, they are actually converted by the processor’s bus interface unit into multiple aligned transfers. This conversion is invisible, so application developers do not need to consider this issue.

However, because an unaligned transfer is split into several aligned transfers, it takes more clock cycles than an aligned one, which can hurt performance-critical code. For best performance, keep data properly aligned.
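The splitting described above can be modelled in C. This is only a sketch under assumptions: the struct transfer type and the split_access function are hypothetical, the bus is taken to be 32 bits wide, and the exact sequence a real bus interface unit issues is an implementation detail that the text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* One aligned bus transfer: start address and size in bytes (1, 2 or 4). */
struct transfer {
    uint32_t addr;
    uint32_t size;
};

/*
 * Splits an access of `size` bytes at `addr` into naturally aligned
 * pieces of at most 4 bytes each. Writes up to `max` pieces into `out`
 * and returns how many were written.
 */
size_t split_access(uint32_t addr, uint32_t size,
                    struct transfer *out, size_t max)
{
    size_t n = 0;

    while (size > 0 && n < max) {
        uint32_t chunk = 4;

        /* Shrink the piece until it fits and is aligned at addr. */
        while (chunk > size || (addr & (chunk - 1)) != 0)
            chunk >>= 1;

        out[n].addr = addr;
        out[n].size = chunk;
        n++;

        addr += chunk;
        size -= chunk;
    }
    return n;
}
```

In this model a word access at 0x1001 becomes three transfers (a byte, a half-word, and a byte), while a word access at 0x1000 stays a single transfer, which is where the extra cycles come from.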

In most cases, C compilers do not generate unaligned transfers; they only occur in the following situations:

  1. Directly manipulating pointers.
  2. Data structures that contain unaligned members and are declared with the “__packed” attribute.
  3. Inline/embedded assembly code.
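As an illustration of the packed-structure case, the sketch below uses the GCC/Clang spelling __attribute__((packed)); Arm Compiler 5 uses the __packed keyword instead. The struct sample_header type and the read_u32_unaligned helper are hypothetical names chosen for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packed layout: no padding is inserted after `tag`. */
struct __attribute__((packed)) sample_header {
    uint8_t  tag;
    uint32_t length;   /* lives at offset 1, so it is not word-aligned */
};

/*
 * Reads a 32-bit value from an address that may be unaligned.
 * memcpy lets the compiler pick a safe access sequence for the target
 * instead of dereferencing a misaligned uint32_t pointer directly.
 */
uint32_t read_u32_unaligned(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting an arbitrary byte pointer to uint32_t * and dereferencing it is the "directly manipulating pointers" case from the list; going through memcpy avoids depending on whether the target tolerates the unaligned access.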

2. AArch32 Alignment Support

2.1 Instruction Alignment

A32 instructions are word-aligned.

T32 instructions are half-word aligned.

2.2 Unaligned Data Access

In Arm A-profile implementations, some Load/Store instructions support unaligned data access to Normal memory. For details on Normal memory and Device memory, refer to the blog post: Introduction to ARMv8 Memory Attributes and Types (Memory types and attributes) by SOC Luo Sanpao on CSDN.

As shown in the red box in the figure below, some Load/Store instructions can achieve unaligned access, such as the commonly used LDR and STR instructions. Of course, the premise is that the alignment check bit in the system control register SCTLR is not enabled, i.e., SCTLR.A = 0:

  • The SCTLR.A bit controls alignment checking in all modes except Hyp mode.
  • The HSCTLR.A bit controls alignment checking in Hyp mode.

[Figure: Load/Store instructions that support unaligned access to Normal memory]

Any unaligned access to Device memory will generate an alignment exception.

2.3 SCTLR.A Alignment Check Enable

The SCTLR.A bit controls whether unaligned accesses to Normal memory are checked at PL0 and PL1:

  • SCTLR.A = 0 (the reset value) disables alignment fault checking. At PL0 or PL1, Load/Store instructions that operate on one or more registers are not checked for whether the address is aligned to the size of the data element being accessed.
  • SCTLR.A = 1 enables alignment fault checking. At PL0 or PL1, Load/Store instructions that operate on one or more registers are checked for whether the address is aligned to the size of the data element being accessed. If an unaligned access is detected, a Data Abort exception is generated.

Additionally, Load/Store Exclusive and Load-Acquire/Store-Release instructions always perform alignment checks, regardless of the value of SCTLR.A.
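The effect of SCTLR.A can be sketched as a small decision function. This is a simplified model, not the architectural pseudocode: the function names are hypothetical, it covers only instructions that permit unaligned access (such as LDR and STR), and it relies on SCTLR.A being bit [1] of the register.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_A_BIT (1u << 1)   /* SCTLR.A, alignment check enable */

/* True when the raw SCTLR value has alignment checking enabled. */
bool alignment_check_enabled(uint32_t sctlr)
{
    return (sctlr & SCTLR_A_BIT) != 0;
}

/*
 * Models whether an access of elem_size bytes at addr raises a Data Abort.
 * elem_size is assumed to be a power of two.
 */
bool data_abort_on_access(uint32_t sctlr, uint32_t addr,
                          uint32_t elem_size, bool device_memory)
{
    bool unaligned = (addr & (elem_size - 1)) != 0;

    if (!unaligned)
        return false;          /* aligned accesses never fault here */
    if (device_memory)
        return true;           /* unaligned Device access always faults */
    return alignment_check_enabled(sctlr);
}
```

Instructions that always check alignment, such as the exclusive and acquire/release forms mentioned above, would need an extra branch that ignores the SCTLR value.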

3. AArch64 Alignment Support (B2.5)

3.1 Instruction Alignment

A64 instructions are word-aligned.

3.2 Alignment of Data Accesses

As in AArch32, any unaligned access to Device memory causes an alignment fault, which results in a Data Abort exception.

For unaligned access to Normal memory, its behavior depends on:

  • The instruction for memory access (e.g., load, store)
  • The memory properties of the accessed memory (e.g., Normal or Device)
  • The value of SCTLR_ELx.{A, nAA}, that is, whether alignment checks are enabled.
  • Whether FEAT_LSE2 is implemented.

3.3 Ordinary Load and Store Instructions (including single and multiple registers)

For ordinary Load and Store instructions (not Exclusive access, Atomic, SETG* Memory Copy, and Memory Set instructions), whether single register operations or multiple register operations, if the accessed address does not align with the size of the data element being accessed (unaligned access), then:

  • If SCTLR_ELx.A = 1, an alignment fault is generated.
  • If SCTLR_ELx.A = 0, the unaligned access (to Normal memory) is performed.

For ordinary Load and Store instructions, the definition of unaligned access is based on the size of the accessed elements, not the size of the entire memory access.
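The element-size rule matters most for multi-register instructions. The sketch below is a simplified model with a hypothetical function name: for an instruction such as LDP X0, X1, the element size is 8 bytes even though the whole access is 16 bytes, so an address that is 8-byte aligned counts as aligned.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Models an ordinary AArch64 load/store (not exclusive, atomic,
 * or acquire/release). elem_size is the size in bytes of one accessed
 * element, not of the whole transfer, and is assumed to be a power of two.
 * Returns true when the access generates an alignment fault.
 */
bool ordinary_access_faults(uint64_t addr, uint64_t elem_size,
                            bool sctlr_a, bool device_memory)
{
    bool unaligned = (addr & (elem_size - 1)) != 0;

    if (!unaligned)
        return false;
    /* Unaligned: Device memory always faults, Normal only when A = 1. */
    return device_memory || sctlr_a;
}
```

With elem_size set to 8, an access at 0x1008 does not fault even when SCTLR_ELx.A = 1, although the 16-byte access as a whole is not 16-byte aligned.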

3.4 Load-Exclusive/Store-Exclusive and Atomic Instructions

For Load-Exclusive/Store-Exclusive and Atomic instructions that access an unaligned address, an alignment fault is generated if SCTLR_ELx.A = 1.

If SCTLR_ELx.A = 0, the behavior depends on FEAT_LSE2; the full rules are given in the document DDI0487G_a_armv8_arm.pdf.

If FEAT_LSE2 is not implemented and the address is not aligned to the size of the data element being accessed, these instructions generate an alignment fault.

3.5 Non-atomic Load-Acquire/Store-Release Instructions

For Load-Acquire/Store-Release instructions that are neither exclusive nor atomic:

If SCTLR_ELx.A = 1, an unaligned access generates an alignment fault.

If SCTLR_ELx.A = 0 and FEAT_LSE2 is not implemented, these instructions generate an alignment fault when the address is not aligned to the size of the data element being accessed.
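The cases in sections 3.4 and 3.5 can be summarized in one sketch. The enum and function names are hypothetical, and the model covers only what the text states: the detailed FEAT_LSE2 rules are not reproduced here, so that case is reported as a separate result rather than guessed.

```c
#include <stdbool.h>
#include <stdint.h>

enum access_result {
    ACCESS_PERFORMED,              /* access proceeds normally */
    ACCESS_ALIGNMENT_FAULT,        /* alignment fault is generated */
    ACCESS_DEPENDS_ON_LSE2_RULES   /* see the Arm ARM for FEAT_LSE2 rules */
};

/*
 * Models exclusive, atomic, and acquire/release accesses.
 * size is the element size in bytes and is assumed to be a power of two.
 */
enum access_result special_access_result(uint64_t addr, uint64_t size,
                                         bool sctlr_a, bool has_lse2)
{
    if ((addr & (size - 1)) == 0)
        return ACCESS_PERFORMED;           /* aligned: nothing to check */
    if (sctlr_a)
        return ACCESS_ALIGNMENT_FAULT;     /* checking enabled */
    if (!has_lse2)
        return ACCESS_ALIGNMENT_FAULT;     /* no FEAT_LSE2: always faults */
    return ACCESS_DEPENDS_ON_LSE2_RULES;
}
```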

3.6 FEAT_LSE2, Large System Extensions v2

FEAT_LSE2 introduces single-copy atomicity requirements and alignment access requirements for load and store operations.

This feature is supported in AArch64 state only. This feature is OPTIONAL in Armv8.2 implementations. This feature is mandatory in Armv8.4 implementations.

You can check whether this feature is implemented by reading the AT field of ID_AA64MMFR2_EL1.
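A minimal sketch of decoding that field is shown below. It takes the register value as a parameter, since reading ID_AA64MMFR2_EL1 itself requires an MRS instruction in a privileged context. The function names are hypothetical; the AT field occupies bits [35:32], where 0b0001 indicates FEAT_LSE2, and this sketch assumes any non-zero value means the feature is implemented.

```c
#include <stdbool.h>
#include <stdint.h>

#define MMFR2_AT_SHIFT 32       /* AT field is ID_AA64MMFR2_EL1[35:32] */
#define MMFR2_AT_MASK  0xFull

/* Extracts the AT field from a raw ID_AA64MMFR2_EL1 value. */
unsigned mmfr2_at_field(uint64_t id_aa64mmfr2)
{
    return (unsigned)((id_aa64mmfr2 >> MMFR2_AT_SHIFT) & MMFR2_AT_MASK);
}

/* Assumption: a non-zero AT field means FEAT_LSE2 is implemented. */
bool has_feat_lse2(uint64_t id_aa64mmfr2)
{
    return mmfr2_at_field(id_aa64mmfr2) != 0;
}
```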

Compared to aligned access, unaligned access typically requires additional cycles to complete.

3.7 SP Register Alignment

In AArch64, the 64-bit Stack Pointer register must be 16-byte aligned when it is used to access memory.

When the stack pointer is used as the base address of a calculation, it is considered misaligned if bits [3:0] are not 0b0000, regardless of any offset applied by the instruction. The processor can be configured so that a load/store instruction that uses a misaligned stack pointer generates an SP alignment fault exception.

The pseudocode is as follows:

[Figure: pseudocode for the SP alignment check]

The SA0 bit (for EL0) and the SA bit (for higher exception levels) of SCTLR_ELx determine whether an SP alignment fault exception is generated: the check is enabled when the bit is 1 and disabled when it is 0.
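The check can be approximated in C as follows. This is a model written for illustration, not the architectural pseudocode: the function name is hypothetical, the sctlr argument stands for whichever SCTLR_ELx governs the current exception level, and it relies on SA being bit [3] and SA0 being bit [4].

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_SA  (1ull << 3)   /* SP alignment check, EL1 and above */
#define SCTLR_SA0 (1ull << 4)   /* SP alignment check for EL0 */

/*
 * Returns true when using `sp` as a load/store base at exception
 * level `el` would generate an SP alignment fault.
 */
bool sp_alignment_fault(uint64_t sp, unsigned el, uint64_t sctlr)
{
    bool check_enabled = (el == 0) ? (sctlr & SCTLR_SA0) != 0
                                   : (sctlr & SCTLR_SA) != 0;

    /* Misaligned means bits [3:0] of SP are not all zero. */
    return check_enabled && (sp & 0xF) != 0;
}
```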

3.8 PC Register Alignment

The 64-bit Program Counter register holds the address of the currently executing instruction. Attempting to execute an A64 instruction at an address that is not word-aligned causes a PC alignment fault.

The PC alignment check generates a PC alignment fault exception associated with instruction fetch. In AArch64 state, attempting to execute an instruction that was fetched with a misaligned PC generates a PC alignment fault. A misaligned PC is one whose bits [1:0] are not 0b00; in other words, the address must end in 0x0, 0x4, 0x8, or 0xC. For example, the PC can be 0x1000 or 0x1004, but not 0x1001, 0x1002, or 0x1003.

A PC alignment fault is taken as a synchronous exception and is reported with the value 0x22 in the EC field of the Exception Syndrome Register (ESR_ELx).

The pseudocode for the PC alignment check is as follows:

[Figure: pseudocode for the PC alignment check]
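For reference, the same check can be modelled in C, together with the way an exception handler might recognize the fault. The function and macro names are hypothetical; the sketch relies on the EC field occupying bits [31:26] of ESR_ELx.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT        26      /* EC field is ESR_ELx[31:26] */
#define ESR_EC_MASK         0x3Full
#define ESR_EC_PC_ALIGNMENT 0x22u   /* PC alignment fault */

/* A64 instructions are word-aligned, so PC bits [1:0] must be 0b00. */
bool pc_is_misaligned(uint64_t pc)
{
    return (pc & 0x3) != 0;
}

/* Extracts the exception class from a raw ESR_ELx value. */
uint32_t esr_exception_class(uint64_t esr)
{
    return (uint32_t)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* True when the syndrome value reports a PC alignment fault. */
bool is_pc_alignment_fault(uint64_t esr)
{
    return esr_exception_class(esr) == ESR_EC_PC_ALIGNMENT;
}
```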

This article is reprinted from CSDN, author: SOC Luo Sanpao, the article has been authorized by the original author.
