5. ARM/Thumb Unified Assembly Language Instructions
This chapter provides an overview of ARM/Thumb assembly language, without detailing each instruction. Individual instruction descriptions can be found in Appendix A ‘Instruction Summary’.
Instructions can generally be categorized into the following types:
- Data processing operations (e.g., ADD and other ALU operations).
- Memory access (loading and storing data to memory).
- Control flow (for loops, goto, conditional code, and other program flow control).
- System operations (coprocessor, debugging, mode switching, etc.).
We will briefly introduce each category in turn. Before that, let’s understand the common functionalities shared by these instruction categories.
5.1 Instruction Set Basics
Each part of the instruction set has some common characteristics.
5.1.1 Constant and Immediate Values
ARM or Thumb assembly language instructions are only 16 or 32 bits long, which presents some challenges. This means you cannot encode arbitrary 32-bit values directly in the instruction opcode.
In the ARM instruction set, since opcode bits are used to specify condition codes, the instruction itself, and the registers to be used, only 12 bits are available to specify an immediate value. Therefore, some creative methods are needed for using these 12 bits. Instead of allowing a constant size from -2048 to +2047 to be specified directly, these 12 bits are divided into an 8-bit constant and a 4-bit rotation value. This rotation value allows the 8-bit constant to be rotated by a specified number of bits to the right, with rotation steps from 0 to 30 in increments of 2 (i.e., 0, 2, 4, 6, 8…).
Thus, you can have immediate values like 0x23 or 0xFF. You can also generate other useful immediate values, such as addresses of peripherals or memory blocks. For example, 0x23000000 can be generated by representing it as 0x23 ROR 8 (see page A-35 for ROR). However, many other constants (like 0x3FF) cannot be generated with a single instruction. For these values, you must break them down into multiple instructions or load them from memory. Typically, programmers do not need to worry about these unless the assembler throws an error complaining about an invalid constant. Instead, you can use assembly language pseudo-instructions to generate the desired constant.
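The 8-bit-plus-rotation scheme can be checked with a short model. The following is a minimal Python sketch, not ARM code; the helper names ror32 and encode_arm_immediate are made up for illustration. It tries each even rotation and reports the first 8-bit constant that reproduces the requested value.

```python
MASK32 = 0xFFFFFFFF

def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32

def encode_arm_immediate(value):
    """Return (imm8, rotation) if value == imm8 ROR rotation, else None."""
    value &= MASK32
    for rotation in range(0, 32, 2):          # only even rotations exist
        # Undo a right rotation by rotating the value left by the same amount.
        imm8 = ror32(value, (32 - rotation) % 32)
        if imm8 <= 0xFF:
            return imm8, rotation
    return None

print(encode_arm_immediate(0x23000000))  # 0x23 ROR 8
print(encode_arm_immediate(0x3FF))       # None: needs more than 8 bits
```

Under this model 0x23000000 is found as 0x23 ROR 8, while 0x3FF has no encoding, which matches the two examples in the text.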
In Thumb, the constant values encoded in instructions can be one of the following:
- Constants generated by rotating an 8-bit value by any even number of bits within a 32-bit word.
- Constants of the form 0x00XY00XY.
- Constants of the form 0xXY00XY00.
- Constants of the form 0xXYXYXYXY.
Where XY is a hexadecimal number ranging from 0x00 to 0xFF.
The MOVW instruction (move wide) moves a 16-bit constant into a register while clearing the high 16 bits of the target register. The MOVT instruction (move top) moves a 16-bit constant into the high 16 bits of the specified register without changing the low 16 bits. This allows the MOV32 pseudo-instruction to construct any 32-bit constant. The assembler also provides the prefixes :upper16: and :lower16:, which extract the high or low 16 bits of a 32-bit constant:
MOVW R0, #:lower16:label
MOVT R0, #:upper16:label
Although this requires two instructions, it does not require additional space to store the constant or read data from memory.
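The effect of the MOVW/MOVT pair on a register can be modelled in a few lines. This is a hedged Python sketch of the arithmetic only; the function names movw, movt, and load_constant are illustrative, and the split mirrors what :lower16: and :upper16: extract.

```python
def movw(imm16):
    """MOVW: write the low half and clear the high half."""
    return imm16 & 0xFFFF

def movt(reg, imm16):
    """MOVT: write the high half and keep the low half of `reg`."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def load_constant(value):
    """Build a 32-bit constant the way a MOVW/MOVT pair would."""
    lower16 = value & 0xFFFF          # what :lower16: extracts
    upper16 = (value >> 16) & 0xFFFF  # what :upper16: extracts
    r0 = movw(lower16)
    r0 = movt(r0, upper16)
    return r0

print(hex(load_constant(0x12345678)))
```

The order matters in this model: because MOVW clears the high half, it has to come before MOVT.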
You can also use the pseudo-instructions LDR Rn, =<constant> or LDR Rn, =label. This is the only option on older processors that lack MOVW and MOVT. The assembler picks the best sequence for placing the constant in the specified register (MOV, MVN, or a load from the literal pool). The literal pool is an area within the code section that holds constant data, typically placed at the end or beginning of a function. To control the location of the literal pool manually, use the assembler directive LTORG (for armasm) or .ltorg (for GNU tools). The register being loaded can be the program counter, in which case the load causes a branch.
This is useful for absolute addressing or for referencing content outside the current section; clearly, it results in position-dependent code. The value of the constant can be determined by either the assembler or the linker.
ARM tools also provide the related pseudo-instruction ADR Rn, label. It uses a PC-relative ADD or SUB to place the address of the label into the specified register with a single instruction. If the address is too far away to be generated this way, the ADRL pseudo-instruction can be used instead. ADRL requires two instructions but provides a larger range. Both can be used to generate addresses for position-independent code, but only within the same code section.
5.1.2 Conditional Execution
A feature of the ARM instruction set is that almost all instructions can be conditionally executed. In most other architectures, only branch or jump instructions can perform conditional operations. This is useful for avoiding conditional branches in small if/then/else structures or compound comparisons.
For example, consider code that finds the smaller of the values in registers R0 and R1 and places the result in R2. A branch-based version compares the registers with CMP R0, R1 and then uses a conditional branch such as BLT to choose between two MOV instructions. The suffix LT indicates that the instruction executes only if the most recent flag-setting instruction produced a less-than result; GE indicates greater than or equal.
The same code can be written with conditional MOV instructions instead of branches: CMP R0, R1 followed by MOVGE R2, R1 and MOVLT R2, R0. This version is not only shorter but may also execute faster on older ARM processors. On processors such as the Cortex-A9, however, it may actually be slower, because dependencies between the instructions can cause longer stalls than a branch would, while branch prediction can reduce or eliminate the cost of branches.
Note that this programming style relies on instructions being able to set the status flags optionally. If the MOVGE instruction in the example above set the flags automatically, the program might not work correctly. Load and store instructions never set flags. For data processing operations, you have a choice. By default, these instructions leave the flags unchanged. If the instruction has an S suffix (e.g., MOVS instead of MOV), it sets the flags. For the explicit comparison instructions, the S suffix is neither required nor allowed. The flags can also be set manually using the dedicated PSR (program status register) manipulation instruction, MSR. Some instructions set the carry flag (C) from the carry out of the ALU, while others set it from the carry out of the barrel shifter (which shifts a data word by a specified number of bits within one clock cycle).
Thumb-2 technology also introduces the If-Then (IT) instruction, which provides conditional execution for up to four consecutive instructions. The conditions can all be the same, or some can be the inverse of the others. Instructions within the IT block must also specify the condition code to be applied.
IT is a 16-bit instruction that allows almost all Thumb instructions to be conditionally executed based on the values of the ALU flags, using a condition code suffix. The syntax is IT{x{y{z}}}, where x, y, and z specify the condition switch for the optional second, third, and fourth instructions in the IT block: either Then (T) or Else (E), as in ITTET.
For example:
ITT EQ
SUBEQ r1, r1, #1
ADDEQ r0, r0, #60
Typically, IT instructions are automatically generated by the assembler rather than being written manually. The 16-bit instructions that change condition codes will not do so within the IT block, except for CMP, CMN, and TST, which only serve to set flags. There are some restrictions on the use of instructions within the IT block. Exceptions may occur within the IT block, and the current if-then state is stored in the CPSR, which is then copied to the SPSR when an exception occurs, allowing the execution of the IT block to be correctly restored when the exception returns.
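How an IT mnemonic maps onto the conditions of the instructions that follow can be sketched as below. This is an illustrative Python model with a hypothetical helper name, expand_it; it covers only the T/E pattern and is not how an assembler is actually implemented.

```python
# Each condition code paired with its logical inverse (AL has none).
INVERSE = {
    "EQ": "NE", "NE": "EQ", "CS": "CC", "CC": "CS",
    "MI": "PL", "PL": "MI", "VS": "VC", "VC": "VS",
    "HI": "LS", "LS": "HI", "GE": "LT", "LT": "GE",
    "GT": "LE", "LE": "GT",
}

def expand_it(mnemonic, cond):
    """Return the condition each instruction in the IT block must carry."""
    pattern = mnemonic.upper()
    suffix = pattern[2:]
    if not pattern.startswith("IT") or len(suffix) > 3:
        raise ValueError("expected IT followed by at most three T/E letters")
    if any(letter not in "TE" for letter in suffix):
        raise ValueError("only T (Then) and E (Else) are allowed")
    conditions = [cond]                      # the first slot is always Then
    for letter in suffix:
        conditions.append(cond if letter == "T" else INVERSE[cond])
    return conditions

print(expand_it("ITT", "EQ"))    # the SUBEQ/ADDEQ example above
print(expand_it("ITTET", "EQ"))
```

For ITTET EQ the model yields EQ, EQ, NE, EQ, so the third instruction in the block would carry the NE suffix.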
Some instructions always set flags with no other effect. They are CMP, CMN, TST, and TEQ, similar to SUBS, ADDS, ANDS, and EORS, but the results of ALU calculations are only used to update flags and not placed in registers.
The table below lists the 15 condition codes that can be attached to most instructions.
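The condition codes can also be written down as predicates over the N, Z, C, and V flags described in the next section. The Python sketch below uses the standard ARM definitions; the names CONDITIONS and condition_passed are illustrative only.

```python
# Each entry maps a condition suffix to a test on the flags (n, z, c, v).
CONDITIONS = {
    "EQ": lambda n, z, c, v: z,                # equal
    "NE": lambda n, z, c, v: not z,            # not equal
    "CS": lambda n, z, c, v: c,                # carry set / unsigned >=
    "CC": lambda n, z, c, v: not c,            # carry clear / unsigned <
    "MI": lambda n, z, c, v: n,                # negative
    "PL": lambda n, z, c, v: not n,            # positive or zero
    "VS": lambda n, z, c, v: v,                # overflow
    "VC": lambda n, z, c, v: not v,            # no overflow
    "HI": lambda n, z, c, v: c and not z,      # unsigned >
    "LS": lambda n, z, c, v: (not c) or z,     # unsigned <=
    "GE": lambda n, z, c, v: n == v,           # signed >=
    "LT": lambda n, z, c, v: n != v,           # signed <
    "GT": lambda n, z, c, v: (not z) and n == v,  # signed >
    "LE": lambda n, z, c, v: z or n != v,      # signed <=
    "AL": lambda n, z, c, v: True,             # always
}

def condition_passed(cond, n, z, c, v):
    """Return True if an instruction with suffix `cond` would execute."""
    return bool(CONDITIONS[cond](bool(n), bool(z), bool(c), bool(v)))

# Flags after CMP R0, R1 with R0 = 3 and R1 = 5: N=1, Z=0, C=0, V=0.
print(condition_passed("LT", 1, 0, 0, 0), condition_passed("GE", 1, 0, 0, 0))
```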
5.1.3 Status Flags and Condition Codes
The ARM processor has a current program status register (CPSR), which contains four status flags: zero flag (Z), negative flag (N), carry flag (C), and overflow flag (V). The table below shows the values of these flags during flag setting operations.
If the result of an unsigned operation overflows the 32-bit result register, the carry flag (C) bit will be set. For example, this bit can be used to implement 64-bit (or longer) arithmetic operations using 32-bit operations.
The overflow flag (V) works similarly to the C bit, but applies to signed operations. 0x7FFFFFFF is the maximum signed positive integer representable in 32 bits. For example, if you add 2 to this value, the result will be 0x80000001, a large negative number. The V bit is set to indicate an overflow or underflow from bit [30] to bit [31].
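The behaviour of the four flags during an addition, and the use of the carry flag for 64-bit arithmetic, can be illustrated with a small model. This is a Python sketch under the usual ARM flag definitions; adds and add64 are hypothetical helper names, and adds with a non-zero carry_in stands in for ADCS.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b, carry_in=0):
    """Model a flag-setting 32-bit add; returns (result, flags)."""
    a &= MASK32
    b &= MASK32
    full = a + b + carry_in
    result = full & MASK32
    flags = {
        "N": result >> 31,                 # copy of bit [31]
        "Z": int(result == 0),
        "C": int(full > MASK32),           # unsigned overflow
        # Signed overflow: both operands differ in sign from the result.
        "V": ((a ^ result) & (b ^ result)) >> 31,
    }
    return result, flags

def add64(a_lo, a_hi, b_lo, b_hi):
    """64-bit add built from two 32-bit adds, as with ADDS followed by ADC."""
    lo, flags = adds(a_lo, b_lo)
    hi, _ = adds(a_hi, b_hi, carry_in=flags["C"])
    return lo, hi

print(adds(0x7FFFFFFF, 2))             # result 0x80000001 with V set
print(add64(0xFFFFFFFF, 0, 1, 0))      # carry propagates into the high word
```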
5.2 Data Processing Operations
These are essentially the core arithmetic and logical operations. Multiplies can be considered a special case: they typically have slightly different formats and rules and are executed in a dedicated unit within the core.
ARM cores can only perform data processing on registers, not directly on memory. Data processing instructions (in most cases) use one destination register and two source operands. The basic format can be viewed as an opcode, optionally followed by a condition code, and optionally followed by an S (set flags), formatted as follows:
Operation{cond}{S} Rd, Rn, Operand2
The table below summarizes data processing assembly language instructions, listing their mnemonic opcodes, operands, and a brief description of their functionality. Appendix A provides a more detailed description of all available instructions.
For most programmers, the purposes and functionalities of these instructions will be evident, but some instructions require further explanation.
Among the arithmetic operations, note that the move instructions MOV and MVN take only one operand (it is treated as Operand 2 for maximum flexibility, as we will see later). RSB performs a reverse subtraction: it subtracts the first operand from the second. This instruction is needed because the first operand is inflexible and can only be a register. Therefore, to write R0 = 100 – R1, you must use RSB R0, R1, #100, since SUB R0, #100, R1 is an illegal instruction.
The ADC and SBC instructions perform addition and subtraction operations with carry. This allows you to perform arithmetic operations on values larger than 32 bits.
Logical operations are essentially the same as the corresponding C language operators. Note the use of ORR instead of OR, as the original ARM instruction set uses three-letter abbreviations for all data processing operations. The BIC instruction performs an AND operation between a register and the inverted value of operand 2. For example, if you want to clear bit [11] of register R0, you can use the instruction BIC R0, R0, #0x800.
The second operand 0x800 has only bit [11] set to 1, while all other bits are 0. The BIC instruction inverts this operand, setting all bits to logical 1 except for bit [11]. It then performs an AND operation with the value in R0, effectively clearing bit [11], and the result will be written back to R0.
Comparison and test instructions modify the CPSR (with no other effects).
5.2.1 Operand 2 and the Barrel Shifter
The first operand of all data processing operations must always be a register. The second operand is much more flexible and can be an immediate value (#x), a register (Rm), or a register ‘Rm’ shifted by an immediate value or register ‘Rm, shift #x’ or ‘Rm, shift Rs’. There are five types of shift operations: logical left shift (LSL), logical right shift (LSR), arithmetic right shift (ASR), rotate right (ROR), and rotate right extended (RRX).
The right shift operation creates vacancies in the high bits of the register. In this case, it is necessary to distinguish between logical shifts (inserting 0 into the highest bit) and arithmetic shifts (filling vacancies with the sign bit of the register’s bit [31]). Therefore, ASR operations are generally used for signed values, while LSR is used for unsigned values. There is no such distinction for left shift operations; left shifts always insert 0 into the least significant bit.
Thus, unlike many assembly languages, ARM assembly language does not require explicit shift instructions. Instead, you can perform shifts and rotations with the MOV instruction. For example, R0 = R1 >> 2 can be written as MOV R0, R1, LSR #2. Similarly, shifts are commonly combined with ADD, SUB, or other instructions. For example, to multiply R0 by 5, you could write:
ADD R0, R0, R0, LSL #2
A left shift by n bits is equivalent to multiplying by 2 to the power of n, so the above instruction effectively computes R0 = R0 + (4 × R0). Right shifts provide the corresponding division, although ASR rounds negative values differently from division in C.
Besides multiplication and division, another common use of shifted operands is array indexing. Consider the case where R1 holds the base address of an array of 32-bit integers and R2 is the index of the nth element. A single load instruction can fetch that element by computing the address R1 + (R2 × 4). The following example provides different types of Operand 2 used in ARM instructions.
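The shift types and the two uses described above (multiplying by 5 and scaling an array index) can be modelled as follows. This is a Python sketch of the arithmetic only, assuming shift amounts from 0 to 31; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter at the bottom."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros enter at the top."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: the sign bit [31] fills the vacated bits."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32            # reinterpret as a negative number
    return (x >> n) & MASK32

def ror(x, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

r0 = 7
print((r0 + lsl(r0, 2)) & MASK32)      # ADD R0, R0, R0, LSL #2 -> 35
print(hex(0x1000 + lsl(3, 2)))         # element 3 of an array at 0x1000
print(hex(asr(0xFFFFFFF9, 1)))         # -7 shifted right gives -4, not -3
```

The last line shows the rounding difference mentioned in the text: ASR rounds toward minus infinity, whereas C integer division of -7 by 2 truncates toward zero and gives -3.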
5.2.2 Multiplication Operations
Multiplication operations are very easy to understand. One key limitation to note is that multiplication operations cannot directly use immediate values. Multiplication can only operate on values in registers. To perform multiplication with a constant, you may need to load that constant into a register first.
Newer versions of ARM processors have added more multiplication instructions, providing multiple possibilities for 8-bit, 16-bit, and 32-bit data. We will consider these when discussing DSP instructions in integer SIMD instructions.
The table below summarizes multiplication assembly language instructions, listing their mnemonic opcodes, operands, and a brief description of their functionality.
5.2.3 Additional Multiplies
The multiply instructions multiply one 32-bit register by another to produce either a 32-bit result or a 64-bit signed or unsigned result. In all cases, a 32-bit or 64-bit value can optionally be accumulated into the result. ARM has also added further multiply instructions, including the signed most-significant-word multiplies SMMUL, SMMLA, and SMMLS. These perform a 32×32-bit multiplication and keep the high 32 bits of the product, discarding the low 32 bits. The result can be rounded by adding the suffix R; otherwise, it is truncated. The UMAAL (unsigned multiply accumulate accumulate long) instruction performs a 32×32-bit multiplication and adds the contents of two 32-bit registers to the 64-bit product.
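The difference between the truncating and rounding forms of SMMUL, and the accumulate behaviour of UMAAL, can be sketched in Python. The helper names are illustrative; the rounding form is modelled by adding 0x80000000 to the 64-bit product before taking the high word.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def smmul(rn, rm, rounding=False):
    """High 32 bits of the signed 64-bit product (SMMUL, or SMMULR)."""
    product = to_signed32(rn) * to_signed32(rm)
    if rounding:
        product += 0x80000000        # round instead of truncating
    return (product >> 32) & MASK32

def umaal(rd_lo, rd_hi, rn, rm):
    """Unsigned rn * rm plus both 32-bit accumulators; returns (lo, hi)."""
    total = (rn & MASK32) * (rm & MASK32) + (rd_lo & MASK32) + (rd_hi & MASK32)
    return total & MASK32, (total >> 32) & MASK32

print(smmul(0x00018000, 0x00010000))                  # truncated
print(smmul(0x00018000, 0x00010000, rounding=True))   # rounded
print(umaal(0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF, 0xFFFFFFFF))
```

The last call uses the largest possible inputs and still fits in 64 bits, which is why adding two 32-bit accumulators to the product is safe.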
5.2.4 Integer SIMD Instructions
Single Instruction Multiple Data (SIMD) instructions were first introduced in the ARMv6 architecture. They provide the ability to pack, extract, and unpack 8-bit and 16-bit quantities within 32-bit registers, and to perform arithmetic operations such as addition, subtraction, comparison, or multiplication on this packed data with a single instruction. These instructions should not be confused with the more powerful Advanced SIMD (NEON) operations, which were introduced in the ARMv7 architecture and are detailed in Chapter 7 and the ‘ARM® NEON™ Programmer’s Guide’.
Integer Register SIMD Instructions
ARMv6 SIMD operations use the GE (greater than or equal) flags in the CPSR (current program status register). These flags are distinct from the regular condition flags, and there is one flag for each byte position in a word. Regular data processing operations produce one result and set the N, Z, C, and V flags. SIMD operations produce up to four results and set only the GE flags to indicate overflow. The MSR and MRS instructions can be used to write or read these flags directly.
The general form of the SIMD instructions is that the sub-words (bytes or half-words) in each register are operated on in parallel (for example, four byte-sized ADDs can be performed), and the GE flags are set or cleared according to the results. Different types of addition and subtraction are selected with the appropriate prefix. For example, QADD16 performs saturating addition on the half-words within a register. SADD8/UADD8 and SSUB8/USUB8 set each GE bit independently, while SADD16/UADD16 and SSUB16/USUB16 set GE bits [3:2] together based on the upper half-word result and GE bits [1:0] together based on the lower half-word result.
The ASX and SAX classes of instruction are also provided. They exchange the half-words of one operand and then perform an add/subtract or subtract/add pair in parallel. Like the ADD and SUB instructions described earlier, they come in unsigned (UASX/USAX), signed (SASX/SSAX), and saturating (QASX/QSAX) versions.
Sum of Absolute Differences
Calculating the sum of absolute differences is a key operation in the motion vector estimation stage of common video codecs, typically performed over arrays of pixel data. The USADA8 instruction calculates the sum of the absolute differences between the bytes within the words in registers Rn and Rm, adds the value stored in Ra, and stores the result in Rd.
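The byte-wise sum of absolute differences with accumulate can be modelled as below. This is a minimal Python sketch with illustrative names (bytes_of, usada8), treating each register as four unsigned bytes.

```python
def bytes_of(word):
    """Split a 32-bit word into its four unsigned bytes, lowest first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def usada8(rn, rm, ra):
    """Sum of |Rn byte - Rm byte| over the four bytes, plus Ra."""
    sad = sum(abs(a - b) for a, b in zip(bytes_of(rn), bytes_of(rm)))
    return (ra + sad) & 0xFFFFFFFF

# Byte differences are 0x30, 0x10, 0x10, 0x30 (128 in total), plus Ra = 5.
print(usada8(0x10203040, 0x40302010, 5))
```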
Data Packing and Unpacking
Data packing is common in many video and audio codecs (video data is often represented as packed arrays of 8-bit pixel data, and audio data may use packed 16-bit samples), and is frequently encountered in network protocols. Before the additional instructions were added in the ARMv6 architecture, this data had to be loaded using LDRH and LDRB instructions, or loaded by word and then unpacked using shifts and bit-clearing operations; both methods are relatively inefficient. The packing (PKHBT, PKHTB) instructions allow extracting 16-bit or 8-bit values from arbitrary positions in a register and packing them into another register. Unpacking instructions (UXTH, UXTB, and many variants including signed and addition operations) can extract 8-bit or 16-bit values from arbitrary bit positions of a register.
This allows efficiently loading packed data sequences from memory using word or double-word loads, unpacking them into separate register values for processing, and then packing them back into registers for efficient writing back to memory.
Byte Selection
The SEL instruction selects each byte of the result from the corresponding byte of either the first or the second operand, based on the value of the GE[3:0] bits in the CPSR. The packed add and subtract operations set these bits, and SEL can then be used to extract parts of the data, for example to find the smaller of the two bytes at each position.
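The byte-wise minimum mentioned above can be sketched by pairing a packed subtract with SEL. The Python model below uses illustrative names (usub8, sel, byte_min) and assumes the unsigned variant, where a GE bit is set when the corresponding byte subtraction does not borrow.

```python
def usub8(rn, rm):
    """Per-byte unsigned subtract; returns (result, ge) with 4 GE bits."""
    result, ge = 0, 0
    for i in range(4):
        a = (rn >> (8 * i)) & 0xFF
        b = (rm >> (8 * i)) & 0xFF
        diff = a - b
        result |= (diff & 0xFF) << (8 * i)
        if diff >= 0:                 # no borrow: Rn byte >= Rm byte
            ge |= 1 << i
    return result, ge

def sel(rn, rm, ge):
    """Pick each byte from rn where the GE bit is set, else from rm."""
    result = 0
    for i in range(4):
        source = rn if ge & (1 << i) else rm
        result |= source & (0xFF << (8 * i))
    return result

def byte_min(r0, r1):
    """Smaller unsigned byte at each position."""
    _, ge = usub8(r0, r1)             # GE[i] set where r0 byte >= r1 byte
    return sel(r1, r0, ge)            # so take r1 there, r0 elsewhere

print(hex(byte_min(0x10FF3040, 0x40302010)))
```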
5.3 Memory Instructions
ARM cores only perform arithmetic logic unit (ALU) operations on registers. The only supported memory operations are loading (reading data from memory into registers) or storing (writing data from registers into memory). LDR and STR instructions can also be conditionally executed like other instructions.
You can specify the size of a load or store transfer by appending B (byte), H (half-word), or D (double-word, 64-bit) to the instruction, e.g., LDRB. For loads, an additional S indicates a signed byte or half-word (SB for signed byte, SH for signed half-word). This matters because when 8-bit or 16-bit data is loaded into a 32-bit register, you must decide what to do with the most significant bits of the register. Unsigned values are zero-extended (the upper 24 or 16 bits of the register are set to zero), whereas for signed values the sign bit (bit [7] of a byte or bit [15] of a half-word) must be copied into the upper 24 bits (or 16 bits) of the register.
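The difference between zero extension and sign extension can be shown with a short model. This Python sketch covers only the extension step applied to the loaded value, not the memory access itself, and the function names simply mirror the instruction mnemonics.

```python
def ldrb(byte):
    """Zero-extend a byte to 32 bits."""
    return byte & 0xFF

def ldrsb(byte):
    """Sign-extend a byte: bit [7] is copied into the upper 24 bits."""
    byte &= 0xFF
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte

def ldrh(half):
    """Zero-extend a half-word to 32 bits."""
    return half & 0xFFFF

def ldrsh(half):
    """Sign-extend a half-word: bit [15] is copied into the upper 16 bits."""
    half &= 0xFFFF
    return (half | 0xFFFF0000) if half & 0x8000 else half

print(hex(ldrb(0x80)), hex(ldrsb(0x80)))
print(hex(ldrh(0x8001)), hex(ldrsh(0x8001)))
```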
5.3.1 Addressing Modes
Load and store operations can use several addressing modes:
- Register Addressing – The address is held in a register, for example LDR R0, [R1].
- Pre-indexed Addressing – An offset is added to the base register before the memory access. The basic form is LDR Rd, [Rn, Op2]. The offset can be positive or negative, and can be an immediate value or another register with an optional shift, for example LDR R0, [R1, R2] or LDR R0, [R1, R2, LSL #2].
- Pre-indexed with Writeback – Indicated by an exclamation mark (!) after the instruction. The offset is added before the memory access, and the base register is also updated with the resulting address, for example LDR R0, [R1, #32]!.
- Post-indexed with Writeback – The offset is written after the square brackets. The unmodified base register value is used for the memory access, and the offset is added to the base register afterwards, for example LDR R0, [R1], #32.
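The modes differ in which address is used for the access and in whether the base register changes. The Python sketch below captures just that; the function name effective_address and the mode labels are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def effective_address(base, offset, mode):
    """Return (address used for the access, base register afterwards)."""
    if mode == "offset":          # [Rn, #off]   pre-indexed, no writeback
        return (base + offset) & MASK32, base
    if mode == "pre":             # [Rn, #off]!  pre-indexed with writeback
        address = (base + offset) & MASK32
        return address, address
    if mode == "post":            # [Rn], #off   post-indexed with writeback
        return base, (base + offset) & MASK32
    raise ValueError("unknown addressing mode: " + mode)

for mode in ("offset", "pre", "post"):
    address, new_base = effective_address(0x1000, 32, mode)
    print(mode, hex(address), hex(new_base))
```

Register addressing is the "offset" case with an offset of zero.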
5.3.2 Multiple Transfers
Load and Store Multiple instructions allow multiple words to be read from or written to memory consecutively. These instructions are very useful for stack operations and memory copying. Only word values can be operated in this way, and addresses must be word-aligned.
The operands include a base register (an optional exclamation mark ‘!’ indicates writeback of the base register), along with a list of registers enclosed in curly braces. The register list is separated by commas, and ranges are represented with a hyphen. The order of loading or storing registers does not depend on the order specified in the list. Instead, operations are performed in a fixed manner, with the lowest-numbered register always mapping to the lowest address.
For example:
LDMIA R10!, { R0-R3, R12 }
This instruction reads the values of five registers from the address pointed to by register R10, and since a writeback operation is specified, R10 will be incremented by 20 (5 × 4 bytes) at the end.
The instruction must also specify how to step from the base register Rn. The four options are IA/IB (Increment After/Increment Before) and DA/DB (Decrement After/Decrement Before). These can also be written using aliases (FD, FA, ED, and EA) that describe the operation from the point of view of a stack, specifying whether the stack pointer points to the last filled location (Full) or the next empty location (Empty), and whether the stack grows up (Ascending) or down (Descending) in memory.
By convention, ARM-based systems use only the Full Descending (FD) option. This means the stack pointer points to the last filled location in stack memory, and the stack pointer is decremented each time a new data item is pushed onto the stack.
For example:
STMFD sp!, {r0-r5} ; Push onto a Full Descending stack
LDMFD sp!, {r0-r5} ; Pop from a Full Descending stack
The diagram below illustrates the process of pushing two registers onto the stack. Before executing the STMFD (PUSH) instruction, the stack pointer points to the last occupied word in the stack. After the instruction is executed, the stack pointer decrements by 8 (two words), and the contents of the two registers are written to memory, with the lowest numbered register being written to the lowest memory address.
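The push and pop behaviour of a Full Descending stack, including the rule that the lowest-numbered register goes to the lowest address, can be sketched as follows. This is a Python model with memory as a dictionary of word addresses; the function names and data layout are assumptions made for illustration.

```python
def stmfd(memory, sp, reg_list, regs):
    """Model STMFD sp!, {reg_list}; returns the updated stack pointer."""
    ordered = sorted(reg_list)        # list order in the source is ignored
    sp -= 4 * len(ordered)            # full descending: decrement first
    for i, reg in enumerate(ordered):
        memory[sp + 4 * i] = regs[reg]
    return sp

def ldmfd(memory, sp, reg_list, regs):
    """Model LDMFD sp!, {reg_list}; returns the updated stack pointer."""
    ordered = sorted(reg_list)
    for i, reg in enumerate(ordered):
        regs[reg] = memory[sp + 4 * i]
    return sp + 4 * len(ordered)      # increment after the loads

memory = {}
regs = {0: 0x11111111, 1: 0x22222222}
sp = stmfd(memory, 0x8000, [1, 0], regs)   # push two registers
print(hex(sp), {hex(k): hex(v) for k, v in sorted(memory.items())})
```

Pushing two registers from 0x8000 leaves the stack pointer at 0x7FF8, eight bytes lower, as described above. In this model ldmfd is also what LDMIA with writeback does, so loading five registers advances the base by 20.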
5.4 Branches
The instruction set provides various types of branch instructions. For simple relative branches (i.e., offsets relative to the current address), use the B instruction. For subroutine calls that need to store the return address in the link register, use the BL instruction.
If a switch between instruction sets is needed (switching from ARM to Thumb or from Thumb to ARM), use the BX or BLX instructions. You can also specify the PC as the target register for normal data processing operations (e.g., ADD or SUB), but this is generally not recommended and is not supported in Thumb mode. Another type of branch instruction can be implemented by loading (LDR) instructions with the PC as the target or by loading multiple (LDM) instructions, or by placing the PC in the list of registers to be loaded with the POP instruction.
The Thumb instruction set has comparison and branch instructions that combine the CMP instruction and conditional branches but do not change the condition code flags in CPSR. This instruction has two opcodes: CBZ (compare and branch to label if Rn is zero) and CBNZ (compare and branch to label if Rn is not zero). These instructions can only branch forward within a range of 4 to 130 bytes. Thumb also has TBB (table branch byte) and TBH (table branch half-word) instructions. These instructions read a value from an offset table (byte or half-word size) and perform a forward PC-relative branch, with the branch offset being twice the byte or half-word value returned from the table. These instructions require specifying the base address of the table in one register and the index in another register.
5.5 Saturating Arithmetic
Saturating arithmetic is commonly used in audio and video codecs. When the computed result exceeds (or falls below) the maximum positive (or negative) value that can be represented, no overflow occurs. Instead, the result will be set to the maximum positive or negative value (i.e., saturated). The ARM instruction set includes many instructions that support such algorithms.
5.5.1 Saturated Arithmetic Instructions
ARM saturating arithmetic instructions can operate on byte, half-word, or word-sized values. For example, the 8 in QADD8 and QSUB8 indicates that they operate on byte-sized values. The result of the operation is saturated to the largest possible positive or negative value. If the result would have overflowed and has been saturated, the overflow flag (the Q bit of the CPSR) is set. This is a sticky flag: once set, it remains set until explicitly cleared by a write to the CPSR.
The instruction set provides special instructions QSUB and QADD that have this behavior. Additionally, QDSUB and QDADD instructions are used to support Q15 or Q31 fixed-point arithmetic. These instructions double the second operand before performing the specified addition or subtraction and saturate the result.
The Count Leading Zeros (CLZ) instruction returns the number of leading zeros before the highest set bit. This is useful for normalization and certain division algorithms. To saturate a value to a specific bit position (effectively saturating to a power of 2), you can use USAT or SSAT (unsigned or signed) saturation operations. USAT16 and SSAT16 allow saturation operations on the two half-word values contained in a register.
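Saturation and leading-zero counting can be modelled in a few lines. This is a Python sketch with illustrative function names; each saturating helper returns the result together with a 0/1 indicator of whether saturation occurred, which in hardware would be ORed into the sticky Q flag.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def qadd(a, b):
    """Signed saturating 32-bit add; returns (result, saturated)."""
    total = to_signed32(a) + to_signed32(b)
    clamped = max(-(1 << 31), min((1 << 31) - 1, total))
    return clamped & MASK32, int(clamped != total)

def ssat(value, bits):
    """Saturate to the signed range of `bits` bits (1 to 32)."""
    v = to_signed32(value)
    clamped = max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, v))
    return clamped & MASK32, int(clamped != v)

def usat(value, bits):
    """Saturate a signed value to the unsigned range of `bits` bits (0 to 31)."""
    v = to_signed32(value)
    clamped = max(0, min((1 << bits) - 1, v))
    return clamped, int(clamped != v)

def clz(x):
    """Number of zero bits above the highest set bit (32 for zero)."""
    return 32 - (x & MASK32).bit_length()

print(qadd(0x7FFFFFFF, 1), ssat(300, 8), usat(300, 8), clz(1))
```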
5.6 Miscellaneous Instructions
The remaining instructions cover coprocessor, supervisor calls, PSR modifications, byte reversals, cache prefetching, bit manipulations, and more.
5.6.1 Coprocessor Instructions
Coprocessor instructions occupy a portion of the ARM instruction set. Up to 16 coprocessors can be implemented, numbered from 0 to 15 (CP0, CP1……CP15). These coprocessors can be internal (built into the processor) or connected externally via dedicated interfaces. In older processors, the use of external coprocessors was uncommon and is completely unsupported in the Cortex-A series.
- Coprocessor 15 is a built-in coprocessor that provides control over many core functions, including cache and MMU.
- Coprocessor 14 is a built-in coprocessor that controls the core’s hardware debugging facilities, such as breakpoint units.
- Coprocessors 10 and 11 provide access to floating-point and NEON hardware in the system.
If a coprocessor instruction is executed but the corresponding coprocessor does not exist in the system, an undefined instruction exception will occur.
Coprocessor instructions fall into five categories:
- CDP – Starts a coprocessor data processing operation.
- MRC – Moves from a coprocessor register to an ARM register.
- MCR – Moves from an ARM register to a coprocessor register.
- LDC – Loads from memory to a coprocessor register.
- STC – Stores from a coprocessor register to memory.
These instructions also have multi-register and other variants:
- MRRC – Transfers values from the coprocessor to a pair of ARM registers.
- MCRR – Transfers a pair of ARM registers to a coprocessor.
- LDCL – Reads coprocessor registers from multiple registers.
- STCL – Writes coprocessor registers to multiple registers.
5.6.2 SVC
The SVC (supervisor call) instruction triggers a supervisor call exception upon execution. This instruction contains a 24-bit (ARM) or 8-bit (Thumb) value that can be checked by the SVC handler code. Through the SVC mechanism, the operating system can specify a set of privileged operations that applications running in user mode can request. This instruction was originally known as SWI (software interrupt).
5.6.3 PSR Modification
Several instructions can perform read and write operations on the PSR:
- MRS transfers the value of CPSR or SPSR to a general-purpose register. MSR transfers the value of a general-purpose register to CPSR or SPSR. The entire status register or part of it can be updated. In user mode, all bits can be read, but only the condition flag (_f) bits can be modified.
- In privileged mode, the Change Processor State (CPS) instruction can directly modify the mode and interrupt enable or disable (I and F) bits in CPSR.
- The SETEND instruction modifies the E (byte order) bit in CPSR. It can be used in systems with mixed byte order data to temporarily switch between little-endian and big-endian data access.
5.6.4 Bit Manipulation
Some instructions can perform bit operations on values in registers:
- The BFI (Bit Field Insert) instruction inserts a contiguous range of bits from the bottom of one register into a specified position in the target register.
- The BFC (Bit Field Clear) instruction clears a contiguous range of bits in a register.
- The SBFX and UBFX instructions (signed and unsigned bit field extract) copy a contiguous range of bits from one register to the least significant bits of another register, sign-extending or zero-extending the result to 32 bits.
- The RBIT instruction reverses the order of all bits in a register.
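The bit-field operations listed above can be modelled with masks and shifts. This is a Python sketch with illustrative function names; lsb is the position of the lowest bit of the field and width is its length, with lsb + width assumed not to exceed 32.

```python
MASK32 = 0xFFFFFFFF

def field_mask(lsb, width):
    """Mask with `width` ones starting at bit `lsb`."""
    return ((1 << width) - 1) << lsb

def bfc(rd, lsb, width):
    """Clear the field in rd."""
    return rd & ~field_mask(lsb, width) & MASK32

def bfi(rd, rn, lsb, width):
    """Insert the low `width` bits of rn into rd at `lsb`."""
    mask = field_mask(lsb, width)
    return ((rd & ~mask) | ((rn << lsb) & mask)) & MASK32

def ubfx(rn, lsb, width):
    """Extract the field and zero-extend it."""
    return (rn >> lsb) & ((1 << width) - 1)

def sbfx(rn, lsb, width):
    """Extract the field and sign-extend it to 32 bits."""
    field = ubfx(rn, lsb, width)
    if field & (1 << (width - 1)):
        field -= 1 << width           # top bit of the field is the sign
    return field & MASK32

def rbit(rn):
    """Reverse the order of all 32 bits."""
    return int(format(rn & MASK32, "032b")[::-1], 2)

print(hex(bfi(0xFFFFFFFF, 0xAB, 8, 8)), hex(ubfx(0x12345678, 8, 8)))
```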
5.6.5 Cache Preload
Two instructions are provided: PLD (data cache prefetch) and PLI (instruction cache prefetch). These two instructions serve as hints to indicate that the memory system will soon access the specified address. If the implementation does not support this prefetch operation, it will treat the prefetch as NOP. Any illegal addresses specified as parameters to the PLD instruction will not cause data abort exceptions.
5.6.6 Byte Reversal
Byte reversal instructions are useful for handling endianness or other data reordering operations:
- The REV instruction reverses the byte order in a word.
- The REV16 instruction reverses the byte order in each half-word in a register.
- The REVSH instruction reverses the bottom two bytes and sign-extends them to 32 bits.
The diagram below shows the operation of the REV instruction, displaying the arrangement of the four bytes in the register after reversal.
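The three byte-reversal operations can be sketched as follows. This is a Python model with function names that mirror the mnemonics; it shows only the resulting bit patterns.

```python
MASK32 = 0xFFFFFFFF

def rev(x):
    """Reverse all four bytes of a word."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")

def rev16(x):
    """Swap the two bytes inside each half-word."""
    x &= MASK32
    return ((x & 0x00FF00FF) << 8) | ((x & 0xFF00FF00) >> 8)

def revsh(x):
    """Swap the bottom two bytes, then sign-extend from bit [15]."""
    half = ((x & 0xFF) << 8) | ((x >> 8) & 0xFF)
    return (half | 0xFFFF0000) if half & 0x8000 else half

print(hex(rev(0x12345678)), hex(rev16(0x12345678)), hex(revsh(0x1280)))
```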
5.6.7 Other Instructions
Several other instructions are available:
- The Breakpoint instruction (BKPT) causes a prefetch abort or puts the core into debug state (depending on whether the processor is configured for monitor-mode or halt-mode debugging). This instruction is used by debuggers.
- Wait for Interrupt (WFI) puts the core into a standby mode, stopping execution until it is woken by an interrupt or a debug event. If WFI is executed with interrupts disabled, an interrupt still wakes the core, but no interrupt exception is taken; the core continues with the instruction after the WFI. On earlier ARM processors, WFI was implemented as a CP15 operation.
- No Operation (NOP) does nothing. Its execution time is not guaranteed, so NOP should not be used to insert timing delays into code. It is intended for padding.
- Wait for Event (WFE) puts the core into standby mode in a similar way to WFI. The core sleeps until it is woken by an event generated by another core executing the SEV instruction. An interrupt or a debug event also wakes the core.
- Send Event (SEV) generates a wake-up event that can wake other cores in the cluster.