1、Embedded Processor Basics
A typical microprocessor consists of a control unit, a program counter (PC), an instruction register (IR), a data path, memory, and so on.
The instruction execution process is generally divided into four steps:
Fetch: Load the next instruction to be executed from memory into the instruction register. The program counter (PC) always points to the next instruction to be executed; the instruction register (IR) holds the fetched instruction.
Decode: Interpret the instruction and determine which operation it specifies.
Execute: Move data from memory into the data path registers and perform the data operation in the Arithmetic Logic Unit (ALU).
Store: Write data from the registers back to memory.
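The four steps above can be sketched as a small interpreter loop. This is a minimal illustration only: the 16-bit instruction format, the opcodes, and the names `ToyCpu`, `TOY_INSN`, and `toy_run` are all hypothetical and do not correspond to any real instruction set.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction format (not a real ISA):
 * bits 15..12 = opcode, bits 9..8 = register number, bits 7..0 = data address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

#define TOY_INSN(op, r, addr) \
    ((uint16_t)((((op) & 0xFu) << 12) | (((r) & 0x3u) << 8) | ((addr) & 0xFFu)))

typedef struct {
    uint16_t pc;      /* program counter: index of the next instruction */
    uint16_t ir;      /* instruction register: the instruction just fetched */
    int32_t  reg[4];  /* data path registers */
} ToyCpu;

/* Runs until HALT, an out-of-range data address, the end of the program,
 * or max_steps instructions. Returns the number of instructions fetched. */
size_t toy_run(ToyCpu *cpu, const uint16_t *program, size_t program_len,
               int32_t *data, size_t data_len, size_t max_steps)
{
    size_t steps = 0;
    while (steps < max_steps && cpu->pc < program_len) {
        /* Fetch: copy the instruction at PC into IR, then advance PC. */
        cpu->ir = program[cpu->pc];
        cpu->pc++;
        steps++;

        /* Decode: split the instruction into its fields. */
        unsigned op = (cpu->ir >> 12) & 0xFu;
        unsigned r = (cpu->ir >> 8) & 0x3u;
        size_t addr = cpu->ir & 0xFFu;
        if (op == OP_HALT || addr >= data_len) {
            break;
        }

        switch (op) {
        case OP_LOAD:   /* Execute: move data from memory into a register. */
            cpu->reg[r] = data[addr];
            break;
        case OP_ADD:    /* Execute: ALU operation on a register and memory. */
            cpu->reg[r] += data[addr];
            break;
        case OP_STORE:  /* Store: write the register back to memory. */
            data[addr] = cpu->reg[r];
            break;
        default:        /* Unknown opcodes are ignored in this sketch. */
            break;
        }
    }
    return steps;
}
```

A program of `TOY_INSN(OP_LOAD, 0, 0)`, `TOY_INSN(OP_ADD, 0, 1)`, `TOY_INSN(OP_STORE, 0, 2)`, `TOY_INSN(OP_HALT, 0, 0)` adds the first two data words and stores the sum in the third.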
Some microprocessors, such as ARM series processors and DSPs, pipeline their instruction execution, and the execution process is divided according to the number of pipeline stages. For example, a processor with a 5-stage pipeline executes each instruction in 5 stages.
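To see why pipelining helps, the cycle counts can be compared with a small calculation. This sketch assumes an idealized pipeline in which every stage takes one clock cycle and there are no stalls or branch penalties; the function names are illustrative.

```c
#include <stdint.h>

/* Without pipelining, each instruction passes through every stage
 * before the next instruction starts. */
uint64_t cycles_unpipelined(uint64_t instructions, uint64_t stages)
{
    return instructions * stages;
}

/* With an ideal pipeline, the first instruction needs `stages` cycles to
 * fill the pipeline; after that, one instruction completes per cycle. */
uint64_t cycles_pipelined(uint64_t instructions, uint64_t stages)
{
    if (instructions == 0) {
        return 0;
    }
    return stages + (instructions - 1);
}
```

Under these assumptions, 100 instructions on a 5-stage machine take 500 cycles without pipelining and 104 cycles with it. Real pipelines fall short of this because of hazards and stalls.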
(1) By storage structure, microprocessors can be divided into: Von Neumann architecture and Harvard architecture
The Von Neumann architecture, also known as the Princeton architecture, is a memory structure that combines program instruction memory and data memory. The processor accesses both instructions and data over the same bus, so program instructions and data have the same width. Examples include the x86 series and ARM7.
The Harvard architecture is a memory structure that separates program instruction storage from data storage, with the aim of relieving the memory access bottleneck during program execution. Microprocessors with a Harvard architecture typically have higher execution efficiency. Examples include Microchip's PIC series, Motorola's MC68 series, Zilog's Z8 series, Atmel's AVR series, and ARM's ARM9, ARM10, and ARM11 series.
(2) By instruction type, microprocessors can be divided into: Complex Instruction Set Computer (CISC) processors and Reduced Instruction Set Computer (RISC) processors
CISC (Complex Instruction Set Computer): has a large number of instructions and addressing modes, and therefore requires more instruction decoding (interpretation) logic.
80/20 Rule: about 80% of programs use only 20% of the instructions, so most programs can run with a small number of instructions.
CISC has the following significant characteristics:
(1) The instruction format is not fixed, instruction length is inconsistent, and the number of operands can vary;
(2) Addressing modes are complex and diverse, which makes programs easier to write;
(3) It uses a microprogrammed structure, so executing each instruction requires a sequence of microinstructions;
(4) Each instruction requires several machine cycles to complete; the more complex the instruction, the more machine cycles it requires.
RISC (Reduced Instruction Set Computer) has the following characteristics:
(1) A small number of instructions; only the most useful instructions are implemented in the data path;
(2) Short execution time, ensuring that the data path executes each instruction quickly;
(3) A simplified CPU hardware design;
(4) Every instruction uses a standard word length.
2、ARM Processor Architecture
ARM stands for Advanced RISC Machines.
On April 26, 1985, the first ARM prototype was born at Acorn Computers Ltd. in Cambridge, UK. In the late 1980s, ARM was quickly developed for use in Acorn's desktop products, which formed the basis of computer education in the UK.
In 1990, Advanced RISC Machines Limited was established.
In the 1990s, ARM’s 32-bit embedded RISC processors expanded globally, occupying a leading position in the low power, low cost, and high performance embedded system application fields.
It currently holds over 75% of the 32-bit embedded product market.
32-bit RISC processors are favored in this market, with the ARM embedded microprocessor series leading the way.
Although ARM has existed as a company for only a little over 20 years, it grew quickly: in 1999, driven by the booming mobile phone market, its 32-bit RISC processors took over 50% of the market, and by early 2001 that share exceeded 75%. ARM is an intellectual property (IP) supplier and design company. Its partner companies license the designs and produce their own distinctive chips.
Characteristics of ARM processors:
(1) ARM instructions are 32-bit fixed length (A64 instructions in the AArch64 architecture are also 32 bits long; what AArch64 adds is 64-bit registers and addressing);
(2) Rich in registers (37 registers);
(3) Ordinary Load/Store instructions;
(4) Multi-register Load/Store instructions;
(5) Conditional execution of instructions;
(6) A single instruction can complete both a data shift and an ALU operation within a single clock cycle;
(7) The functionality of ARM processors is extended through variants and coprocessors;
(8) The 16-bit Thumb instruction set extension improves code density.
ARM naming rules are roughly divided into two categories: "processor series" naming rules based on the ARM Architecture version, and "processor model" naming rules based on the ARM Architecture version.
ARMv6 architecture introduces a series of new features including Single Instruction Multiple Data (SIMD) operations.
The ARMv6-M architecture is designed for low-cost, high-performance devices and brings powerful 32-bit solutions to markets previously dominated by 8-bit devices. Examples include Cortex-M0 and Cortex-M1.
In the ARMv7 architecture, all processors implement Thumb-2 technology (an optimized mixed 16/32-bit instruction set). This architecture is divided into 3 types of processors: Cortex-A (application processors), Cortex-R (real-time processors), and Cortex-M (microcontrollers).
In the ARMv8 architecture, ARMv8-A introduces 64-bit support into the ARM architecture, including 64-bit general-purpose registers, a 64-bit SP (Stack Pointer) and PC (Program Counter), 64-bit data processing, and extended virtual addressing, while remaining compatible with 32-bit processing.
In the ARMv9 architecture, the most significant upgrades are in AI and security: it enhances security on the basis of ARMv8 and adds vector computing, machine learning, and digital signal processing capabilities, with substantial performance improvements expected.
1)ARM Data Types
(1) Double-Word: In ARM architecture, the length of a double-word is 64 bits.
(2) Word: In ARM architecture, the length of a word is 32 bits.
(3) Half-Word: In ARM architecture, the length of a half-word is 16 bits.
(4) Byte: In ARM architecture, the length of a byte is 8 bits.
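In C, these four sizes map naturally onto the fixed-width types from `<stdint.h>`. The sketch below is one possible way to name them; the `arm_*_t` typedef names and the two helper functions are illustrative and are not part of any ARM header. It assumes a platform where a byte is 8 bits.

```c
#include <stdint.h>

typedef uint8_t  arm_byte_t;        /* byte: 8 bits */
typedef uint16_t arm_halfword_t;    /* half-word: 16 bits */
typedef uint32_t arm_word_t;        /* word: 32 bits */
typedef uint64_t arm_doubleword_t;  /* double-word: 64 bits */

/* Compile-time checks that the sizes match the list above (C11). */
_Static_assert(sizeof(arm_byte_t) == 1, "byte must be 1 byte");
_Static_assert(sizeof(arm_halfword_t) == 2, "half-word must be 2 bytes");
_Static_assert(sizeof(arm_word_t) == 4, "word must be 4 bytes");
_Static_assert(sizeof(arm_doubleword_t) == 8, "double-word must be 8 bytes");

/* Upper half-word (bits 31..16) of a word. */
arm_halfword_t word_high_half(arm_word_t w)
{
    return (arm_halfword_t)(w >> 16);
}

/* Lower half-word (bits 15..0) of a word. */
arm_halfword_t word_low_half(arm_word_t w)
{
    return (arm_halfword_t)(w & 0xFFFFu);
}
```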
2)ARM Processor Storage Format
Because it is a 32-bit architecture, ARM supports a maximum address space of 4GB.
ARM architecture can store word data using two methods, namely big-endian and little-endian modes.
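Big-endian mode: The high byte of the word is stored at the low byte address, and the low byte of the word is stored at the high byte address.
Little-endian mode: The low byte of the word is stored at the low byte address, and the high byte of the word is stored at the high byte address.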
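The difference between the two modes is easiest to see by writing the same word into a byte array both ways. The following is a minimal sketch that runs on any host; the function names are illustrative.

```c
#include <stdint.h>

/* Big-endian: the most significant byte goes to the lowest address. */
void store_word_big_endian(uint8_t *mem, uint32_t word)
{
    mem[0] = (uint8_t)(word >> 24);
    mem[1] = (uint8_t)(word >> 16);
    mem[2] = (uint8_t)(word >> 8);
    mem[3] = (uint8_t)word;
}

/* Little-endian: the least significant byte goes to the lowest address. */
void store_word_little_endian(uint8_t *mem, uint32_t word)
{
    mem[0] = (uint8_t)word;
    mem[1] = (uint8_t)(word >> 8);
    mem[2] = (uint8_t)(word >> 16);
    mem[3] = (uint8_t)(word >> 24);
}

/* Reassemble a word from four bytes stored in big-endian order. */
uint32_t load_word_big_endian(const uint8_t *mem)
{
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8) | (uint32_t)mem[3];
}
```

For the word 0x12345678, the big-endian byte sequence from the lowest address upward is 12 34 56 78, and the little-endian sequence is 78 56 34 12.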
3)ARM Processor Operating States
From a programming perspective, ARM microprocessors generally have two operating states: ARM and Thumb, and can switch between the two states.
(1) ARM state: The processor executes 32-bit, word-aligned ARM instructions; most work is performed in this state.
(2) Thumb state: The processor executes 16-bit, half-word-aligned Thumb instructions.
Thumb instruction characteristics:
Thumb code requires 70% of the space of ARM code;
Thumb code uses 40% more instructions than ARM code;
with 32-bit memory, ARM code is 40% faster than Thumb code;
with 16-bit memory, Thumb code is 45% faster than ARM code;
with Thumb code, external memory power consumption is 30% lower than with ARM code.
4)ARM Processor Operating Modes
5)ARM Cortex-A Processor Operating Modes
6)Cortex-A Register Set
The Cortex-A register set contains 34 general-purpose registers, including R0-R14 in the various modes and the shared R15 program counter (PC), all of which are 32 bits wide. It also contains 8 status registers, and one ELR_Hyp register that is unique to Hyp mode.
7)Program Status Registers CPSR and SPSR
Like other processors, ARM has program status registers that configure the processor operating mode and show its operating state. ARM processors have two kinds of program status register: CPSR (Current Program Status Register) and SPSR (Saved Program Status Register).
CPSR can be accessed in any operating mode and includes condition flags, interrupt disable bits, current processor mode flags, and other related control and status bits.
Each exception mode has a dedicated physical status register, called the SPSR, which saves the value of CPSR when that mode is entered.
(1) N (Negative): When the operation is performed on signed numbers in two's complement, N=1 indicates the result is negative, and N=0 indicates the result is positive or zero.
(2) Z (Zero): Z=1 indicates the operation result is 0, and Z=0 indicates the operation result is non-zero.
(3) C (Carry): There are 4 ways to set the value of C:
1) Addition instructions (including the compare-negative instruction CMN): when the operation produces a carry (unsigned overflow), C=1; otherwise C=0.
2) Subtraction instructions (including the comparison instruction CMP): when the operation produces a borrow (unsigned underflow), C=0; otherwise C=1.
3) Non-add/subtract instructions that include a shift operation: C is the last bit shifted out.
4) Other non-add/subtract instructions: the value of C usually remains unchanged.
(4) V (Overflow): There are 2 ways to set the value of V:
1) Add/subtract instructions: when the operands and the result are signed numbers in two's complement, V=1 indicates that the sign bit has overflowed.
2) Other non-add/subtract instructions: the value of V usually remains unchanged.
(5) I (Interrupt Request): I=1 indicates interrupts are disabled, and I=0 indicates interrupts are enabled.
(6) F (Fast Interrupt Request): F=1 indicates fast interrupts are disabled, and F=0 indicates fast interrupts are enabled.
(7) T (Thumb): T=0 indicates the processor is in ARM state, and T=1 indicates Thumb state.
(8) M4-M0: Indicate the current processor operating mode.
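The flag rules above can be checked on a host machine with a short model. This sketch computes N, Z, C, and V for a 32-bit addition and subtraction and decodes the mode field. The bit positions (N=31, Z=30, C=29, V=28, I=7, F=6, T=5, mode=4..0) and mode encodings follow the ARMv7-A CPSR layout, which the text above does not spell out; the function names are illustrative.

```c
#include <stdint.h>

#define CPSR_N         (1u << 31)
#define CPSR_Z         (1u << 30)
#define CPSR_C         (1u << 29)
#define CPSR_V         (1u << 28)
#define CPSR_I         (1u << 7)
#define CPSR_F         (1u << 6)
#define CPSR_T         (1u << 5)
#define CPSR_MODE_MASK 0x1Fu

#define SIGN_BIT 0x80000000u

/* Flags produced by a flag-setting addition (ADDS, CMN): a + b. */
uint32_t flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;
    uint32_t f = 0;
    if (r & SIGN_BIT) f |= CPSR_N;
    if (r == 0)       f |= CPSR_Z;
    if (r < a)        f |= CPSR_C;  /* carry out: unsigned overflow */
    /* Signed overflow: operands share a sign, result has the other sign. */
    if (~(a ^ b) & (a ^ r) & SIGN_BIT) f |= CPSR_V;
    return f;
}

/* Flags produced by a flag-setting subtraction (SUBS, CMP): a - b. */
uint32_t flags_after_sub(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    uint32_t f = 0;
    if (r & SIGN_BIT) f |= CPSR_N;
    if (r == 0)       f |= CPSR_Z;
    if (a >= b)       f |= CPSR_C;  /* no borrow gives C=1 */
    /* Signed overflow: operands differ in sign, result sign differs from a. */
    if ((a ^ b) & (a ^ r) & SIGN_BIT) f |= CPSR_V;
    return f;
}

/* Name of the operating mode encoded in M4-M0. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "reserved";
    }
}
```

For example, `flags_after_sub(3, 5)` sets only N because a borrow occurred (C=0), while `flags_after_add(0x7FFFFFFF, 1)` sets N and V because the signed result overflowed.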
8)Conditions for Switching Operating Modes
(1) A software interrupt (SWI) instruction is executed, or a reset occurs. If the SWI instruction is executed in User mode, the CPU enters Supervisor mode.
(2) An external interrupt occurs. In this case the CPU enters IRQ or FIQ mode.
(3) An exception occurs during CPU execution. The most typical exception is a memory access exception caused by MMU protection, at which point the CPU switches to Abort mode. If the cause is an invalid instruction, the CPU enters Undefined mode.
(4) There is one mode that the CPU cannot enter automatically: System mode. To enter System mode, the programmer must write instructions that change the CPSR mode bits to the value corresponding to System mode.
(5) In any privileged mode, other modes can be entered by modifying the MODE field of CPSR. Note, however, that what is modified is the shadow CPSR of that mode, i.e., SPSR, and not the actual CPSR. The general practice is therefore to modify the shadow CPSR and then execute a MOVS instruction, which resumes execution at the breakpoint and switches to the new mode.
3、ARM Processor Memory Management
1)What is Memory Mapping
In the ARM memory system, memory mapping refers to translating virtual addresses into actual physical addresses by means of the Memory Management Unit (MMU).
2)Why Memory Mapping
The A32 architecture of ARM has a 32-bit address bus, so the CPU's addressable range is 0x00000000~0xffffffff, an address space of 4GB. All internal and external storage and peripheral units are operated through their corresponding addresses. Different chips have different types and numbers of peripherals, and these occupy different parts of the address space. To let the core manage different chip designs more conveniently, the ARM core provides a predefined memory map.
Chip design companies define the chip's internal peripherals and externally reserved interfaces according to the predefined memory map provided by the core. This approach greatly reduces the hassle of address conversion between different chips that use the same core (the CPU operates on a unified virtual address, and the actual physical address is managed by the MMU).
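As an illustration of a predefined memory map, the sketch below classifies an address by region. It assumes the fixed region boundaries of the Cortex-M3/M4 system address map, which the text above does not list. Other ARM cores use different maps, and the enum and function names are illustrative.

```c
#include <stdint.h>

typedef enum {
    REGION_CODE,             /* 0x00000000 - 0x1FFFFFFF */
    REGION_SRAM,             /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,       /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXTERNAL_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXTERNAL_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM            /* 0xE0000000 - 0xFFFFFFFF */
} MemRegion;

/* Classify a 32-bit address using the assumed Cortex-M3/M4 boundaries. */
MemRegion classify_address(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXTERNAL_RAM;
    if (addr < 0xE0000000u) return REGION_EXTERNAL_DEVICE;
    return REGION_SYSTEM;
}
```

Because every chip built on the same core keeps these boundaries, software can tell from an address alone which kind of resource it refers to.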
3)Bit-Band Operations
(1) What are Bit-Band Operations
For example, when we use an 8051 microcontroller to drive P1.0 low, we know that the process actually writes a 1 or 0 to a specific bit of a certain register. In CPU operations, however, each address corresponds to an 8-bit byte. Operating directly on one specific bit therefore requires the help of bit-band operations.
(2) Which addresses can perform bit-band operations?
Bit-band operations are implemented in two areas. One is the lowest 1MB range of the SRAM area (the bit-band region), and the other is the lowest 1MB range of the on-chip peripheral area.
4)Register Address Calculation
In ARM, all peripherals are generally attached to the AHB or APBx buses. We therefore often use the "base address + offset address + structure" method to calculate the specific register address of a peripheral quickly and clearly.
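The base + offset + structure method, and the bit-band address mapping from the previous section, can both be sketched in a few lines. The peripheral `DemoGpioRegs`, its register names, and the `DEMO_*` offsets are hypothetical and do not describe any real chip. The bit-band constants assume the Cortex-M3/M4 layout (bit-band regions at 0x20000000 and 0x40000000, alias regions at 0x22000000 and 0x42000000), which the text above does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical peripheral register block, for illustration only. */
typedef struct {
    volatile uint32_t MODE;  /* offset 0x00 */
    volatile uint32_t IN;    /* offset 0x04 */
    volatile uint32_t OUT;   /* offset 0x08 */
} DemoGpioRegs;

#define DEMO_PERIPH_BASE 0x40000000u  /* start of the peripheral region */
#define DEMO_BUS_OFFSET  0x00010000u  /* hypothetical bus offset */
#define DEMO_GPIO_OFFSET 0x00000800u  /* hypothetical peripheral offset */
#define DEMO_GPIO_BASE   (DEMO_PERIPH_BASE + DEMO_BUS_OFFSET + DEMO_GPIO_OFFSET)

/* On real hardware the structure is overlaid on the base address:
 *   #define DEMO_GPIO ((DemoGpioRegs *)DEMO_GPIO_BASE)
 * so that DEMO_GPIO->OUT accesses base + 0x08. Here only the address
 * is computed, which keeps the sketch runnable on a host. */
uint32_t demo_gpio_out_address(void)
{
    return DEMO_GPIO_BASE + (uint32_t)offsetof(DemoGpioRegs, OUT);
}

#define SRAM_BB_BASE    0x20000000u
#define SRAM_BB_ALIAS   0x22000000u
#define PERIPH_BB_BASE  0x40000000u
#define PERIPH_BB_ALIAS 0x42000000u
#define BB_REGION_SIZE  0x00100000u   /* 1MB */

/* Alias word address for bit `bit` (0..7) of the byte at `addr`.
 * Returns 0 if the address is outside both bit-band regions. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    uint32_t base, alias;
    if (bit > 7u) {
        return 0;
    }
    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        base = SRAM_BB_BASE;
        alias = SRAM_BB_ALIAS;
    } else if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        base = PERIPH_BB_BASE;
        alias = PERIPH_BB_ALIAS;
    } else {
        return 0;
    }
    /* Each bit in the region maps to one 32-bit word in the alias region. */
    return alias + (addr - base) * 32u + bit * 4u;
}
```

On a device that implements bit-banding, writing 1 or 0 to the returned alias address sets or clears that single bit without a read-modify-write sequence.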
5)Integrated Peripheral Register Access Methods