Introduction
I have been working in embedded software development for nearly five years, mostly with microcontrollers based on the ARM Cortex-M series. Thanks to C compilers, I have been able to develop all this time without touching assembly language. Still, I feel I have missed something along the way: I never dug into the elegance of compilers and CPUs. So I decided to spend my otherwise idle weekends exploring the ARM CPU architecture and the inner workings of C compilers by reading documentation, running hands-on experiments, and drawing my own conclusions. (I take this approach because I personally disagree with how microcomputer principles courses are taught in school.)
1. ARM CPU Architecture
The ARM CPU architecture[1] is a family of Reduced Instruction Set Computing (RISC) architectures for computer processors. It is the most widely used processor architecture in the world, with billions of ARM-based devices shipped annually, ranging from sensors and wearable devices to smartphones and supercomputers.
The ARM CPU architecture is based on the RISC instruction set and includes:
- A unified register file, where instructions are not limited to specific registers;
- A load/store architecture, where data processing operates only on register contents, never directly on memory contents;
- Simple addressing modes, where all load and store addresses are determined solely by register contents and instruction fields.
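To make the load/store idea concrete, here is a toy C model of a load/store machine. It is a hypothetical illustration, not real ARM behavior or instruction encodings: the names `toy_cpu`, `toy_ldr`, `toy_str`, and `toy_add` are made up, and memory is word-addressed for simplicity. It shows that adding two values held in memory takes two loads, one register-only add, and one store.

```c
#include <stdint.h>

/* Hypothetical toy load/store machine (not real ARM encodings). */
#define NUM_REGS 16
#define MEM_WORDS 64

typedef struct {
    uint32_t r[NUM_REGS];    /* register file: any instruction may use any register */
    uint32_t mem[MEM_WORDS]; /* word-addressed data memory */
} toy_cpu;

/* LDR rd, [rn, #offset]: the address comes only from a register plus an instruction field */
void toy_ldr(toy_cpu *c, int rd, int rn, uint32_t offset) {
    c->r[rd] = c->mem[(c->r[rn] + offset) % MEM_WORDS];
}

/* STR rd, [rn, #offset]: the only way to write a register back to memory */
void toy_str(toy_cpu *c, int rd, int rn, uint32_t offset) {
    c->mem[(c->r[rn] + offset) % MEM_WORDS] = c->r[rd];
}

/* ADD rd, rn, rm: data processing sees registers only, never memory */
void toy_add(toy_cpu *c, int rd, int rn, int rm) {
    c->r[rd] = c->r[rn] + c->r[rm];
}

/* mem[base + 2] = mem[base] + mem[base + 1], done the load/store way */
void add_in_memory(toy_cpu *c, uint32_t base) {
    c->r[0] = base;        /* r0 holds the base address */
    toy_ldr(c, 1, 0, 0);   /* r1 = mem[base]     */
    toy_ldr(c, 2, 0, 1);   /* r2 = mem[base + 1] */
    toy_add(c, 3, 1, 2);   /* r3 = r1 + r2       */
    toy_str(c, 3, 0, 2);   /* mem[base + 2] = r3 */
}
```

On a real Cortex-M, the same C statement would typically compile to a similar LDR, LDR, ADD, STR sequence, using byte offsets rather than word indices.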
Depending on the application, the ARM CPU architecture is divided into three profiles:
| Architecture Profile | Use Cases | Implementations (Processor Cores) |
|---|---|---|
| A Series | Complex compute applications (servers, network equipment, smartphones, TVs) | Cortex-A, Neoverse |
| R Series | Applications requiring real-time response (safety-critical systems, applications requiring deterministic response, autonomous driving) | Cortex-R |
| M Series | Devices where power consumption and size matter, especially embedded and IoT devices such as small sensors, communication modules, and smart home products | Cortex-M |
In this series of articles, we will focus on the Cortex-M cores and will not cover the Cortex-A and Cortex-R series.
2. Cortex M Core
The Cortex-M processor family is based on the M-profile of the ARM architecture and provides low-latency, highly deterministic operation for embedded systems. The family includes the Cortex-M cores shown in ARM's overview figure.
From the figure, we can roughly see:
① The Cortex-M0, Cortex-M0+, and Cortex-M1 use the Armv6-M architecture, while the commonly used Cortex-M3, Cortex-M4, and Cortex-M7 use the Armv7-M architecture (the M4 and M7 implement its Armv7E-M variant, which adds the DSP extension). The Cortex-M23 uses the Armv8-M Baseline architecture, the Cortex-M33 and Cortex-M35P use the Armv8-M Mainline architecture, and the Cortex-M55 uses the Armv8.1-M Mainline architecture.
② Starting with the Cortex-M23, Cortex-M cores offer TrustZone security features.
③ Only the Cortex-M4, Cortex-M7, Cortex-M33, Cortex-M35P, and Cortex-M55 have the Digital Signal Processing (DSP) extension.
④ The Cortex-M33 and Cortex-M55 support Arm Custom Instructions.
⑤ The Cortex-M33, Cortex-M35P, and Cortex-M55 have a coprocessor interface.
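When the same code has to build for several of these cores, the compiler's predefined macros reveal which architecture and extensions it is targeting. The sketch below assumes an Arm GCC toolchain (Clang defines the same macros), which provides macros such as `__ARM_ARCH_7EM__`, `__ARM_ARCH_8M_MAIN__`, `__ARM_FEATURE_DSP`, `__ARM_FEATURE_CMSE`, and `__ARM_FEATURE_MVE`. The function names are hypothetical. Compiled for a non-Arm host, it simply reports that no M-profile target was detected.

```c
/* Hypothetical helpers: report the M-profile architecture and extensions
 * selected at compile time, based on Arm GCC/Clang predefined macros. */
const char *m_profile_arch(void) {
#if defined(__ARM_ARCH_8_1M_MAIN__)
    return "Armv8.1-M Mainline (e.g. Cortex-M55)";
#elif defined(__ARM_ARCH_8M_MAIN__)
    return "Armv8-M Mainline (e.g. Cortex-M33/M35P)";
#elif defined(__ARM_ARCH_8M_BASE__)
    return "Armv8-M Baseline (e.g. Cortex-M23)";
#elif defined(__ARM_ARCH_7EM__)
    return "Armv7E-M (e.g. Cortex-M4/M7)";
#elif defined(__ARM_ARCH_7M__)
    return "Armv7-M (e.g. Cortex-M3)";
#elif defined(__ARM_ARCH_6M__)
    return "Armv6-M (e.g. Cortex-M0/M0+/M1)";
#else
    return "not an M-profile target";
#endif
}

/* 1 if the DSP extension (packed SIMD, saturating arithmetic) is available */
int has_dsp_extension(void) {
#if defined(__ARM_FEATURE_DSP)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the target supports the TrustZone-M (CMSE) instructions */
int has_trustzone_cmse(void) {
#if defined(__ARM_FEATURE_CMSE)
    return 1;
#else
    return 0;
#endif
}

/* 1 if Helium (MVE) is enabled for this build */
int has_mve(void) {
#if defined(__ARM_FEATURE_MVE)
    return 1;
#else
    return 0;
#endif
}
```

Checking feature macros such as `__ARM_FEATURE_DSP`, rather than core names, keeps the code working on any core that provides the feature.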
Next, let's look at each core in turn.
1. Cortex M0
The Cortex-M0[2] is one of ARM's smallest processors. Its small footprint is meant to give developers 32-bit performance at an 8-bit price point. The Cortex-M0 uses the AHB-Lite bus, has a three-stage pipeline, and supports a subset of the Thumb/Thumb-2 instruction set.
2. Cortex M0+
The Cortex-M0+[3] builds on the Cortex-M0, further reducing power consumption and improving performance. It uses the AMBA AHB-Lite bus, shortens the pipeline to two stages, and supports a subset of the Thumb/Thumb-2 instruction set.

3. Cortex M1
The Cortex-M1[4] is the first ARM processor designed specifically for implementation in FPGAs. It uses the AMBA AHB-Lite bus, has a three-stage pipeline, and supports a subset of the Thumb/Thumb-2 instruction set.

4. Cortex M3
The Cortex-M3[5] is designed for high-performance, low-cost platforms such as automotive body systems, industrial control systems, wireless networking, and sensors. It uses three AMBA AHB-Lite buses (a Harvard bus architecture), has a three-stage pipeline, supports a subset of the Thumb/Thumb-2 instruction set, and supports 8 to 256 interrupt priority levels.
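The "8 to 256 priority levels" range comes from the number of implemented priority bits: 3 to 8 bits give 2^3 to 2^8 levels. Only the most significant bits of each 8-bit priority field are implemented, so software left-aligns the priority value; CMSIS `NVIC_SetPriority` does this shift using `__NVIC_PRIO_BITS`. A minimal sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

/* Number of distinct priority levels for a given number of implemented bits
 * (3 bits -> 8 levels, ..., 8 bits -> 256 levels). */
unsigned prio_levels(unsigned prio_bits) {
    return 1u << prio_bits;
}

/* Encode a logical priority (0 = highest) into the 8-bit priority field.
 * prio_bits is assumed to be in the range 2..8. Only the top prio_bits bits
 * exist in hardware, so the value is shifted into the high end of the byte. */
uint8_t encode_priority(uint32_t priority, unsigned prio_bits) {
    uint32_t mask = (1u << prio_bits) - 1u;  /* keep only representable levels */
    return (uint8_t)(((priority & mask) << (8u - prio_bits)) & 0xFFu);
}
```

With `prio_bits = 2`, the same arithmetic yields the 4 priority levels of the Cortex-M23 described below.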
5. Cortex M4
The Cortex-M4[6] is an efficient embedded processor. Like the Cortex-M3, it uses three AMBA AHB-Lite buses (a Harvard bus architecture), has a three-stage pipeline, supports a subset of the Thumb/Thumb-2 instruction set, and supports 8 to 256 interrupt priority levels. Compared with the Cortex-M3, it adds the DSP extension and an optional single-precision floating-point unit (FPU).
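Much of the DSP extension consists of SIMD-style instructions that operate on packed 8- or 16-bit data inside 32-bit registers. As an illustration, the portable C model below reproduces what `SMLAD` (dual signed 16-bit multiply with accumulate) computes and uses it for a Q15 dot product that performs two multiply-accumulates per step. It is only a reference model with hypothetical function names: on a real Cortex-M4 you would use the CMSIS `__SMLAD` intrinsic or a CMSIS-DSP routine, and the Q (overflow) flag that `SMLAD` sets is not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v */
static int32_t sext16(uint32_t v) {
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Reference model of SMLAD: acc + x.lo * y.lo + x.hi * y.hi (signed halfwords) */
int32_t smlad_ref(uint32_t x, uint32_t y, int32_t acc) {
    int32_t p_lo = sext16(x) * sext16(y);
    int32_t p_hi = sext16(x >> 16) * sext16(y >> 16);
    /* 32-bit wrap-around addition, done in unsigned arithmetic to avoid C overflow UB */
    return (int32_t)((uint32_t)acc + (uint32_t)p_lo + (uint32_t)p_hi);
}

/* Q15 dot product: pack two halfwords per operand, two MACs per smlad_ref call */
int32_t dot_q15(const int16_t *a, const int16_t *b, size_t n) {
    int32_t acc = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        uint32_t pa = (uint16_t)a[i] | ((uint32_t)(uint16_t)a[i + 1] << 16);
        uint32_t pb = (uint16_t)b[i] | ((uint32_t)(uint16_t)b[i + 1] << 16);
        acc = smlad_ref(pa, pb, acc);
    }
    if (i < n) {
        acc += (int32_t)a[i] * b[i];  /* odd leftover element */
    }
    return acc;
}
```

Halving the number of multiply-accumulate steps in loops like this is where most of the DSP extension's speed-up for filters and dot products comes from.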
6. Cortex M7
The Cortex-M7[7] is a high-performance, energy-efficient processor with a 6-stage superscalar pipeline. It supports the Thumb/Thumb-2 instruction set, 8 to 256 interrupt priority levels, the DSP extension, and an optional floating-point unit (single precision, or single and double precision). Its bus interfaces are a 64-bit AMBA 4 AXI bus, a 32-bit AHB peripheral port, and a 32-bit AMBA AHB slave port that lets external bus masters access the TCMs. It can also be configured with an instruction cache, a data cache, instruction TCM, and data TCM.
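The Cortex-M7 data cache has a practical consequence for DMA: cache maintenance operates on whole 32-byte cache lines, so a buffer that shares a line with other variables can be corrupted when that line is invalidated. A common precaution, sketched below assuming GCC/Clang `aligned` attribute syntax, is to start DMA buffers on a cache-line boundary and pad them to whole lines. `dma_rx_buf`, `CACHE_ALIGNED_SIZE`, and `is_cache_line_safe` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 L1 cache line size in bytes */
#define DCACHE_LINE_SIZE 32

/* Round a byte count up to a whole number of cache lines */
#define CACHE_ALIGNED_SIZE(n) \
    (((n) + DCACHE_LINE_SIZE - 1) & ~(size_t)(DCACHE_LINE_SIZE - 1))

/* Hypothetical DMA receive buffer for up to 100 bytes: starts on a line
 * boundary and occupies whole lines (128 bytes), so no other data shares them. */
__attribute__((aligned(DCACHE_LINE_SIZE)))
uint8_t dma_rx_buf[CACHE_ALIGNED_SIZE(100)];

/* 1 if [addr, addr + len) covers only whole cache lines */
int is_cache_line_safe(const void *addr, size_t len) {
    return ((uintptr_t)addr % DCACHE_LINE_SIZE == 0) && (len % DCACHE_LINE_SIZE == 0);
}
```

With CMSIS-Core, such a buffer would typically be cleaned with `SCB_CleanDCache_by_Addr` before a DMA transmit and invalidated with `SCB_InvalidateDCache_by_Addr` after a DMA receive. Placing the buffer in DTCM, which is not cached, is another way to avoid the issue.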
7. Cortex M23
The Cortex-M23[8] is a very small, simple processor with TrustZone, which makes it a good fit for IoT and embedded applications that require security.
The Cortex-M23 uses the Armv8-M Baseline architecture, has a 2-stage pipeline, uses the AMBA 5 AHB bus, supports a subset of the Thumb/Thumb-2 instruction set, and supports 4 interrupt priority levels. It also adds hardware support for single-cycle 32×32 multiplication and fast 32/32 division.
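The hardware divider matters because Armv6-M cores such as the Cortex-M0/M0+ have no divide instruction. There, the compiler turns `a / b` into a call to a run-time helper (in the Arm EABI, `__aeabi_uidiv`), while Armv8-M Baseline adds `UDIV`/`SDIV`. Below is a sketch of the classic restoring shift-and-subtract algorithm, shown only to illustrate the kind of work such a helper must do in software; it is not the actual library implementation, and `soft_udiv` is a hypothetical name.

```c
#include <stdint.h>

/* Restoring shift-and-subtract unsigned division: one quotient bit per step,
 * 32 iterations for a 32-bit dividend. b must be non-zero. If rem is non-NULL,
 * the remainder is stored there. */
uint32_t soft_udiv(uint32_t a, uint32_t b, uint32_t *rem) {
    uint32_t q = 0;
    uint64_t r = 0;  /* 64-bit so that r << 1 cannot overflow when b > 2^31 */
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((a >> i) & 1u);  /* bring down the next dividend bit */
        if (r >= b) {                    /* divisor fits: subtract, set quotient bit */
            r -= b;
            q |= 1u << i;
        }
    }
    if (rem) {
        *rem = (uint32_t)r;
    }
    return q;
}
```

A loop like this costs dozens of cycles per division, which is why a hardware divider is a noticeable gain for code that divides often.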

8. Cortex M33
The Cortex-M33[9] suits embedded and IoT applications that need efficient security or digital signal control. It has many optional features, including the DSP extension, TrustZone security for hardware-enforced isolation, a coprocessor interface, a memory protection unit (MPU), and a floating-point unit (FPU).
The Cortex-M33 uses the Armv8-M Mainline architecture, has a 3-stage pipeline, uses two AMBA 5 AHB buses (a Harvard architecture), supports the Thumb/Thumb-2 instruction set, and supports 8 to 256 interrupt priority levels. TrustZone for Armv8-M, the DSP extension (DSP/SIMD instructions), and the coprocessor interface are all optional.
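TrustZone for Armv8-M divides the memory map into Secure, Non-secure, and Non-secure Callable (NSC) regions, and the Security Attribution Unit (SAU) decides which address belongs to which. The following is a deliberately simplified, hypothetical model of that lookup: addresses outside every configured region default to Secure, and the IDAU, the `ALLNS` setting, and overlapping regions are ignored. All names and the region addresses used with it are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified SAU model (assumes non-overlapping regions, no IDAU) */
typedef enum { ATTR_SECURE, ATTR_NONSECURE, ATTR_NSC } sec_attr;

typedef struct {
    uint32_t base;   /* inclusive start address */
    uint32_t limit;  /* inclusive end address */
    int nsc;         /* 1 = Non-secure Callable, 0 = Non-secure */
} sau_region;

sec_attr sau_lookup(const sau_region *regions, size_t n, uint32_t addr) {
    for (size_t i = 0; i < n; ++i) {
        if (addr >= regions[i].base && addr <= regions[i].limit) {
            return regions[i].nsc ? ATTR_NSC : ATTR_NONSECURE;
        }
    }
    return ATTR_SECURE;  /* not covered by any region: Secure by default */
}
```

In real firmware, the SAU is programmed through its registers, and Secure functions that Non-secure code may call are declared with `__attribute__((cmse_nonsecure_entry))` and compiled with `-mcmse`; their entry veneers are placed in an NSC region.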
Recommended reading, an article by "smart kid": an in-depth yet easy-to-follow explanation of Cortex-M23/M33 features[10].
9. Cortex M35P
The Cortex-M35P[11] uses TrustZone for Armv8-M, providing hardware security and optional software isolation. ARM offers it for embedded developers who need to prevent physical tampering and achieve higher levels of security certification.
The Cortex-M35P uses the Armv8-M Mainline architecture, has a 3-stage pipeline, uses two AMBA 5 AHB buses (a Harvard architecture), supports the Thumb/Thumb-2 instruction set, and supports 8 to 256 interrupt priority levels. It offers an optional coprocessor interface, TrustZone for Armv8-M, and DSP support, and adds physical security with built-in protection against invasive and non-invasive attacks.

10. Cortex M55
At the time of writing, the newest Cortex-M processor is the Cortex-M55.
The Cortex-M55[12] is the first processor based on the Armv8.1-M architecture. It uses ARM Helium technology (MVE, the M-Profile Vector Extension) to bring higher machine learning and signal processing performance to the next generation of small embedded devices, such as wearables and smart voice devices.
The Cortex-M55 has a 4-stage pipeline and a 64-bit AMBA 5 AXI5 master bus, with optional support for a 64-bit coprocessor interface, TrustZone, and Helium. Its DSP extension provides 32-bit DSP/SIMD instructions.
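Helium works on 128-bit vectors (for example, four 32-bit lanes) and supports tail predication: instead of a separate scalar loop for leftover elements, the final iteration simply disables the lanes past the end of the data; the `VCTP` instruction creates such a predicate. The scalar C model below only illustrates that idea; it is not MVE code, and `vadd_tail_predicated` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4  /* 128-bit vector / 32-bit elements */

/* Scalar model of a tail-predicated vector add: each outer iteration stands
 * for one vector instruction; lanes beyond n are masked off instead of being
 * handled by a scalar epilogue. Assumes the sums do not overflow int32_t. */
void vadd_tail_predicated(const int32_t *a, const int32_t *b, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i += LANES) {
        size_t remaining = n - i;                 /* what VCTP would be given */
        for (size_t lane = 0; lane < LANES; ++lane) {
            int active = lane < remaining;        /* per-lane predicate bit */
            if (active) {
                out[i + lane] = a[i + lane] + b[i + lane];
            }
        }
    }
}
```

Because inactive lanes neither load nor store, the loop never touches memory past the end of the arrays, which is what allows Helium code to skip the scalar cleanup loop.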

Conclusion
Having met the members of the Cortex-M family, we can see that ARM Cortex-M is increasingly focused on security and AI performance. This also points to a direction for development: future IoT development is no longer just about connecting to a cloud platform and reporting data, but increasingly about strengthening the security and AI capabilities of the IoT devices themselves. After all, a compromised hardware device can be even scarier than an infected computer. And if edge AI processing becomes much more capable, data can be processed directly on the device without consuming unnecessary cloud computing power.
Thus, the first stop of the ARM exploration journey ends here! See you next time!
Note: All images in this article are sourced from ARM.
References
[1] ARM CPU Architecture: https://developer.arm.com/architectures/cpu-architecture