Author: Joseph Yiu, Senior Embedded Technology Manager, ARM
William Gao (ARM China Application Engineer) and Gabriel Wang (ARM China Embedded Application Engineer) also contributed to the Chinese version of this article.
Summary
The ARM Cortex-M processor family now has 8 processor members. In this article, we will compare the product features among the Cortex-M series processors, focusing on how to choose the right Cortex-M processor based on product applications. This article will detail the instruction set and advanced interrupt handling capabilities of the Cortex-M series processors, as well as comparisons of SoC system-level features, debugging, tracing functionalities, and performance.
1 Introduction
Today, the ARM Cortex-M processor family has 8 processor members. In addition, ARM's product line includes many other processors. Beginners, and even experienced chip designers who are unfamiliar with ARM processors, can easily confuse these products. Different ARM processors have different instruction sets, system features, and performance. This article examines the key differences among the Cortex-M series processors and how they differ from other ARM processors.
1.1 ARM Processor Family
Over the years, ARM has developed a considerable number of different processor products. As shown in the figure below (Figure 1): ARM processor products are divided into the classic ARM processor series and the latest Cortex processor series. Depending on the application scope, ARM processors can be categorized into three series.
Application Processors – High-end processors aimed at mobile computing, smartphones, servers, and other markets. These processors operate at very high clock frequencies (over 1GHz) and support memory management units (MMUs) required by complete operating systems such as Linux, Android, MS Windows, and mobile operating systems. If the planned product needs to run one of the above operating systems, you need to choose an ARM application processor.
Real-time Processors – High-performance processor series aimed at real-time applications, such as disk controllers, automotive transmission systems, and wireless communication baseband control. Most real-time processors do not support MMUs, but usually have MPUs, Cache, and other memory features designed for industrial applications. Real-time processors operate at relatively high clock frequencies (e.g., 200MHz to >1GHz), with very low response latency. Although real-time processors cannot run full versions of Linux and Windows operating systems, they support a large number of real-time operating systems (RTOS).
Microcontroller Processors – Microcontroller processors are usually designed to be very small in area and highly energy efficient. Typically, these processors have short pipelines and operate at relatively low maximum clock frequencies (although there are microcontrollers on the market that run above 200MHz). Moreover, the new Cortex-M processor family is designed to be very easy to use. As a result, ARM microcontroller processors are very successful and popular in the microcontroller and deeply embedded system markets.
Figure 1: Processor Family
Table 1 summarizes the main features of the three processor series.
Table 1: Summary of Processor Features
1.2 Cortex-M Processor Family
The Cortex-M processor family focuses more on the low-performance end, but these processors still have robust performance compared to many traditional processors used in microcontrollers. For example, the Cortex-M4 and Cortex-M7 processors are used in many high-performance microcontroller products, with maximum clock frequencies reaching 400MHz.
Of course, performance is not the only criterion for selecting a processor. In many applications, low power consumption and cost are key selection criteria. Therefore, the Cortex-M processor family includes a variety of products to meet different needs:
Table 2: Cortex-M Processor Family
Unlike the older classic ARM processors (e.g., ARM7TDMI, ARM9), Cortex-M processors have a very different architecture. For example:
– Only ARM Thumb® instructions are supported; with Thumb-2 technology, the instruction set includes both 16-bit and 32-bit instructions
– A built-in Nested Vectored Interrupt Controller (NVIC) handles interrupts, automatically managing interrupt priorities, interrupt masking, interrupt nesting, and system exceptions.
– Interrupt handlers can be written as standard C functions, and the nested interrupt handling mechanism removes the need for software to determine which interrupt should be serviced. The interrupt response is also deterministic and low-latency.
– The vector table no longer holds branch instructions; it holds the starting addresses of the interrupt and system exception handlers.
– The register set and some aspects of the programming model have also changed.
These changes mean that much of the assembly code written for classic ARM processors needs to be modified, and old projects need to be changed and recompiled to migrate to Cortex-M products. Specific details of software porting are documented in ARM documentation:
ARM Cortex-M3 Processor Software Development for ARM7TDMI Processor Programmers
http://www.arm.com/files/pdf/Cortex-M3_programming_for_ARM7_developers.pdf
1.3 Common Features of Cortex-M Series Processors
There are many similarities between Cortex-M0, M0+, M3, M4, and M7, such as:
– Basic programming model (Section 3.1)
– Interrupt response management by the Nested Vectored Interrupt Controller (NVIC)
– Sleep modes designed in the architecture: Sleep mode and Deep Sleep mode (Section 4.1)
– Operating system support features (Section 3.3)
– Debugging features (Section 6)
– Ease of use
For example, the Nested Vectored Interrupt Controller (NVIC) is a built-in interrupt controller, shown in Figure 2.
Figure 2: Nested Vectored Interrupt Controller of Cortex-M Processors
The NVIC supports interrupt inputs from many peripheral devices, a non-maskable interrupt (NMI) request, an interrupt request from the built-in timer (SysTick) (see Section 3.3), and a number of system exception requests. The NVIC handles the priority and masking management of these interrupts and exceptions.
More details about the NVIC and the exception handling model are described in Section 3.2. Other similarities and differences among the Cortex-M processors are explained in the rest of this article.
2 Cortex-M Processor Instruction Set
2.1 Introduction to Instruction Set
In most cases, application code can be written in C or other high-level languages. However, a basic understanding of the instruction set supported by Cortex-M processors helps developers choose the appropriate Cortex-M processor for a specific application. The instruction set architecture (ISA) is part of the processor architecture, and Cortex-M processors can be divided into several architectural specifications, as listed in Table 3.
Table 3: ARM Architecture Specifications of Cortex-M Processors
All Cortex-M processors support the Thumb instruction set. With the Thumb-2 extensions, the complete Thumb instruction set is quite large. However, different Cortex-M processors support different subsets of it, as shown in Figure 3.
Figure 3: Instruction Set of Cortex-M Processors
2.2 Cortex-M0/M0+/M1 Instruction Set
The Cortex-M0/M0+/M1 processors are based on the ARMv6-M architecture. This is a small instruction set of only 56 instructions, most of which are 16-bit; as Figure 3 shows, it covers only a small part of the full Thumb instruction set. However, the registers and the data these processors operate on are 32 bits wide. For most simple I/O control tasks and general data processing, these instructions are sufficient. Such a small instruction set can be implemented with very few logic gates: the minimum configuration of Cortex-M0 and Cortex-M0+ requires only 12K gates. However, many of the instructions cannot use the high registers (R8 to R12), and the ability to generate immediate values is limited. This is the result of a trade-off between ultra-low power consumption and performance.
2.3 Cortex-M3 Instruction Set
The Cortex-M3 processor is based on the ARMv7-M architecture and supports a richer instruction set, including many 32-bit instructions that can efficiently use high registers. Additionally, M3 also supports:
· Table branch instructions and conditional execution (using IT instructions)
· Hardware division instructions
· Multiply-accumulate instructions (MAC)
· Various bit manipulation instructions
A richer instruction set improves performance in several ways. For example, 32-bit Thumb instructions support a larger range of immediate values, branch offsets, and address offsets for memory accesses. The instruction set also supports basic DSP operations (e.g., several MAC instructions that take multiple clock cycles to execute, as well as saturation instructions). Finally, the 32-bit instructions allow the barrel shifter to be used together with several data processing operations in a single instruction.
Supporting a richer instruction set costs more silicon area and power. In typical microcontrollers, the gate count of Cortex-M3 is more than twice that of Cortex-M0 and Cortex-M0+. However, the processor occupies only a small part of most modern microcontrollers, so the extra area and power consumption are often not significant.
2.4 Cortex-M4 Instruction Set
The Cortex-M4 is similar to the Cortex-M3 in many aspects, such as the pipeline and the programming model. The Cortex-M4 supports all the features of Cortex-M3 and additionally supports various instructions aimed at DSP applications, such as SIMD instructions, saturation instructions, a series of single-cycle MAC instructions (Cortex-M3 supports only a limited set of MAC instructions and executes them over multiple cycles), and optional single-precision floating-point instructions.
The SIMD operations of Cortex-M4 can process two 16-bit values or four 8-bit values in parallel. For example, the QADD8 and QADD16 operations shown in Figure 4:
Figure 4: Example of SIMD Instructions: QADD8 and QADD16
The use of SIMD enables much faster computation on 16-bit and 8-bit data in certain DSP operations because the calculation can be parallelized. However, in general programming, C compilers are unlikely to use the SIMD capability. That is why typical benchmark results for the Cortex-M3 and Cortex-M4 are similar. The internal data path of the Cortex-M4 does differ from that of the Cortex-M3, which enables faster operation in a few cases (e.g., single-cycle MAC, and writing back two registers in a single cycle).
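To make the lane-wise behaviour concrete, here is a minimal portable C model of what QADD16 and QADD8 compute. It is only a host-side sketch of the arithmetic: the names `qadd16_model` and `qadd8_model` are hypothetical, and on a real Cortex-M4 each operation is a single instruction (usually reached through compiler intrinsics) rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate result to the range [lo, hi]. */
static int32_t clamp(int32_t v, int32_t lo, int32_t hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Extract one signed 16-bit lane (0 = low halfword, 1 = high halfword). */
static int32_t lane16(uint32_t word, unsigned lane)
{
    assert(lane < 2u);
    return (int16_t)(uint16_t)(word >> (16u * lane));
}

/* Extract one signed 8-bit lane (0 = lowest byte, 3 = highest byte). */
static int32_t lane8(uint32_t word, unsigned lane)
{
    assert(lane < 4u);
    return (int8_t)(uint8_t)(word >> (8u * lane));
}

/* Model of QADD16: two independent signed saturating 16-bit additions. */
uint32_t qadd16_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 2u; lane++) {
        int32_t sum = clamp(lane16(a, lane) + lane16(b, lane),
                            INT16_MIN, INT16_MAX);
        result |= (uint32_t)(uint16_t)sum << (16u * lane);
    }
    return result;
}

/* Model of QADD8: four independent signed saturating 8-bit additions. */
uint32_t qadd8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4u; lane++) {
        int32_t sum = clamp(lane8(a, lane) + lane8(b, lane),
                            INT8_MIN, INT8_MAX);
        result |= (uint32_t)(uint8_t)sum << (8u * lane);
    }
    return result;
}
```

For instance, `qadd16_model(0x7FFF0001, 0x00010001)` saturates the upper lane at 0x7FFF instead of wrapping, while the lower lane adds normally.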
2.5 Cortex-M7 Instruction Set
The instruction set supported by Cortex-M7 is similar to that of Cortex-M4, with the addition of:
· A floating-point architecture based on FPv5, rather than the FPv4 of Cortex-M4, so Cortex-M7 supports additional floating-point instructions
· Optional double-precision floating-point data processing instructions
· Support for cache data prefetch instructions (PLD)
The pipeline of Cortex-M7 is very different from that of Cortex-M4. Cortex-M7 has a 6-stage dual-issue pipeline that can achieve higher performance. Most software designed for Cortex-M4 can run directly on Cortex-M7. However, to take full advantage of the different pipeline, software needs to be recompiled, and in many cases minor software changes are needed to make full use of new features such as the cache.
2.6 Cortex-M23 Instruction Set
The instruction set of Cortex-M23 is based on the ARMv8-M Baseline sub-specification, which is a superset of ARMv6-M. The extended instructions include:
· Hardware division instructions
· Compare-and-branch instructions, 32-bit branch instructions
· Instructions supporting TrustZone security extensions
· Exclusive access instructions (commonly used for semaphore operations)
· 16-bit immediate generation instructions
· Load-acquire and store-release instructions (supporting C11 atomics)
In certain cases, these additional instructions can improve processor performance, and they are useful for SoC designs containing multiple processors (e.g., exclusive accesses help with semaphore handling in multiprocessor systems).
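As a rough illustration of the kind of C11 code these instructions are meant to serve, the sketch below builds a binary semaphore from `<stdatomic.h>`. The type `bin_sem_t` and its functions are hypothetical names for this example. How a given compiler maps the atomic operations onto exclusive-access and load-acquire/store-release instructions depends on the toolchain and target, so treat that mapping as an assumption rather than a guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical binary semaphore: 1 = free, 0 = taken. */
typedef struct {
    atomic_int available;
} bin_sem_t;

/* Initialise the semaphore in the free state. */
void bin_sem_init(bin_sem_t *s)
{
    assert(s != NULL);
    atomic_init(&s->available, 1);
}

/* Try to take the semaphore without blocking.
 * Acquire ordering on success keeps later accesses after the take. */
bool bin_sem_try_take(bin_sem_t *s)
{
    assert(s != NULL);
    int expected = 1;
    return atomic_compare_exchange_strong_explicit(
        &s->available, &expected, 0,
        memory_order_acquire, memory_order_relaxed);
}

/* Give the semaphore back.
 * Release ordering publishes earlier writes before the semaphore is free. */
void bin_sem_give(bin_sem_t *s)
{
    assert(s != NULL);
    atomic_store_explicit(&s->available, 1, memory_order_release);
}
```

A second `bin_sem_try_take` on a taken semaphore returns false until `bin_sem_give` is called, which is the behaviour a scheduler or a second core would rely on.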
2.7 Cortex-M33 Instruction Set
Because the design of Cortex-M33 is highly configurable, some instructions are optional. For example:
· DSP instructions (supported by Cortex-M4 and Cortex-M7) are optional
· Single-precision floating-point operation instructions are optional, based on FPv5, and include several more instructions than Cortex-M4.
Cortex-M33 also supports new instructions introduced by ARMv8-M Mainline:
· Instructions supporting TrustZone security extensions
· Load-acquire and store-release instructions (supporting C11 atomics)
2.8 Summary of Instruction Set Features Comparison
The ARMv6-M, ARMv7-M, and ARMv8-M instruction sets have too many features to cover in full detail here. Table 4 summarizes the key differences.
Table 4: Summary of Instruction Set Features
The most important characteristic of the Cortex-M processor instruction set is upward compatibility. The instructions of Cortex-M3 are a superset of those of Cortex-M0/M0+/M1. Therefore, theoretically, if the memory allocation is consistent, binary files running on Cortex-M0/M0+/M1 can run directly on Cortex-M3. The same principle applies to Cortex-M4/M7 and other Cortex-M processors; instructions supported by Cortex-M0/M0+/M1/M3 can also run on Cortex-M4/M7.
Although Cortex-M0/M0+/M1/M3/M23 processors have no floating-point unit option, they can still perform floating-point operations in software. The same applies to products based on Cortex-M4/M7/M33 that are not configured with a floating-point unit. In these cases, when a program uses floating-point numbers, the compiler toolchain inserts the necessary runtime library functions at link time. Software floating-point operations take longer to execute and slightly increase code size. However, if floating-point operations are used infrequently, this solution is adequate.
3 Architectural Features
3.1 Programming Model
The programming model of the Cortex-M processor family is highly consistent. For example, all Cortex-M processors support R0 to R15, PSR, CONTROL, and PRIMASK. Two special registers—FAULTMASK and BASEPRI—are only supported by Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33; the floating-point register group and FPSCR (Floating Point Status and Control Register) are used by the optional floating-point unit of Cortex-M4/M7/M33.
Figure 5: Programming Model
The BASEPRI register allows software to block interrupts and exceptions at or below a specified priority level. This matters for ARMv7-M and ARMv8-M Mainline because Cortex-M3, Cortex-M4, Cortex-M7, and Cortex-M33 have a large number of priority levels, while ARMv6-M and ARMv8-M Baseline have only 4 programmable priority levels. FAULTMASK is usually used in complex error handling (see Section 3.5).
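The masking rule can be written as a one-line predicate. The sketch below is a host-side model, not register access code: `basepri_blocks` is a hypothetical name, and it relies on two properties of the Cortex-M priority scheme, namely that a numerically lower value means a higher priority and that a BASEPRI value of 0 disables the masking.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if an exception with the given programmable priority value
 * would be blocked by the given BASEPRI value.
 * Both arguments are 8-bit priority fields (0x00 to 0xFF). */
bool basepri_blocks(unsigned basepri, unsigned priority)
{
    assert(basepri <= 0xFFu && priority <= 0xFFu);
    if (basepri == 0u) {
        return false;              /* 0 means BASEPRI masking is off */
    }
    return priority >= basepri;    /* same or lower urgency is blocked */
}
```

With BASEPRI set to 0x40, an interrupt at priority 0x80 is held off while one at 0x20 can still preempt, which lets a critical section keep the most urgent interrupts alive.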
The unprivileged execution level is optional in ARMv6-M processors and always supported by ARMv7-M and ARMv8-M processors. Among the ARMv6-M processors, it is optional for Cortex-M0+, while Cortex-M0 and Cortex-M1 do not support it. This means that the CONTROL register differs slightly across Cortex-M processors. The FPU configuration also affects the CONTROL register, as shown in Figure 6.
Figure 6: CONTROL Register
Another difference in the programming model lies in the details of the Program Status Register (PSR). For all Cortex-M processors, the PSR is divided into the Application Program Status Register (APSR), Execution Program Status Register (EPSR), and Interrupt Program Status Register (IPSR). ARMv6-M and ARMv8-M Baseline processors do not support the Q bit of APSR or the ICI/IT bits of EPSR. ARMv7E-M processors (Cortex-M4, Cortex-M7) and ARMv8-M Mainline processors configured with the DSP extension (Cortex-M33) support the GE bits. Additionally, the exception number field of IPSR is narrower in ARMv6-M processors, as shown in Figure 7.
Figure 7: PSR Differences
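The PSR differences can be summarised as a small decoder over the combined 32-bit register. This is a sketch for reading a captured value on a host, with hypothetical names (`xpsr_fields_t`, `xpsr_decode`). The bit positions follow the architecture definition as I understand it (N, Z, C, V in bits 31 to 28, Q in bit 27, T in bit 24, GE in bits 19 to 16), and the width of the exception number field is passed in by the caller, on the assumption that it is 6 bits on ARMv6-M and 9 bits on ARMv7-M.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a combined PSR value (hypothetical type). */
typedef struct {
    bool n, z, c, v;            /* APSR condition flags                  */
    bool q;                     /* APSR saturation flag (not on ARMv6-M) */
    uint8_t ge;                 /* APSR GE[3:0] (DSP extension only)     */
    bool thumb;                 /* EPSR T bit                            */
    uint16_t exception_number;  /* IPSR field                            */
} xpsr_fields_t;

/* Decode a PSR value; exception_bits is the implemented IPSR width
 * (assumed 6 for ARMv6-M, 9 for ARMv7-M). */
xpsr_fields_t xpsr_decode(uint32_t xpsr, unsigned exception_bits)
{
    assert(exception_bits >= 1u && exception_bits <= 9u);
    xpsr_fields_t f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.ge = (uint8_t)((xpsr >> 16) & 0xFu);
    f.thumb = (xpsr >> 24) & 1u;
    f.exception_number = (uint16_t)(xpsr & ((1u << exception_bits) - 1u));
    return f;
}
```

On a processor without the Q or GE bits, the corresponding fields simply decode as zero, so the same helper can be used when inspecting values from any Cortex-M core.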
Note that the programming model of Cortex-M differs from that of classic ARM processors such as ARM7TDMI. Besides the different register set, the definitions of "mode" and "state" in classic ARM processors also differ from those in Cortex-M. Cortex-M has only two modes, Thread mode and Handler mode, and Cortex-M processors always operate in Thumb state (ARM instructions are not supported).
3.2 Exception Handling Model and Nested Vectored Interrupt Controller (NVIC)
All Cortex-M processors include the NVIC module and adopt the same exception handling model. If an interrupt occurs with a higher priority than the currently running level and is not masked by any interrupt mask registers, the processor will respond to this interrupt/exception and push certain registers onto the current stack. Under this stack mechanism, interrupt handlers can be written as ordinary C functions, allowing many small interrupt handler functions to respond immediately without additional stack processing overhead.
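The registers pushed on exception entry form a fixed eight-word frame (R0 to R3, R12, LR, PC, and xPSR), which matches the registers a C function is allowed to overwrite and is why a handler can be an ordinary C function. The sketch below models that basic frame, without any floating-point context, as indices into an array of words. The names are hypothetical, and the code only inspects a frame that has already been captured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word offsets within the basic exception stack frame,
 * starting at the stack pointer value seen on handler entry. */
enum {
    FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
    FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR,
    FRAME_WORDS
};

/* Address the interrupted code will resume at. */
uint32_t stacked_pc(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_PC];
}

/* Exception number that was active when the frame was pushed
 * (0 means the interrupted code was running in Thread mode). */
uint32_t stacked_exception_number(const uint32_t *frame)
{
    assert(frame != NULL);
    return frame[FRAME_XPSR] & 0x1FFu;
}
```

A fault handler that receives the stack pointer value can use helpers like these to report where the failure happened.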
Some interrupts and system exceptions used by ARMv7-M/ARMv8-M Mainline processors are not supported by ARMv6-M/ARMv8-M Baseline products, as shown in Figure 8. For example, the number of interrupts for Cortex-M0, M0+, and M1 is limited to a maximum of 32, there is no debug monitor exception, and fault exceptions are limited to HardFault (see Section 3.5 for error handling details). In contrast, Cortex-M23, Cortex-M3, Cortex-M4, and Cortex-M7 processors can support up to 240 peripheral interrupts. Cortex-M33 supports up to 480 interrupts.
Another distinction is the number of available priority levels:
ARMv6-M Architecture – ARMv6-M supports 2 fixed levels (NMI and HardFault) and 4 programmable levels (represented by two bits of each priority level register). This is sufficient for most microcontroller applications.
ARMv7-M Architecture – ARMv7-M series processors have a range of programmable priority levels that can be configured from 8 levels (3 bits) to 256 levels (8 bits) depending on area constraints. ARMv7-M processors also have a feature called interrupt priority grouping, which allows the interrupt priority registers to be further divided into group priority and sub-priority, thus allowing detailed specification of preemptive priority behavior.
ARMv8-M Baseline – Similar to ARMv6-M, Cortex-M23 also has 2-bit priority level registers. With the optional TrustZone security extension, secure software can map the priority levels of non-secure interrupts into the lower half of the priority range, ensuring that certain interrupts/exceptions in the secure environment always have higher priority than those in the non-secure environment.
ARMv8-M Mainline – Similar to ARMv7-M. It can support 8 to 256 interrupt priority levels and interrupt priority grouping. It also supports the priority adjustment feature of ARMv8-M Baseline.
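The relationship between implemented priority bits, priority levels, and priority grouping can be checked with a few lines of arithmetic. The sketch below is a host-side model with hypothetical function names. It assumes that the implemented bits sit in the most significant bits of each 8-bit priority field and that, for a grouping setting `prigroup` (0 to 7), the sub-priority occupies bits [prigroup:0] and the group priority occupies the bits above them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of programmable levels for a given count of implemented bits
 * (2 bits give 4 levels, 3 bits give 8, 8 bits give 256). */
unsigned priority_levels(unsigned implemented_bits)
{
    assert(implemented_bits >= 1u && implemented_bits <= 8u);
    return 1u << implemented_bits;
}

/* Place a level number into the top bits of the 8-bit priority field. */
uint8_t priority_encode(unsigned level, unsigned implemented_bits)
{
    assert(implemented_bits >= 1u && implemented_bits <= 8u);
    assert(level < (1u << implemented_bits));
    return (uint8_t)(level << (8u - implemented_bits));
}

/* Split an 8-bit priority value into group priority (which decides
 * preemption) and sub-priority (which only orders pending exceptions). */
void priority_split(uint8_t priority, unsigned prigroup,
                    unsigned *group, unsigned *sub)
{
    assert(prigroup <= 7u && group != NULL && sub != NULL);
    unsigned sub_bits = prigroup + 1u;
    *group = (unsigned)priority >> sub_bits;
    *sub = (unsigned)priority & ((1u << sub_bits) - 1u);
}
```

For example, with 3 implemented bits level 5 is stored as 0xA0, and with `prigroup` set to 4 that value has group priority 5 and sub-priority 0.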
Figure 8: Types of Exceptions and Interrupts in Cortex-M Processors
All Cortex-M processors rely on the vector table for exception handling. The vector table holds the starting addresses of exception handling functions (as shown in Figure 8). The starting address of the vector table is determined by a register called the Vector Table Offset Register (VTOR).
· Cortex-M0+, Cortex-M3, and Cortex-M4: by default, the vector table is located at the start of the memory map (address 0x0).
· Cortex-M7, Cortex-M23, and Cortex-M33: the initial value of VTOR is defined by the chip designer. Cortex-M23 and Cortex-M33 processors can have two separate vector tables, one for Secure and one for Non-secure exceptions/interrupts.
· Cortex-M0 and Cortex-M1: a programmable VTOR is not implemented, and the vector table always starts at address 0x00000000.
The VTOR of Cortex-M0+ and Cortex-M23 processors is optional. If VTOR is implemented, the starting address of the vector table can be changed by setting VTOR, which is useful for the following situations:
· Relocating the vector table to SRAM to dynamically change the entry point of exception handling functions
· Relocating the vector table to SRAM to achieve faster vector reading (if flash memory is slow)
· Relocating the vector table to different positions in ROM (or Flash), allowing different exception handlers for different stages of program execution
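The first use case, changing a handler at run time after moving the table to SRAM, can be modelled on a host with a table of function pointers. Everything in this sketch is a stand-in: `vtor_model` plays the role of VTOR, the table has only four entries, and the real first entry (the initial stack pointer value) is ignored. On real hardware the table address must also meet an alignment requirement, and memory barriers are normally used after updating VTOR. Neither is modelled here.

```c
#include <assert.h>
#include <string.h>

typedef void (*handler_t)(void);

#define VECTOR_COUNT 4u          /* hypothetical, tiny table */

static int last_handler;         /* records which handler ran last */

static void default_handler(void) { last_handler = 1; }
static void new_handler(void)     { last_handler = 2; }

/* Table in "flash": fixed at build time. */
static const handler_t rom_vectors[VECTOR_COUNT] = {
    default_handler, default_handler, default_handler, default_handler
};

/* Table in "SRAM": writable copy. */
static handler_t ram_vectors[VECTOR_COUNT];

/* Stand-in for VTOR: the base address of the active table. */
static const handler_t *vtor_model = rom_vectors;

/* Copy the table to RAM, then point the VTOR stand-in at the copy. */
void relocate_vectors_to_ram(void)
{
    memcpy(ram_vectors, rom_vectors, sizeof rom_vectors);
    vtor_model = ram_vectors;
}

/* Replace one entry in the RAM copy. */
void set_handler(unsigned index, handler_t handler)
{
    assert(index < VECTOR_COUNT && handler != NULL);
    ram_vectors[index] = handler;
}

/* What the hardware does on exception entry: fetch the address, call it. */
void dispatch(unsigned index)
{
    assert(index < VECTOR_COUNT);
    vtor_model[index]();
}

int last_handler_id(void) { return last_handler; }
```

Copying before switching matters: the RAM table must be fully populated before it becomes the active one, otherwise an exception taken in between would fetch an invalid address.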
The NVIC programming model also has additional differences among different Cortex-M processors. The differences are summarized in Table 5:
Table 5: Differences in NVIC Programming Model and Features
In most cases, the interrupt control features of the NVIC are accessed through the APIs provided by CMSIS-CORE, which are included in the device driver libraries supplied by microcontroller vendors. On Cortex-M3/M4/M7/M23/M33 processors, the priority of an interrupt can be changed even while the interrupt is enabled. ARMv6-M processors do not support dynamic priority changes: to change the priority of an interrupt, temporarily disable that interrupt first.
3.3 Operating System Support Features
The Cortex-M processor architecture was designed with operating system support in mind. The features supporting operating systems include:
· Banked stack pointers
· Supervisor call (SVC) and pendable service call (PendSV) exceptions
· SysTick – a 24-bit down-counting timer that generates periodic exceptions for operating system timing and task management
· Unprivileged execution and the memory protection unit (MPU), supported by Cortex-M0+/M3/M4/M7/M23/M33
The SVC exception is triggered by the SVC instruction, allowing application tasks running in the unprivileged state to request privileged operating system services. The PendSV exception is useful for scheduling non-critical operations such as context switching in the operating system.
To fit Cortex-M1 into very small FPGA devices, all of the operating system support features are optional for Cortex-M1. For Cortex-M0, Cortex-M0+, and Cortex-M23 processors, the SysTick timer is optional.
Generally, all Cortex-M processors support operating systems. Applications running on Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, and Cortex-M33 can run in the unprivileged execution state and can also use the optional memory protection unit (MPU) to prevent illegal memory accesses. This improves the robustness of the system.
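Because SysTick is a 24-bit down counter, an operating system port has to check that the desired tick period fits. The sketch below computes a reload value from a clock frequency and a tick rate. The function name is hypothetical, and it assumes the usual behaviour that the counter runs from the reload value down to zero, so one period lasts reload + 1 clock cycles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu   /* largest 24-bit value */

/* Compute the reload value for a periodic tick.
 * Returns false if the period cannot be represented in 24 bits
 * or is too short to produce a periodic tick. */
bool systick_reload_for(uint32_t clock_hz, uint32_t tick_hz,
                        uint32_t *reload)
{
    assert(reload != NULL);
    if (tick_hz == 0u) {
        return false;
    }
    uint32_t cycles = clock_hz / tick_hz;   /* clock cycles per tick */
    if (cycles < 2u || cycles - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = cycles - 1u;
    return true;
}
```

At 48MHz a 1kHz tick needs a reload value of 47999, while a 10Hz tick at 400MHz does not fit and would need a slower clock source or a software divider.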
3.4 TrustZone Security Extensions
In recent years, the Internet of Things (IoT) has become a hot topic among embedded system developers. IoT system products are becoming more complex, and the pressure for time-to-market is increasing. Embedded system products need better solutions to ensure system security while also being convenient for software developers. The traditional solution is to divide software into privileged and non-privileged parts, where privileged software uses the MPU to prevent non-privileged applications from accessing critical system resources that contain security-sensitive information. This solution is suitable for some IoT systems, but in some cases, a two-layer division is not sufficient. Especially for systems that contain many complex privileged-level software components, a flaw in the privileged code can lead to hackers gaining complete control of the system.
The ARMv8-M architecture includes a security extension called TrustZone, which introduces an orthogonal division of secure and non-secure states.
· General applications are in non-secure state
· Secure software components and security-related resources (e.g., secure storage, cryptographic accelerators, true random number generators (TRNG)) are in secure state.
Figure 9: Isolation of Secure and Non-secure States
Non-secure software can only access non-secure memory and peripherals, while secure software can access all resources in both states.
With this solution, software developers can develop applications in the non-secure environment in the traditional way. At the same time, they can execute secure IoT connections using secure communication software libraries provided by chip manufacturers. Even if the privileged programs running in the non-secure environment have vulnerabilities, the TrustZone security mechanism can prevent hackers from taking control of the entire device, limiting the impact of attacks and allowing for remote recovery of the system. Additionally, the ARMv8-M architecture also introduces stack boundary checking and enhanced MPU design, promoting the adoption of additional security measures.
The security architecture definition also extends to the system level, where each interrupt can be set to have secure or non-secure attributes. Interrupt exception handlers will also automatically save and restore register data in the secure environment to prevent security information leakage. Thus, the TrustZone security extension enables the system to support the needs of real-time systems, providing a solid security foundation for IoT applications and making it easier for software development to create applications based on this technology.
The TrustZone module is optional for Cortex-M23 and Cortex-M33 processors. For more information on ARMv8-M TrustZone, please refer to The Next Steps in the Evolution of Embedded Processors for the Smart Connected Era. For more TrustZone resources, please visit the “TrustZone for ARMv8-M Community” on community.arm.com.
3.5 Error Handling
One feature that distinguishes ARM processors from microcontrollers of other architectures is the error handling capability. When an error is detected, a fault exception handler is triggered to deal with it. Situations that trigger faults include:
· Undefined instructions (e.g., caused by Flash memory corruption)
· Accesses to illegal address space (e.g., caused by stack pointer corruption) or accesses that violate MPU settings
· Illegal operations (e.g., the processor tries to trigger an SVC exception while already running at a higher priority than SVC)
The error handling mechanism allows embedded systems to respond to such problems much more quickly. Without it, a crashed system may have to wait a long time for the watchdog timer to restart it.
In the ARMv6-M architecture, all error events trigger the HardFault handler, which has a priority of -1 (higher than all programmable exceptions but lower than non-maskable interrupts (NMI)). All error events are considered unrecoverable, and typically, we only run error reporting in the HardFault handler and then trigger an automatic reset.
In the ARMv8-M Baseline architecture, similar to ARMv6-M, there is only one error exception (HardFault). However, the priority of HardFault in ARMv8-M Baseline can be -1 or -3 when implementing the TrustZone security extension.
ARMv7-M and ARMv8-M Mainline products have several configurable error exceptions in addition to HardFault:
· MemManage (memory management fault)
· BusFault (error response from the bus)
· UsageFault (undefined instruction or other illegal operations)
· SecureFault (only supported by ARMv8-M Mainline products, handling illegal operations in the TrustZone security extension)
The priorities of these exceptions are programmable, and each exception can be enabled or disabled individually. If needed, software can also use the FAULTMASK register to raise the current priority to the same level as HardFault. ARMv7-M and ARMv8-M Mainline products also have several fault status registers that give clues about the event that triggered the fault, and fault address registers that identify the address of the access that caused it, which makes debugging easier.
Additional error handlers in ARMv7-M and ARMv8-M Mainline products provide flexible error handling capabilities, and error status registers make locating and debugging error events easier. Many commercial development kits already embed the functionality to diagnose error events using error status registers. Additionally, error handlers can perform some repair work at runtime.
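As an example of what such diagnosis looks like, the sketch below decodes a value captured from the Configurable Fault Status Register (CFSR) of an ARMv7-M processor. The text above does not name this register, so the layout used here is stated as my understanding of the architecture: the low byte reports memory management faults, the next byte bus faults, and the upper halfword usage faults, with bit 16 flagging an undefined instruction and bit 25 a divide by zero. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of a captured CFSR value (hypothetical type). */
typedef struct {
    bool mem_manage;             /* any MemManage status bit set */
    bool bus;                    /* any BusFault status bit set  */
    bool usage;                  /* any UsageFault status bit set */
    bool undefined_instruction;  /* UsageFault: undefined instruction */
    bool divide_by_zero;         /* UsageFault: divide by zero   */
} cfsr_summary_t;

/* Classify a CFSR value that was read in a fault handler or debugger. */
void cfsr_summarize(uint32_t cfsr, cfsr_summary_t *out)
{
    assert(out != NULL);
    out->mem_manage = (cfsr & 0x000000FFu) != 0u;
    out->bus = (cfsr & 0x0000FF00u) != 0u;
    out->usage = (cfsr & 0xFFFF0000u) != 0u;
    out->undefined_instruction = (cfsr & (1u << 16)) != 0u;
    out->divide_by_zero = (cfsr & (1u << 25)) != 0u;
}
```

A fault handler would typically log this summary together with the stacked program counter before resetting or attempting recovery.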
Table 6: Summary of Error Handling Features Comparison
The article continues in Part 2, which covers:
4 System Features
5 Performance Considerations
6 Debugging and Tracing Features
7 Product Development Based on Cortex-M Processors
8 Conclusion