Specification of Cortex-M Processor ARM Architecture
All Cortex-M processors support the Thumb instruction set. With the Thumb-2 extensions included, the complete Thumb instruction set is quite large, and different Cortex-M processors support different subsets of it, as shown in the figure below.
Instruction Set of Cortex-M Processors
Cortex-M0/M0+/M1 Instruction Set
The Cortex-M0/M0+/M1 processors are based on the ARMv6-M architecture. This is a small instruction set of only 56 instructions, most of which are 16-bit; as Figure 3 shows, it covers only a small portion of the full Thumb instruction set. The registers and the data these processors operate on are nevertheless 32 bits wide. For most simple I/O control tasks and general data processing, these instructions are sufficient. Such a small instruction set can be implemented with very few gates; the minimum configuration of Cortex-M0 and Cortex-M0+ requires only about 12K gates. The trade-off is that many of the instructions cannot use the high registers (R8 to R12), and the ability to generate immediate values is limited. This is a result of balancing ultra-low power consumption against performance requirements.
Cortex-M3 Instruction Set
The Cortex-M3 processor is based on the ARMv7-M architecture and supports a richer instruction set, including many 32-bit instructions that can efficiently use high registers. Additionally, M3 also supports:
- Table branch instructions (TBB and TBH) and conditional execution (using IT instructions)
- Hardware division instructions
- Multiply-accumulate (MAC) instructions
- Various bit manipulation instructions
A richer instruction set enhances performance in several ways. First, 32-bit Thumb instructions support a larger range of immediate values, branch offsets, and address offsets for memory accesses. Second, the instruction set supports basic DSP operations (for example, several MAC instructions that take multiple clock cycles to execute, as well as saturation instructions). Finally, the 32-bit instructions allow the barrel shifter to be combined with several data processing operations in a single instruction.
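As a rough illustration of what two of these instruction classes compute, the sketch below models an unsigned bit-field extract (the kind of operation UBFX performs) and a signed saturation (the kind of operation SSAT performs) in portable C. The function names `bitfield_extract` and `saturate_signed` are hypothetical and chosen only for this example; whether a compiler turns such code into a single instruction depends on the toolchain and its options.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of an unsigned bit-field extract:
   returns `width` bits of `value`, starting at bit position `lsb`.
   Hypothetical helper; requires 1 <= width <= 32 and lsb + width <= 32. */
uint32_t bitfield_extract(uint32_t value, unsigned lsb, unsigned width)
{
    assert(width >= 1u && width <= 32u && lsb + width <= 32u);
    /* Build a mask of `width` ones; width == 32 is handled separately
       because shifting a 32-bit value by 32 is undefined in C. */
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lsb) & mask;
}

/* Portable model of signed saturation:
   clamps `value` to the range of a signed `bits`-bit integer.
   Hypothetical helper; requires 1 <= bits <= 32. */
int32_t saturate_signed(int32_t value, unsigned bits)
{
    assert(bits >= 1u && bits <= 32u);
    int32_t max = (int32_t)((1u << (bits - 1u)) - 1u);
    int32_t min = -max - 1;
    if (value > max) {
        return max;
    }
    if (value < min) {
        return min;
    }
    return value;
}
```

For example, `bitfield_extract(0xABCD1234u, 8, 8)` yields `0x12`, and `saturate_signed(40000, 16)` yields `32767` instead of wrapping around.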
Supporting a richer instruction set costs more silicon area and power. In a typical microcontroller, the gate count of Cortex-M3 is more than double that of Cortex-M0 and Cortex-M0+. However, the processor occupies only a small part of most modern microcontrollers, so the extra area and power consumption are often not significant.
Cortex-M4 Instruction Set
Cortex-M4 is similar to Cortex-M3 in many ways, including its pipeline and programming model. Cortex-M4 supports all the features of Cortex-M3 and additionally supports various DSP-oriented instructions, such as SIMD instructions, saturating arithmetic instructions, a series of single-cycle MAC instructions (Cortex-M3 supports only a limited number of MAC instructions, and they execute over multiple cycles), and optional single-precision floating-point instructions.
The SIMD operations of Cortex-M4 can process two 16-bit values or four 8-bit values in parallel. For example, the following figure shows the QADD8 and QADD16 operations:
Examples of SIMD Instructions: QADD8 and QADD16
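The lane-wise behavior of QADD8 and QADD16 can be sketched in portable C as follows. This is only a model of the arithmetic, with each 32-bit word treated as four signed 8-bit lanes or two signed 16-bit lanes; the names `qadd8_model` and `qadd16_model` are hypothetical, and on a Cortex-M4 the real instructions perform all lanes at once rather than in a loop. The casts to `int8_t` and `int16_t` assume the usual two's-complement conversion, which mainstream compilers provide.

```c
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 8-bit range. */
static int32_t clamp_s8(int32_t v)
{
    return v > 127 ? 127 : (v < -128 ? -128 : v);
}

/* Clamp a wide intermediate result to the signed 16-bit range. */
static int32_t clamp_s16(int32_t v)
{
    return v > 32767 ? 32767 : (v < -32768 ? -32768 : v);
}

/* Model of QADD8: four independent signed 8-bit saturating additions. */
uint32_t qadd8_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; ++lane) {
        int32_t x = (int8_t)(uint8_t)(a >> (8 * lane));
        int32_t y = (int8_t)(uint8_t)(b >> (8 * lane));
        uint32_t sum = (uint8_t)clamp_s8(x + y);
        result |= sum << (8 * lane);
    }
    return result;
}

/* Model of QADD16: two independent signed 16-bit saturating additions. */
uint32_t qadd16_model(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 2; ++lane) {
        int32_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int32_t y = (int16_t)(uint16_t)(b >> (16 * lane));
        uint32_t sum = (uint16_t)clamp_s16(x + y);
        result |= sum << (16 * lane);
    }
    return result;
}
```

With these models, `qadd16_model(0x7FFF0001, 0x00010001)` gives `0x7FFF0002`: the upper lane saturates at the largest signed 16-bit value while the lower lane adds normally.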
In some DSP operations, SIMD accelerates computation on 16-bit and 8-bit data because several values are processed in parallel. In general-purpose code, however, C compilers cannot fully exploit the SIMD capabilities, which is why typical benchmark scores of Cortex-M3 and Cortex-M4 are roughly similar. Even so, the internal data paths of Cortex-M4 differ from those of Cortex-M3, and in some cases Cortex-M4 is faster (for example, a single-cycle MAC can write back to two registers in one cycle).
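A typical workload where MAC throughput matters is a dot product over 16-bit samples. The sketch below is a plain C version with a 64-bit accumulator, which is the kind of loop that can benefit from MAC instructions with a two-register (64-bit) result. The function name `dot_product_q15` is hypothetical, and which instructions a compiler actually selects for it depends on the toolchain, target, and optimization flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply-accumulate over two arrays of signed 16-bit samples.
   Each 16x16 product fits in 32 bits; the running sum is kept in
   64 bits so that long arrays do not overflow the accumulator. */
int64_t dot_product_q15(const int16_t *a, const int16_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}
```

For the inputs `{1, 2, 3, 4}` and `{5, 6, 7, 8}` the result is 70. With two pairs of `-32768` samples the sum is 2147483648, which no longer fits in a signed 32-bit register and shows why the wide accumulator is useful.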
Cortex-M7 Instruction Set
The instruction set supported by Cortex-M7 is similar to that of Cortex-M4, adding:
- A floating-point architecture based on FPv5 rather than the FPv4 of Cortex-M4, so Cortex-M7 supports additional floating-point instructions
- Optional double-precision floating-point instructions
- Cache data preload instructions (PLD)
The pipeline of Cortex-M7 is very different from that of Cortex-M4. Cortex-M7 has a 6-stage dual-issue pipeline, which achieves higher performance. Most software designed for Cortex-M4 can run directly on Cortex-M7. However, to take full advantage of the pipeline differences, software needs to be recompiled, and in many cases it needs minor changes to make good use of new features such as the caches.
Cortex-M23 Instruction Set
The instruction set of Cortex-M23 is based on the ARMv8-M Baseline sub-specification, which is a superset of ARMv6-M. The extended instructions include:
- Hardware division instructions
- Compare-and-branch instructions and 32-bit branch instructions
- Instructions supporting the TrustZone security extension
- Exclusive (mutex) access instructions (commonly used for semaphore operations)
- 16-bit immediate value generation instructions
- Load-acquire and store-release instructions (supporting C11)
In some cases, these additional instructions improve processor performance, and they are useful for SoC designs that contain multiple processors (for example, exclusive (mutex) access helps with semaphore handling in multiprocessor scenarios).
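The C11 connection mentioned above can be sketched with `<stdatomic.h>`. The example below is a minimal counting semaphore with a non-blocking acquire, written only with standard C11 atomics and acquire/release memory ordering. The type `demo_semaphore` and the `demo_sem_*` functions are hypothetical names for this illustration; how a compiler lowers these operations (for example to exclusive-access or load-acquire/store-release instructions, or to library calls) depends on the target architecture and toolchain.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counting semaphore built on a C11 atomic counter. */
typedef struct {
    atomic_int count;
} demo_semaphore;

void demo_sem_init(demo_semaphore *s, int initial)
{
    atomic_init(&s->count, initial);
}

/* Try to take one unit without blocking.
   Returns true on success, false if the count was already zero. */
bool demo_sem_try_acquire(demo_semaphore *s)
{
    int current = atomic_load_explicit(&s->count, memory_order_relaxed);
    while (current > 0) {
        /* On failure, `current` is refreshed with the latest value
           and the loop re-checks whether a unit is still available. */
        if (atomic_compare_exchange_weak_explicit(
                &s->count, &current, current - 1,
                memory_order_acquire, memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

/* Give one unit back; release ordering publishes earlier writes
   to whichever context acquires the semaphore next. */
void demo_sem_release(demo_semaphore *s)
{
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}
```

A semaphore initialized to 2 can be acquired twice; a third `demo_sem_try_acquire` returns `false` until `demo_sem_release` is called.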
Cortex-M33 Instruction Set
Because the design of Cortex-M33 is highly configurable, some instructions are optional. For example:
- DSP instructions (the ones supported by Cortex-M4 and Cortex-M7) are optional.
- Single-precision floating-point instructions are optional; they are based on FPv5, which provides a few more instructions than the FPv4 of Cortex-M4.
Cortex-M33 also supports new instructions introduced by ARMv8-M Mainline:
- Instructions supporting the TrustZone security extension
- Load-acquire and store-release instructions (supporting C11)
Summary of Instruction Set Features Comparison
The ARMv6-M, ARMv7-M, and ARMv8-M architectures have many instruction set features, so it is difficult to cover every detail. The table below summarizes the key differences.
Summary of Instruction Set Features
The most important feature of the Cortex-M instruction sets is upward compatibility. The instructions of Cortex-M3 are a superset of those of Cortex-M0/M0+/M1. Therefore, in theory, if the memory map is the same, a binary built for Cortex-M0/M0+/M1 can run directly on Cortex-M3. The same principle applies to Cortex-M4/M7 and other Cortex-M processors; instructions supported by Cortex-M0/M0+/M1/M3 can also run on Cortex-M4/M7.
Although the Cortex-M0/M0+/M1/M3/M23 processors have no floating-point unit option, they can still perform floating-point operations in software. The same applies to Cortex-M4/M7/M33-based products that are configured without a floating-point unit. In such cases, when the program uses floating-point numbers, the compiler toolchain inserts the necessary runtime library routines at link time. Software floating-point operations take longer to execute and slightly increase code size. However, if floating-point operations are used infrequently, this approach is adequate for such applications.
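One way for a program to see which situation it was built for is the `__ARM_FP` predefined macro from the Arm C Language Extensions, where bit 2 (`0x4`) indicates hardware single-precision support and bit 3 (`0x8`) indicates hardware double-precision support. The sketch below assumes a toolchain that follows that convention and leaves the macro undefined for software floating-point builds; the names `fp_support`, `hardware_fp_support`, and `scale_reading` are hypothetical. The same `scale_reading` source compiles either to floating-point instructions or to calls into the runtime library, depending on the build.

```c
/* Hypothetical classification of the floating-point support
   the current build was compiled for. */
typedef enum {
    FP_SUPPORT_NONE,    /* software floating point, or a non-Arm build */
    FP_SUPPORT_SINGLE,  /* hardware single precision only */
    FP_SUPPORT_DOUBLE   /* hardware double precision available */
} fp_support;

fp_support hardware_fp_support(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return FP_SUPPORT_DOUBLE;
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    return FP_SUPPORT_SINGLE;
#else
    return FP_SUPPORT_NONE;
#endif
}

/* Ordinary floating-point code: the source is identical whether the
   multiplication is done by an FPU instruction or a library routine. */
float scale_reading(float raw, float gain)
{
    return raw * gain;
}
```

Application code can call `hardware_fp_support()` once at startup, for instance to decide whether a floating-point filter or a fixed-point alternative is the better choice on that build.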