2.1 Introduction to the Instruction Set
Instruction Set Overview
In most cases, application code can be written in C or another high-level language. However, a basic understanding of the instruction sets supported by the Cortex-M processors helps developers choose the right Cortex-M processor for a given application. The Instruction Set Architecture (ISA) is part of the processor architecture, and the Cortex-M processors are grouped by the architecture specification they implement, as listed in Table 3.
Table 3: ARM Architecture Specifications of the Cortex-M Processors
All Cortex-M processors support the Thumb instruction set. Extended with Thumb-2 technology, the full Thumb instruction set is quite large, and different Cortex-M processors support different subsets of it, as shown in Figure 3:
Figure 3: Instruction Set of Cortex-M Processors
2.2 Cortex-M0/M0+/M1 Instruction Set
The Cortex-M0/M0+/M1 processors are based on the ARMv6-M architecture. ARMv6-M is a small instruction set of only 56 instructions, most of them 16 bits long; as Figure 3 shows, it covers only a small part of the full Thumb instruction set. The registers and the data they process are nevertheless 32 bits wide. For most simple I/O control tasks and general data processing, these instructions are sufficient. Such a small instruction set can be implemented with very few logic gates: the minimum configurations of the Cortex-M0 and Cortex-M0+ need only about 12K gates. On the other hand, many of the instructions cannot access the high registers (R8 to R12), and the ability to generate immediate values is limited. These restrictions are the price of balancing ultra-low power consumption against performance.
2.3 Cortex-M3 Instruction Set
The Cortex-M3 processor is based on the ARMv7-M architecture and supports a richer instruction set, including many 32-bit instructions that can use the high registers efficiently. In addition, the Cortex-M3 supports:
· Table branch instructions (TBB, TBH) and conditional execution (using the IT instruction)
· Hardware division instructions
· Multiply-accumulate instructions (MAC)
· Various bit manipulation instructions
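Hardware division is one of the clearest differences from ARMv6-M. On a Cortex-M3, an integer division such as a / b compiles to a single UDIV or SDIV instruction, whereas on a Cortex-M0/M0+/M1 the compiler emits a call to a run-time library routine (for unsigned 32-bit division the Arm run-time ABI names it __aeabi_uidiv). As a rough illustration of the work such a routine has to do, here is the classic shift-and-subtract (restoring) algorithm. The name soft_udiv is hypothetical, and real library routines are considerably more optimized.

```c
#include <stdint.h>

/* Hypothetical helper: restoring (shift-and-subtract) unsigned division.
 * d must be non-zero. The remainder is stored through rem if rem is non-null. */
static uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint64_t r = 0;                        /* 64 bits so r << 1 cannot overflow */

    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);    /* bring down the next dividend bit */
        if (r >= d) {                      /* divisor fits: subtract, set quotient bit i */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
}
```

The loop runs once per quotient bit, so it costs dozens of instructions per division. A single hardware divide instruction avoids that cost.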
The richer instruction set improves performance in several ways. The 32-bit Thumb instructions support a larger range of immediate values, branch offsets, and address offsets for memory accesses. Basic DSP operations are supported (for example, several MAC instructions that take multiple clock cycles to execute, and saturating arithmetic instructions). Finally, the 32-bit instructions can combine a barrel shift with another data operation in a single instruction.
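The wider immediate range can be made concrete. The ARMv7-M Architecture Reference Manual defines which constants a 32-bit Thumb data-processing instruction can embed through its ThumbExpandImm pseudocode: an 8-bit value, one of three replicated-byte patterns, or an 8-bit value with its top bit set, rotated into position. The ARMv6-M MOVS instruction, by contrast, accepts only 0 to 255. The two hypothetical helpers below test a constant against each rule. This is an approximation of what a compiler can do: constants that pass neither check are typically loaded from a literal pool or, on ARMv7-M and ARMv8-M, built with a MOVW/MOVT pair, and compilers may also use other instruction sequences.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv6-M: MOVS Rd, #imm8 only accepts 0..255. */
static bool armv6m_movs_imm_ok(uint32_t v)
{
    return v <= 0xFFu;
}

/* ARMv7-M: 32-bit Thumb "modified immediate" constants (ThumbExpandImm). */
static bool thumb2_modified_imm_ok(uint32_t v)
{
    uint32_t b = v & 0xFFu;

    if (v <= 0xFFu)                              return true; /* 0x000000XY */
    if (v == b * 0x00010001u)                    return true; /* 0x00XY00XY */
    if (v == ((v >> 8) & 0xFFu) * 0x01000100u)   return true; /* 0xXY00XY00 */
    if (v == b * 0x01010101u)                    return true; /* 0xXYXYXYXY */

    /* An 8-bit value with bit 7 set, shifted left by 1..24 bits
     * (encoded in the instruction as a rotate-right by 8..31). */
    for (unsigned s = 1; s <= 24; s++) {
        uint32_t top = v >> s;
        if (top >= 0x80u && top <= 0xFFu && (top << s) == v)
            return true;
    }
    return false;
}
```

For example, 256 (0x100) fails the MOVS check but is a valid modified immediate (0x80 shifted left by 1). 0x12345678 fits neither form.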
Supporting a richer instruction set costs more silicon area and power. In a typical configuration, the Cortex-M3 has more than twice as many logic gates as the Cortex-M0 or Cortex-M0+. However, the processor occupies only a small part of most modern microcontrollers, so the extra area and power are often insignificant.
2.4 Cortex-M4 Instruction Set
The Cortex-M4 is similar to the Cortex-M3 in many respects, such as its pipeline and programming model. It supports all the features of the Cortex-M3 and adds various DSP-oriented instructions: SIMD instructions, saturating arithmetic instructions, a series of single-cycle MAC instructions (the Cortex-M3 supports only a limited number of MAC instructions, and they take multiple cycles), and optional single-precision floating-point instructions.
The Cortex-M4 SIMD instructions can process two 16-bit values or four 8-bit values in parallel, as the QADD16 and QADD8 operations in Figure 4 illustrate:
Figure 4: Example of SIMD Instructions: QADD8 and QADD16
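The lane-wise behavior in Figure 4 can be modeled in portable C. The sketch below uses hypothetical function names and mirrors the saturating semantics of QADD8 and QADD16. Each signed 8-bit or 16-bit lane is added independently and clamped to its signed range instead of wrapping, and no carry crosses a lane boundary. On a Cortex-M4 each model call corresponds to one instruction. The C version needs a loop over the lanes.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v without implementation-defined casts. */
static int32_t sext(uint32_t v, unsigned bits)
{
    uint32_t m = 1u << (bits - 1);
    v &= (m << 1) - 1u;
    return (int32_t)(v ^ m) - (int32_t)m;
}

/* Add each signed lane of a and b, clamping to the lane's signed range. */
static uint32_t qadd_lanes(uint32_t a, uint32_t b, unsigned bits)
{
    const int32_t hi = (int32_t)((1u << (bits - 1)) - 1u);  /* 127 or 32767 */
    const int32_t lo = -hi - 1;                             /* -128 or -32768 */
    const uint32_t mask = (1u << bits) - 1u;
    uint32_t r = 0;

    for (unsigned sh = 0; sh < 32; sh += bits) {
        int32_t s = sext(a >> sh, bits) + sext(b >> sh, bits);
        if (s > hi) s = hi;                 /* saturate instead of wrapping */
        if (s < lo) s = lo;
        r |= ((uint32_t)s & mask) << sh;    /* no carry spills into the next lane */
    }
    return r;
}

static uint32_t model_qadd8(uint32_t a, uint32_t b)  { return qadd_lanes(a, b, 8); }
static uint32_t model_qadd16(uint32_t a, uint32_t b) { return qadd_lanes(a, b, 16); }
```

For instance, model_qadd16(0x7FFF0001, 0x00010001) yields 0x7FFF0002. The upper lane clamps at 0x7FFF instead of overflowing into 0x8000.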
SIMD enables much faster computation on 16-bit and 8-bit data in certain DSP operations because the calculations can be parallelized. In general-purpose code, however, C compilers are unlikely to use the SIMD capability, which is why typical benchmark results for the Cortex-M3 and Cortex-M4 are similar. Even so, the internal data path of the Cortex-M4 differs from that of the Cortex-M3, which makes a few operations faster (for example, single-cycle MAC, and writing back two registers in a single cycle).
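Because compilers rarely use these instructions on their own, DSP code usually reaches them through intrinsics. The sketch below assumes the ACLE intrinsic __smlad from <arm_acle.h>, which GCC and Clang provide when __ARM_FEATURE_SIMD32 is defined, and falls back to plain C on other targets. SMLAD multiplies both pairs of 16-bit lanes and adds both products to the accumulator in one instruction. dot_q15 is a hypothetical name. CMSIS users would typically call the equivalent CMSIS-Core __SMLAD instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__ARM_FEATURE_SIMD32) && __ARM_FEATURE_SIMD32
#include <arm_acle.h>
#endif

/* Dot product of two int16_t vectors. Assumes n is even and that the
 * result fits in 32 bits (SMLAD wraps rather than saturating the sum). */
static int32_t dot_q15(const int16_t *x, const int16_t *y, size_t n)
{
    int32_t acc = 0;
#if defined(__ARM_FEATURE_SIMD32) && __ARM_FEATURE_SIMD32
    for (size_t i = 0; i < n; i += 2) {
        int32_t xp, yp;
        memcpy(&xp, &x[i], sizeof xp);  /* pack two int16 lanes per word */
        memcpy(&yp, &y[i], sizeof yp);
        acc = __smlad(xp, yp, acc);     /* acc += x[i]*y[i] + x[i+1]*y[i+1] */
    }
#else
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)x[i] * y[i];    /* portable fallback, one MAC per element */
#endif
    return acc;
}
```

Both operands are packed the same way, and SMLAD adds both lane products, so the result does not depend on which lane holds which element.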
2.5 Cortex-M7 Instruction Set
The Cortex-M7 instruction set is similar to that of the Cortex-M4, with the following additions:
· A floating-point architecture based on FPv5 rather than the Cortex-M4's FPv4, so the Cortex-M7 supports additional floating-point instructions
· Optional double-precision floating-point data processing instructions
· Support for cache data prefetch instructions (PLD)
The Cortex-M7 pipeline is very different from that of the Cortex-M4: it is a 6-stage, dual-issue pipeline that achieves higher performance. Most software written for the Cortex-M4 runs on the Cortex-M7 without changes. To get the full benefit of the new pipeline, however, the software should be recompiled, and in many cases it needs minor updates to take advantage of new features such as the caches.
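One small example of such a cache-aware update is a software prefetch hint. With GCC and Clang this is usually written with the __builtin_prefetch built-in, which the compiler may lower to PLD on targets that support it and otherwise drops. The function name below is hypothetical. PREFETCH_AHEAD is an arbitrary placeholder that would need tuning against the real cache line size and memory latency. Because the hint never changes results, the loop computes the same sum with or without a data cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder prefetch distance in elements; tune for the real system. */
#define PREFETCH_AHEAD 16u

/* Hypothetical example: sum an array while hinting upcoming elements. */
static uint32_t sum_with_prefetch(const uint32_t *a, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__)
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD]);  /* hint only; may become PLD */
#endif
        sum += a[i];
    }
    return sum;
}
```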
2.6 Cortex-M23 Instruction Set
The Cortex-M23 instruction set is based on the ARMv8-M Baseline sub-specification, which is a superset of ARMv6-M. The additional instructions include:
· Hardware division instructions
· Compare-and-branch instructions (CBZ, CBNZ) and 32-bit branch instructions
· TrustZone security extension instructions
· Exclusive-access instructions (commonly used for semaphore and mutex operations)
· Instructions that generate 16-bit immediate values (MOVW, MOVT)
· Load-acquire and store-release instructions (supporting C11 atomics)
These additions can improve performance in some cases and are useful in SoC designs that contain multiple processors (for example, exclusive access to shared data helps with semaphore handling across processors).
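In C, these instructions are usually reached through C11 atomics rather than written by hand. The minimal spinlock below has hypothetical type and function names and uses only <stdatomic.h>. On ARMv8-M targets a compiler would typically implement the test-and-set with an exclusive load/store loop, and the acquire and release orderings with the load-acquire/store-release forms or barriers. ARMv6-M has no exclusive-access instructions, so the same source there depends on library support or interrupt masking instead.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: a single atomic flag, clear = unlocked. */
typedef struct {
    atomic_flag locked;
} demo_lock_t;

/* Try once: returns true if the lock was free and is now held. */
static bool demo_trylock(demo_lock_t *l)
{
    /* acquire: later accesses in the critical section cannot move before this */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is obtained. */
static void demo_lock(demo_lock_t *l)
{
    while (!demo_trylock(l))
        ;  /* busy-wait; the current owner must call demo_unlock */
}

static void demo_unlock(demo_lock_t *l)
{
    /* release: writes made while holding the lock become visible first */
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock like this suits multiprocessor SoCs. On a single core, an interrupt handler that spins on a lock held by the code it interrupted would never return.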
2.7 Cortex-M33 Instruction Set
The Cortex-M33 is based on the ARMv8-M Mainline sub-specification. Because it is highly configurable, some of its instructions are optional. For example:
· DSP instructions (supported by Cortex-M4 and Cortex-M7) are optional
· Single-precision floating-point instructions are optional; they are based on FPv5 and so include a few more instructions than the Cortex-M4 provides.
The Cortex-M33 also supports the new instructions introduced in ARMv8-M Mainline:
· Support for TrustZone security extension instructions
· Load-acquire and store-release instructions (supporting C11 atomics)
2.8 Summary of Instruction Set Features Comparison
The ARMv6-M, ARMv7-M, and ARMv8-M architectures have too many instruction set features to describe in full here, but Table 4 summarizes the key differences.
Table 4: Summary of Instruction Set Features
The most important property of the Cortex-M instruction sets is upward compatibility. The Cortex-M3 instruction set is a superset of the Cortex-M0/M0+/M1 instruction set, so in principle, provided the memory map is the same, binaries built for the Cortex-M0/M0+/M1 can run directly on the Cortex-M3. The same principle applies to the Cortex-M4/M7 and other Cortex-M processors: instructions supported by the Cortex-M0/M0+/M1/M3 also run on the Cortex-M4/M7.
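Because the subsets nest, portable code often selects an implementation at compile time. The sketch below assumes GCC or Clang. These compilers define architecture macros such as __ARM_ARCH_6M__, __ARM_ARCH_7M__, __ARM_ARCH_7EM__ (the DSP-extended ARMv7-M used by the Cortex-M4 and Cortex-M7), __ARM_ARCH_8M_BASE__ and __ARM_ARCH_8M_MAIN__. They also define the ACLE feature macros __ARM_FEATURE_IDIV, __ARM_FEATURE_DSP and __ARM_FP. The function names are hypothetical. Built for a host PC, the code simply reports that the target is not a Cortex-M.

```c
/* Hypothetical helpers: report which Thumb subset the compiler targets. */
static const char *target_isa(void)
{
#if defined(__ARM_ARCH_8M_MAIN__)
    return "ARMv8-M Mainline";   /* e.g. Cortex-M33 */
#elif defined(__ARM_ARCH_8M_BASE__)
    return "ARMv8-M Baseline";   /* e.g. Cortex-M23 */
#elif defined(__ARM_ARCH_7EM__)
    return "ARMv7E-M";           /* e.g. Cortex-M4, Cortex-M7 */
#elif defined(__ARM_ARCH_7M__)
    return "ARMv7-M";            /* e.g. Cortex-M3 */
#elif defined(__ARM_ARCH_6M__)
    return "ARMv6-M";            /* e.g. Cortex-M0, Cortex-M0+, Cortex-M1 */
#else
    return "not a Cortex-M target";
#endif
}

static int has_hw_divide(void)
{
#if defined(__ARM_FEATURE_IDIV)
    return 1;   /* SDIV/UDIV available */
#else
    return 0;   /* on Arm targets, division goes through a library routine */
#endif
}

static int has_dsp_ext(void)
{
#if defined(__ARM_FEATURE_DSP)
    return 1;   /* SIMD, saturating and single-cycle MAC instructions */
#else
    return 0;
#endif
}

/* 2 = double-precision FPU, 1 = single-precision only, 0 = no FPU. */
static int fpu_precision(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x8)
    return 2;
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;   /* floating point is done in software */
#endif
}
```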
Although the Cortex-M0/M0+/M1/M3/M23 processors have no floating-point unit option, they can still process floating-point data in software. The same applies to Cortex-M4/M7/M33 products configured without a floating-point unit. In that case, when a program uses floating-point numbers, the toolchain links in the necessary runtime library routines. Software floating-point operations take longer to run and slightly increase code size, but the approach is suitable for applications that use floating-point operations only occasionally.
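The same software cost applies to double-precision arithmetic on cores whose FPU is single-precision only, such as the Cortex-M4, the Cortex-M33, or a Cortex-M7 configured without double precision. In C, an unsuffixed constant such as 0.5 has type double, so mixing it with a float promotes the whole expression to double. On such cores this typically pulls in software routines (for example the run-time ABI's __aeabi_dmul). GCC's -Wdouble-promotion warning flags these cases. The two hypothetical functions below return the same value for ordinary inputs, but only the second lets the compiler stay in single precision.

```c
/* Hypothetical example functions. */
static float half_promoted(float x)
{
    return x * 0.5;    /* 0.5 is a double: x is converted to double, multiplied,
                          then converted back to float */
}

static float half_single(float x)
{
    return x * 0.5f;   /* f suffix: the multiply stays in single precision */
}
```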