Introduction and Comparison of ARM Cortex-M Processor Family – Part 2

Author: Joseph Yiu, Senior Embedded Technology Manager, ARM

William Gao, ARM China Application Engineer, and Gabriel Wang, ARM China Embedded Application Engineer, also contributed to the Chinese version of this article.

Last Friday, I shared the white paper titled “Introduction and Comparison of ARM Cortex-M Processor Family – Part 1”. By popular request, I am releasing Part 2 of the white paper on Monday, and I hope it will be useful both in your work and as a reference.

4 System Features

4.1 Low Power Consumption

Low power consumption is a key advantage of Cortex-M processors. It is an integral part of their architecture:

· WFI and WFE instructions

· Architecture-level sleep mode definitions

Additionally, Cortex-M supports many other low power features:

· Sleep and deep sleep modes: features supported at the architecture level, further extendable through device-specific power management registers.

· Sleep-on-exit mode: a low power technique for interrupt-driven applications. When enabled, the processor automatically enters sleep mode after the exception handler finishes and there are no other pending interrupts to handle. This avoids unnecessary instruction execution in thread mode, saving power and reducing unnecessary stack read/write operations.

· Wake-up interrupt controller (WIC): an optional small module, independent of the processor, that detects interrupts while the processor is in certain low power states, for example when the processor is powered down in a state retention power gating (SRPG) design.

· Clock gating and architectural clock gating: saves power by switching off the clock input to processor registers or sub-modules.

All these features are supported by Cortex-M0, Cortex-M0+, Cortex-M3, Cortex-M4, Cortex-M7, Cortex-M23, and Cortex-M33. Additionally, various low power design techniques are used to reduce processor power consumption.
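The sleep features above are selected through the processor's System Control Register (SCR). The following is a minimal sketch, assuming the architectural bit positions SLEEPONEXIT (bit 1) and SLEEPDEEP (bit 2). The helper name `sleep_configure` is hypothetical, and the register is passed as a pointer so that the logic can be exercised on a host machine as well as on a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the System Control Register (SCR). */
#define SCR_SLEEPONEXIT (1u << 1) /* re-enter sleep when the last handler returns */
#define SCR_SLEEPDEEP   (1u << 2) /* select deep sleep instead of normal sleep */

/* Hypothetical helper: update the sleep-related bits of an SCR image.
 * 'scr' points at the register; on a device this would be the SCR itself. */
static void sleep_configure(volatile uint32_t *scr, bool deep, bool on_exit)
{
    assert(scr != NULL);
    /* Clear both sleep bits, leave all other bits untouched. */
    uint32_t value = *scr & ~(SCR_SLEEPDEEP | SCR_SLEEPONEXIT);
    if (deep) {
        value |= SCR_SLEEPDEEP;
    }
    if (on_exit) {
        value |= SCR_SLEEPONEXIT;
    }
    *scr = value;
}
```

On a device, the application would typically call such a helper with the address of the SCR (`&SCB->SCR` in CMSIS-CORE) and then execute a WFI instruction, for example through the CMSIS-CORE intrinsic `__WFI()`, to actually enter sleep.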

Because they contain less logic, Cortex-M0 and Cortex-M0+ processors consume less power than Cortex-M3, Cortex-M4, and Cortex-M7. Furthermore, Cortex-M0+ has additional optimizations that reduce program memory accesses (for example, by reducing branch shadow fetches) to keep power low at the system level.

Cortex-M23 is not as small as Cortex-M0 and Cortex-M0+, but under the same configuration, it still achieves the same efficiency as Cortex-M0+.

Due to better performance and low power optimizations, Cortex-M33 has better efficiency than Cortex-M4 under the same configuration.

4.2 Bit-band Feature

Cortex-M3 and Cortex-M4 processors support an optional feature called bit-band, which makes two 1 MB regions of the address space bit-addressable through bit-band alias addresses: one region starts at address 0x20000000 in the SRAM space, and the other starts at address 0x40000000 in the peripheral space. Cortex-M0, Cortex-M0+, and Cortex-M1 do not support bit-band in the processor, but the function can be implemented at the system level using bus-level components in the ARM Cortex-M System Design Kit (CMSDK). Cortex-M7 does not support bit-band because its cache cannot be used together with bit-band (the cache controller is not aware of the alias addresses).
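The mapping from a bit in a bit-band region to its alias address is simple arithmetic: each bit is given its own 32-bit word in the alias region. The sketch below assumes the standard Cortex-M3/M4 alias bases (0x22000000 for SRAM, 0x42000000 for peripherals). The function name `bitband_alias` is hypothetical, and it only computes the address, so it can be checked on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE    0x20000000u /* start of the SRAM bit-band region */
#define SRAM_BB_ALIAS   0x22000000u /* start of the SRAM alias region */
#define PERIPH_BB_BASE  0x40000000u /* start of the peripheral bit-band region */
#define PERIPH_BB_ALIAS 0x42000000u /* start of the peripheral alias region */
#define BB_REGION_SIZE  0x00100000u /* each bit-band region is 1 MB */

/* Returns the alias word address for bit 'bit' (0..7) of the byte at 'addr',
 * or 0 if 'addr' lies outside both bit-band regions. */
static uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(bit < 8u);
    uint32_t base;
    uint32_t alias;
    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE) {
        base = SRAM_BB_BASE;
        alias = SRAM_BB_ALIAS;
    } else if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE) {
        base = PERIPH_BB_BASE;
        alias = PERIPH_BB_ALIAS;
    } else {
        return 0u;
    }
    /* 32 alias bytes per source byte, 4 alias bytes (one word) per bit. */
    return alias + (addr - base) * 32u + bit * 4u;
}
```

For example, bit 3 of the byte at 0x20000001 maps to alias address 0x2200002C. Writing 1 or 0 to that word sets or clears only that bit, without a software read-modify-write sequence.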

TrustZone for ARMv8-M does not support bit-band, because the two different addresses that bit-band aliasing requires could reside in different security domains. In these systems, bit operations on peripheral data can instead be handled at the peripheral level (e.g., by adding bit set and bit clear registers).

4.3 Memory Protection Unit (MPU)

Except for Cortex-M0 and Cortex-M1, which do not support an MPU, Cortex-M processors have an optional MPU that defines memory access permissions and memory attributes for a set of memory regions. In embedded systems running a real-time operating system, the operating system defines the memory access permissions and memory region configuration for each task, so that no task can corrupt the address space of other tasks or of the operating system kernel. Cortex-M0+, Cortex-M3, and Cortex-M4 have 8 programmable regions and a very similar programming model. The main difference is that the MPU in Cortex-M3/M4 allows two levels of memory attributes (e.g., for a system-level cache), while Cortex-M0+ supports only one level. The MPU in Cortex-M7 can be configured to support 8 or 16 regions with two levels of memory attributes.

Cortex-M23 and Cortex-M33 also support an optional MPU. If the TrustZone security extension is implemented, there can be up to two MPUs: one for secure software and another for non-secure software.
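As an illustration of the MPU programming model, the following sketch encodes a region attribute and size word in the style of the ARMv7-M `MPU_RASR` register used by Cortex-M3/M4/M7. It assumes the ARMv7-M field layout (ENABLE in bit 0, SIZE in bits 5:1, AP in bits 26:24, XN in bit 28) and a minimum region size of 32 bytes. The helper names are hypothetical, and the MPUs in Cortex-M23 and Cortex-M33 use a different, base-and-limit style of region definition that is not covered here.

```c
#include <assert.h>
#include <stdint.h>

#define RASR_ENABLE     (1u << 0)  /* region enable */
#define RASR_SIZE_SHIFT 1          /* SIZE field, bits 5:1 */
#define RASR_AP_SHIFT   24         /* access permission field, bits 26:24 */
#define RASR_XN         (1u << 28) /* execute never */

#define AP_PRIV_RW      1u /* privileged read/write, no unprivileged access */
#define AP_FULL_ACCESS  3u /* read/write for privileged and unprivileged code */

/* Returns the SIZE field for a region of 'bytes' bytes (region size is
 * 2^(SIZE+1)), or -1 if 'bytes' is not a power of two from 32 B to 4 GB. */
static int mpu_size_field(uint64_t bytes)
{
    if (bytes < 32u || bytes > (1ull << 32) || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int power = 0;
    while ((bytes >> power) > 1u) {
        power++;
    }
    return power - 1;
}

/* Builds an enabled region attribute word from its fields. */
static uint32_t mpu_rasr(int size_field, uint32_t ap, int execute_never)
{
    assert(size_field >= 4 && size_field <= 31 && ap <= 7u);
    uint32_t rasr = RASR_ENABLE
                  | ((uint32_t)size_field << RASR_SIZE_SHIFT)
                  | (ap << RASR_AP_SHIFT);
    if (execute_never) {
        rasr |= RASR_XN;
    }
    return rasr;
}
```

An RTOS would typically compute one such word per region for each task and reload the region registers on a context switch. Note that the region base address must also be aligned to the region size.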

4.4 Single Cycle I/O Interface

The single cycle I/O interface is a unique feature of the Cortex-M0+ processor that allows it to execute I/O control tasks quickly. Most Cortex-M processors have bus interfaces based on the AHB-Lite or AHB5 protocols, which are pipelined bus protocols that can run at high clock frequencies. However, pipelining means that each transfer requires two clock cycles. The single cycle I/O interface adds a separate, simple, non-pipelined bus interface for connecting device-specific peripherals such as GPIO (General Purpose Input Output). By combining single cycle I/O with the inherently low branch cost of Cortex-M0+ (only two pipeline stages), many I/O control operations can run faster than on most other microcontroller architectures.
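The effect can be estimated with simple cycle arithmetic. The sketch below models a hypothetical pin-toggling loop made of two store instructions (set pin, clear pin) and one branch back to the start. The per-instruction cycle counts are inputs rather than measured results, and the values used in the example afterwards are assumptions based on typical instruction timing, so check them against the technical reference manual of your processor.

```c
#include <assert.h>

/* Assumed instruction timing for one core and bus configuration. */
struct io_timing {
    unsigned store_cycles;  /* cycles for one store to the I/O register */
    unsigned branch_cycles; /* cycles for one taken branch */
};

/* Cycles for one iteration of the hypothetical loop:
 *   store (pin high), store (pin low), branch back. */
static unsigned toggle_loop_cycles(struct io_timing t)
{
    assert(t.store_cycles > 0u && t.branch_cycles > 0u);
    return 2u * t.store_cycles + t.branch_cycles;
}

/* Output square-wave frequency in kHz: one period per loop iteration. */
static unsigned toggle_khz(unsigned core_khz, struct io_timing t)
{
    return core_khz / toggle_loop_cycles(t);
}
```

Assuming a one-cycle store through the single cycle I/O port and a two-cycle branch, the loop takes 4 cycles, so a 48 MHz core would toggle the pin at 12 MHz. With a two-cycle store over the AHB interface and the same branch cost, the loop takes 6 cycles and the output drops to 8 MHz.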

5 Performance Considerations

5.1 General Data Processing Capability

In the general microcontroller market, benchmark data is often used to measure microcontroller performance. Table 7 shows the performance data from commonly used benchmark tests for Cortex-M processors:

Table 7: Performance scores of commonly used benchmarks for Cortex-M processors

(Source: CoreMark.org and ARM websites)

It is important to note that the Dhrystone used for testing is compiled from official source code without enabling inline and multi-file compilation options (official scores). However, many microcontroller manufacturers quote fully optimized compiled Dhrystone test data.

In addition, benchmark results may not accurately reflect the performance your application can achieve. For example, the speed-up from the single cycle I/O interface, from DSP applications that use SIMD, or from the FPU in Cortex-M4/M7 is not reflected in these test data.

Generally, Cortex-M3 and Cortex-M4 provide higher data processing performance for the following reasons:

· A richer instruction set

· Harvard bus architecture

· Write buffer (single cycle write operations)

· Branch target prediction

Cortex-M33 is also based on a Harvard bus architecture with a rich instruction set. However, unlike Cortex-M3 and Cortex-M4, the Cortex-M33 processor has a redesigned efficient pipeline that supports limited instruction dual-issue (up to two instructions can be executed in one clock cycle).

Cortex-M7 delivers higher performance because it has a dual-issue six-stage pipeline and supports branch prediction. Additionally, with instruction and data caches and tightly coupled memory, it can avoid performance loss even when paired with slow memory (e.g., embedded Flash) and so achieve higher system-level performance.

However, certain I/O-intensive tasks run faster on Cortex-M0+ because of:

· Shorter pipeline (branches take only two cycles)

· Single cycle I/O ports

Of course, there are also device-related factors. For example, system-level design and memory speed can also impact system performance.

Your own application is often the best benchmark. A processor with twice the CoreMark score will not necessarily execute your application twice as fast. For I/O-intensive applications, the device-specific system-level architecture has a huge impact on performance.

5.2 Interrupt Latency

Another performance-related metric is interrupt latency. This is usually measured by the number of clock cycles from the interrupt request to the execution of the first instruction of the interrupt service routine. Table 8 lists the interrupt latency comparisons of Cortex-M processors under zero-wait memory system conditions.

Table 8: Interrupt latency comparison under zero-wait memory system conditions

In fact, real interrupt latency is influenced by the memory system’s wait states. For example, many microcontrollers running at frequencies above 100 MHz are paired with very slow Flash memory (e.g., 30 to 50 MHz). Although Flash access acceleration hardware is used to improve performance, interrupt latency is still affected by the wait states of the Flash storage system. Therefore, it is entirely possible for a Cortex-M0/M0+ system running in a zero-wait memory system to have shorter interrupt latency than Cortex-M3/M4/M7.
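The argument can be made concrete with a small calculation. The sketch below converts a latency in clock cycles into time, with the extra stall cycles caused by memory wait states as a separate input. The function name `latency_ns` is hypothetical, and the cycle counts used in the example afterwards are illustrative placeholders rather than values taken from Table 8.

```c
#include <assert.h>

/* Interrupt latency in nanoseconds (rounded down) for a core running at
 * 'core_mhz', given the zero-wait latency in cycles and the additional
 * stall cycles caused by memory wait states during exception entry. */
static unsigned latency_ns(unsigned zero_wait_cycles,
                           unsigned stall_cycles,
                           unsigned core_mhz)
{
    assert(core_mhz > 0u);
    return (zero_wait_cycles + stall_cycles) * 1000u / core_mhz;
}
```

For instance, at 48 MHz an assumed 15-cycle latency with no stalls gives 312 ns, while an assumed 12-cycle latency plus 8 stall cycles gives 416 ns. The processor with the lower zero-wait figure ends up with the longer real latency.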

When evaluating performance, do not forget to consider the execution time of the interrupt handler. Some 8-bit or 16-bit processor architectures may have very short interrupt latencies but take many clock cycles to complete the interrupt handling. What matters in practice is that both the interrupt response time and the interrupt handling time are short.

6 Debugging and Tracing Features

6.1 Overview of Debugging and Tracing Features

The debugging and tracing features differ in several ways between Cortex-M processors. A summary is provided in Table 9.

Table 9: Comparison of Debugging and Tracing Features

The debugging architecture of Cortex-M processors is designed based on the ARM CoreSight debugging architecture, which is a very extensible architecture that supports multiprocessor systems.

Table 9 describes typical designs. Under the CoreSight architecture, the debugging and tracing interface modules are separate from the processor, so the debugging and tracing connections of the device you use may differ from those in Table 9. Additional CoreSight debugging components may also be added to enhance the debugging features.

6.2 Debug Connections

The debugging interface allows the debugger to:

– Access registers for debugging and tracing features.

– Access memory space. For Cortex-M series processors, even when the processor is running, memory space access can be performed. This is called real-time memory access.

– Access processor core registers. This can only be operated when the processor is stopped.

– Access the trace history generated by the Micro Trace Buffer (MTB) in Cortex-M0+ processors.

Additionally, the debugging interface can also be used for:

– Flash programming

Cortex-M series processors can use either a traditional 4 to 5 pin JTAG interface (TDI, TDO, TCK, TMS, and optional nTRST) or the newer Serial Wire Debug protocol interface, which requires only two pins and is therefore well suited to devices with a limited number of pins.

Figure 10: Serial wire or JTAG debug interface allows access to processor’s debug features and memory space including peripherals

The Serial Wire Debug protocol interface can handle all the features supported by JTAG and additionally supports parity checking. The protocol is widely adopted by ARM tool vendors, and many debug adapters support both protocols, with the Serial Wire signals sharing the TCK and TMS pins.

6.3 Trace Interface

The trace interface allows the debugger to collect program execution information in real time (with minimal delay) while the program is running. The collected information can be the instruction flow information generated by the Embedded Trace Macrocell (ETM) supported by Cortex-M3/M4/M7/M33 (instruction trace), the data, event, and profiling information generated by the Data Watchpoint and Trace (DWT) unit, or the software-generated information from the Instrumentation Trace Macrocell (ITM).

There are two types of trace interfaces available:

– Trace port – multiple data lines plus a clock signal line. It has a higher trace bandwidth than SWV and supports all the trace types of SWV plus instruction trace. On Cortex-M3/M4/M7 or Cortex-M33 devices, the trace port usually has 4 data lines and one clock line.

– Serial Wire Viewer (SWV) – a single pin trace interface that supports selective data trace, event trace, profiling, and instrumentation trace.

Figure 11: Trace port supports the necessary bandwidth for instruction tracing and other tracing functions

The trace interface makes it possible to obtain a large amount of useful information while the processor is running. For example, the Embedded Trace Macrocell (ETM) captures the instruction execution history, and the Instrumentation Trace Macrocell (ITM) allows software to generate messages (e.g., via printf) that are output through the trace interface. Additionally, Cortex-M3/M4/M7/M33 support the Data Watchpoint and Trace (DWT) unit, which provides:

– Selective data tracing: information about a memory location (e.g., a combination of address, data, and timestamp) can be collected whenever the processor accesses that location.

– Performance analysis tracing: the number of clock cycles the CPU spends on different kinds of operation (e.g., memory access, sleep).

– Event tracing: provides the timing and history of the interrupts/exceptions handled by the processor.
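The software-generated messages mentioned above rely on a simple mechanism: software writes a character to an ITM stimulus port once tracing is enabled and the port reports that it is ready. The sketch below illustrates that sequence against a simplified, hypothetical register structure (`struct itm_regs` does not match the real ITM register layout), so that it can be run on a host. On a device, CMSIS-CORE provides `ITM_SendChar()` for this purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of the ITM registers used in this sketch. */
struct itm_regs {
    volatile uint32_t port[32]; /* stimulus ports; reads non-zero when ready */
    volatile uint32_t ter;      /* trace enable, one bit per stimulus port */
    volatile uint32_t tcr;      /* trace control, bit 0 enables the ITM */
};

/* Sends one character on the given stimulus port.
 * Returns the character sent, or -1 if tracing is not enabled. */
static int itm_put_char(struct itm_regs *itm, unsigned channel, char ch)
{
    assert(itm != NULL && channel < 32u);
    if ((itm->tcr & 1u) == 0u || (itm->ter & (1u << channel)) == 0u) {
        return -1; /* no debugger has enabled trace: drop the character */
    }
    while (itm->port[channel] == 0u) {
        /* wait until the port can accept another value */
    }
    itm->port[channel] = (uint8_t)ch;
    return (unsigned char)ch;
}
```

A printf retargeting layer would call such a function once per output character. Because the function returns immediately when trace is disabled, the same firmware can run with or without a debugger attached.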

Figure 12: Serial wire viewer provides a low-cost, low-pin count tracing solution

These tracing features are widely adopted by various tool vendors, and the collected information is presented intuitively in various ways. For example, data obtained from DWT can be displayed as waveforms in the Keil µVision debugger (part of Keil microcontroller development tools) as shown in Figure 13.

Figure 13: Logic analyzer in Keil µVision debugger

Although Cortex-M0 and Cortex-M0+ do not support trace interfaces, Cortex-M0+ supports a feature called the Micro Trace Buffer (MTB). The MTB allows users to allocate a small portion of the system SRAM as a buffer for instruction trace data. It is typically set up as a circular buffer, so that it captures the most recent instruction execution history, which can then be displayed in the debugger.

This MTB tracing feature is also supported by Cortex-M23 and Cortex-M33.
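The circular buffer behaviour can be illustrated in software. The sketch below is a host-side model, not the MTB hardware interface: it assumes that each trace record is a pair of branch source and destination addresses, and all type and function names are hypothetical. Once the buffer is full, each new record overwrites the oldest one, so the buffer always holds the most recent history.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_ENTRIES 4u /* deliberately tiny to show the wrap-around */

struct branch_record {
    uint32_t source;      /* address the branch was taken from */
    uint32_t destination; /* address the branch went to */
};

struct trace_buffer {
    struct branch_record entry[TRACE_ENTRIES];
    size_t next;  /* slot that the next record will be written to */
    size_t count; /* number of valid records, at most TRACE_ENTRIES */
};

/* Stores one record, overwriting the oldest record when the buffer is full. */
static void trace_record(struct trace_buffer *tb, uint32_t src, uint32_t dst)
{
    assert(tb != NULL);
    tb->entry[tb->next].source = src;
    tb->entry[tb->next].destination = dst;
    tb->next = (tb->next + 1u) % TRACE_ENTRIES;
    if (tb->count < TRACE_ENTRIES) {
        tb->count++;
    }
}

/* Returns the record that is 'age' steps old (0 = newest), or NULL if the
 * buffer does not reach back that far. */
static const struct branch_record *trace_recent(const struct trace_buffer *tb,
                                                size_t age)
{
    assert(tb != NULL);
    if (age >= tb->count) {
        return NULL;
    }
    size_t index = (tb->next + TRACE_ENTRIES - 1u - age) % TRACE_ENTRIES;
    return &tb->entry[index];
}
```

A debugger reconstructs the execution history in the same way: it starts from the current write position and walks backwards through the buffer. The size of the SRAM allocation determines how far back the history reaches.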

Figure 14: MTB provides a low-cost instruction tracing solution for Cortex-M0+/M23/M33

7 Product Development Based on Cortex-M Processors

7.1 Why Cortex-M Series Processors are Easy to Use

Although the Cortex-M series processors have many features, they are easy to use. For example, almost all development can be done using high-level programming languages like C. Although products based on the Cortex-M series processors vary widely (e.g., different memory sizes, different peripherals, performance, and packaging, etc.), the consistency of the architecture allows developers who have experience with one of them to easily start using a new Cortex-M processor.

To make software development easier and to improve software reusability and portability, ARM developed CMSIS-CORE, where CMSIS stands for Cortex Microcontroller Software Interface Standard. CMSIS-CORE provides a standardized hardware abstraction layer (HAL) for processor features, such as interrupt management and control, through a set of APIs. It is integrated into the device driver libraries provided by many microcontroller vendors and is supported by a wide range of development toolchains.

In addition to CMSIS-CORE, CMSIS also includes a DSP software library (CMSIS-DSP). This library provides various DSP functions optimized for Cortex-M4 and Cortex-M7, and also supports other Cortex-M series processors. Both CMSIS-CORE and CMSIS-DSP libraries are free and can be downloaded from GitHub (CMSIS 4, CMSIS 5), and are supported by many tool vendors.

7.2 Processor Selection

For most microcontroller users, the selection criteria for a microcontroller device mainly depend on cost and peripheral support. However, some of you may be chip designers choosing a processor for your next chip product, in which case the processor itself will be the focus of consideration.

Clearly, in such cases, performance, chip area, power consumption, and cost will be crucial factors. There are also various other factors to consider. For example, if you are developing an Internet-connected product, you may want a processor with the TrustZone security extension and an MPU, so that you can protect security-critical features and data with TrustZone, run certain tasks at the unprivileged level, and use the MPU to protect memory regions. On the other hand, if your product needs to be certified in some way, the instruction trace generated by the ETM supported by Cortex-M23, Cortex-M33, Cortex-M3, Cortex-M4, and Cortex-M7 will be very helpful for demonstrating code coverage.

As another example, if you are designing small sensors powered by energy-harvesting devices, then Cortex-M23 and Cortex-M0+ would be the best choice, as they are very small and have advanced power consumption optimizations.

7.3 Ecosystem

One of the key advantages of using ARM Cortex-M series processors is the extensive support of mature devices, development toolchains, and software libraries. Currently, there are:

– More than 15 microcontroller manufacturers selling microcontroller products based on ARM Cortex-M series cores.

– More than 10 development kits supporting ARM Cortex-M series processors.

– Operating systems from over 40 vendors supporting Cortex-M series processors.

This gives you a lot of choices, allowing you to obtain the best combination of devices, development tools, and middleware suitable for your target application.

8 Conclusion

There is always a need to balance performance, features, chip area, and power consumption. To this end, ARM has developed various Cortex-M processors with different levels of instruction set features, performance, system, and debugging features. This article introduces the similarities and differences among the Cortex-M processor family.

Despite these differences, the consistency of the architecture and the standardized APIs of CMSIS-CORE provide better portability and reusability for Cortex-M series processor software. At the same time, the Cortex-M series processors are very convenient to use. Therefore, the Cortex-M series processors have quickly become the most popular 32-bit processor architecture in the microcontroller market.
