Comparison of Cortex-M Processor Functional Modules

Hello everyone, I am Pi Zi Heng, a serious tech enthusiast. Today, I will introduce you to the functional modules of the ARM Cortex-M processors.

As of 2016, the ARM Cortex-M processor family comprises five product lines: CM0/CM0+, CM1, CM3, CM4, and CM7.

1. Cortex-M Compatibility Features

To enable software reuse across the Cortex-M family, ARM designed the processors to be downward compatible and the software to be upward binary compatible.

First, let’s look at what binary compatibility means. This is mainly a software property: when the header files or library files that a program relies on are upgraded separately, the program’s functionality remains unaffected. For this to hold, each upgraded header or library must itself remain binary compatible with the version it replaces.

So what is upward compatibility? Upward compatibility, also known as forward compatibility, means that software compiled for a lower version processor can run on a higher version processor.

The opposite concept is downward compatibility, also known as backward compatibility, which means that a higher version processor can correctly run software compiled for a lower version processor.

Thus, both upward and downward compatibility can describe Cortex-M features, just with different subjects. We can say that Cortex-M programs are upward compatible, or that Cortex-M processors are downward compatible.

Specifically for Cortex-M processors, this compatibility feature manifests as:

  • From the processor’s perspective: The CM0 instruction set and functional modules are the most streamlined, while the CM7 instruction set and functional modules are the most comprehensive. In general, a feature present in a lower version processor is also present in the higher version processors.

  • From the software perspective: The header files and functions provided by CMSIS are upward binary compatible. For example, if a CM0 application is built with the core_cm0.h header file, it can run on CM7 without being recompiled against core_cm7.h (of course, recompiling the application with the new header file is also fine).

2. Differences in Cortex-M Functional Modules

Since CM1 is mainly used in FPGA products, we will ignore it in the following comparisons. Because Cortex-M processors are downward compatible, their functional modules grow with each version upgrade, so we will work upward from the lowest version.

2.1 CM0 vs CM0+

Let’s first discuss CM0 and CM0+, starting with the base CM0 module:

  • ARMv6-M CPU Core: An architecture ARM introduced in 2007. Von Neumann architecture, 3-stage pipeline, supports most of the Thumb instruction set and a small part of Thumb-2, for a total of 56 instructions. It also includes a hardware multiplier that returns a 32-bit result.

  • NVIC Nested Vectored Interrupt Controller: Used for interrupt management in the CPU’s normal Run mode. Supports a maximum of 32 external interrupts, each of which can be assigned one of 4 preemption priority levels (2 bits).

  • WIC Wake-up Interrupt Controller: Used for interrupt management in the CPU’s low-power Sleep mode.

  • AHB-Lite Bus: A 32-bit AMBA-3 standard high-performance system bus responsible for all Flash and SRAM instruction and data access.

  • Debug Module: 0-4 hardware breakpoints, 0-2 data watchpoints.

  • DAP Debug Interface: Supports JTAG and SWD interfaces through the DAP module.
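
The 2-bit priority noted in the NVIC item above occupies the top two bits of an 8-bit priority field, which is why the four levels show up in the register as 0x00, 0x40, 0x80 and 0xC0. The following is a minimal sketch in portable C so that it can be tried on a host machine; `PRIO_BITS`, `encode_priority` and `decode_priority` are illustrative names, not CMSIS identifiers (CMSIS performs an equivalent shift inside `NVIC_SetPriority`, using `__NVIC_PRIO_BITS`).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 implemented priority bits, as on CM0/CM0+. */
#define PRIO_BITS 2u

/* Map a logical level (0 = most urgent) to the 8-bit register value. */
static uint8_t encode_priority(uint32_t level)
{
    assert(level < (1u << PRIO_BITS));            /* only 4 levels exist */
    return (uint8_t)(level << (8u - PRIO_BITS));  /* left-justify in the byte */
}

/* Recover the logical level; the unimplemented low bits are discarded. */
static uint32_t decode_priority(uint8_t reg)
{
    return (uint32_t)reg >> (8u - PRIO_BITS);
}
```

For example, `encode_priority(1)` yields 0x40, and `decode_priority(0x80)` yields level 2.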

So what improvements does CM0+ have?

  • ARMv6-M CPU Core: The pipeline has been changed to 2 stages (many 8-bit MCUs have 2-stage pipelines, mainly to reduce power consumption).

  • NVIC Nested Vectored Interrupt Controller: Added VTOR (the Vector Table Offset Register), which allows the interrupt vector table to be relocated.

So what additional features does CM0+ have?

  • MPU Memory Protection Unit: Provides hardware support for managing and protecting memory by controlling access permissions, with a maximum of 8 regions, each divisible into 8 subregions. On CM0+ a memory violation raises a HardFault; on CM3 and above it raises a MemManage fault.

  • MTB Micro Trace Buffer: An on-chip trace unit that records recent program-flow history into a region of SRAM. After an exception, the debugger can read back how execution arrived there, which allows for faster bug localization.

  • Fast I/O: I/O ports that can be accessed in a single cycle, which makes bit-banging easier (e.g., using GPIO to emulate the SPI or I2C protocols).
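
To make the bit-banging mentioned in the Fast I/O item concrete, here is a minimal sketch of shifting one byte out over a software SPI (mode 0, MSB first). It is written so that it can run on a host: `gpio_write`, `PIN_SCK`, `PIN_MOSI` and the sample log are hypothetical stand-ins, not a vendor API. On real hardware each `gpio_write` call would be a store to the port's data register, and the single-cycle I/O port is what keeps each of those stores cheap.

```c
#include <stddef.h>
#include <stdint.h>

enum { PIN_SCK = 0, PIN_MOSI = 1 };

/* Hypothetical stand-in for a GPIO port, simulated in memory. */
static uint8_t pin_state[2];
static uint8_t mosi_samples[64];  /* MOSI level seen at each SCK rising edge */
static size_t  sample_count;

static void gpio_write(int pin, uint8_t level)
{
    /* A rising edge on SCK is where an SPI mode 0 slave samples MOSI. */
    if (pin == PIN_SCK && level && !pin_state[PIN_SCK] &&
        sample_count < sizeof mosi_samples) {
        mosi_samples[sample_count++] = pin_state[PIN_MOSI];
    }
    pin_state[pin] = level;
}

/* Shift one byte out MSB first, SPI mode 0 (clock idles low). */
static void spi_bitbang_write(uint8_t byte)
{
    for (int bit = 7; bit >= 0; --bit) {
        gpio_write(PIN_MOSI, (uint8_t)((byte >> bit) & 1u)); /* set up data   */
        gpio_write(PIN_SCK, 1);                              /* slave samples */
        gpio_write(PIN_SCK, 0);                              /* back to idle  */
    }
}
```

Each bit costs three pin writes, so the achievable bit rate is bounded by how quickly the core can write to the port.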

2.2 CM0+ vs CM3

After comparing CM0 and CM0+, let’s see how CM3 enhances upon CM0+:

So what improvements does CM3 have?

  • ARMv7-M CPU Core: An architecture ARM introduced in 2004. Harvard architecture, 3-stage pipeline + branch prediction, supports the full Thumb and Thumb-2 instruction sets. It includes a 32-bit hardware multiplier that can return 64-bit results, and adds a 32-bit hardware divider.

  • NVIC Nested Vectored Interrupt Controller: Supports up to 240 external interrupts, with up to 8 priority bits that can be split into a preemption priority and a subpriority (response priority). At one extreme there are 128 preemption levels with 2 subpriority levels each; at the other there are 256 subpriority levels and no preemption between them.

  • 3x AHB-Lite Bus: In addition to the original system bus responsible for SRAM access, two new ICode and DCode buses have been added to complete instruction and data access on Flash.

  • Debug Module: 0-8 hardware breakpoints, 0-4 data watchpoints.

  • ITM/ETM Trace Unit: ITM better supports printf style debugging, while ETM provides real-time instruction and data tracing.
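
The priority grouping described in the NVIC item above can be worked out with a few lines of arithmetic. The sketch below assumes that all 8 priority bits are implemented (many devices implement fewer) and uses illustrative names (`prio_split_t`, `split_priority`) that are not part of CMSIS. The `prigroup` argument corresponds to the 3-bit PRIGROUP setting: bits [prigroup:0] of the priority field form the subpriority and the remaining upper bits form the preemption priority.

```c
#include <assert.h>
#include <stdint.h>

/* How an 8-bit priority field divides for a given PRIGROUP value (0..7),
   assuming all 8 priority bits are implemented. */
typedef struct {
    uint32_t preempt_levels;  /* number of preemption priorities       */
    uint32_t sub_levels;      /* number of subpriorities within each   */
} prio_split_t;

static prio_split_t split_priority(uint32_t prigroup)
{
    assert(prigroup <= 7u);
    uint32_t sub_bits     = prigroup + 1u;  /* bits [prigroup:0]     */
    uint32_t preempt_bits = 8u - sub_bits;  /* bits [7:prigroup + 1] */
    prio_split_t s = { 1u << preempt_bits, 1u << sub_bits };
    return s;
}
```

With `prigroup` set to 0 this gives 128 preemption levels and 2 subpriorities; with 7 it gives a single preemption level (so no preemption) and 256 subpriorities, matching the two extremes quoted above.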

So what additional features does CM3 have?

Well, CM3 does not add any unique modules compared to CM0+; on the contrary, it lacks the Fast I/O port.

2.3 CM3 vs CM4

After comparing CM0+ and CM3, let’s see how CM4 enhances upon CM3:

So what improvements does CM4 have?

  • ARMv7E-M CPU Core: Adds support for DSP-related instructions.

So what additional features does CM4 have?

  • DSP Digital Signal Processing Unit: Adds support for single-cycle 16/32-bit MAC, dual 16-bit MAC, and 8/16-bit SIMD arithmetic.

  • FPU Floating Point Unit: Adds a single-precision (float type) floating point unit compatible with the IEEE-754 standard (FPv4-SP).
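
To show what a dual 16-bit MAC computes, here is a portable C model of the operation: two signed 16-bit products are summed and added to an accumulator. On CM4 this whole computation is a single instruction, which CMSIS exposes as the `__SMLAD` intrinsic; the function names below (`dual_mac16`, `pack16`) are illustrative only, and the model ignores the overflow behaviour of the real instruction.

```c
#include <stdint.h>

/* Portable model of a dual 16-bit multiply-accumulate:
   acc + lo(x) * lo(y) + hi(x) * hi(y), treating each half as signed.
   The narrowing casts assume two's-complement wrapping, as on GCC and Clang. */
static int32_t dual_mac16(uint32_t x, uint32_t y, int32_t acc)
{
    int16_t x_lo = (int16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(y >> 16);
    return acc + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
}

/* Pack two signed 16-bit samples into one 32-bit word (hi:lo). */
static uint32_t pack16(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}
```

A dot product over 16-bit samples can then consume two samples per step, for example `acc = dual_mac16(pack16(a1, a0), pack16(b1, b0), acc)`, which is the pattern that FIR filters built on this instruction rely on.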

2.4 CM4 vs CM7

After comparing CM3 and CM4, let’s see how CM7 enhances upon CM4:

So what improvements does CM7 have?

  • ARMv7E-M CPU Core: 6-stage pipeline + branch prediction.

  • 2x AHB-Lite Bus: Streamlined to 2 AHB buses, with the AHB-P peripheral interface completing the original system bus functions, and the AHB-S slave interface responsible for external bus controller functions (like DMA) as well as TCM interface functions.

  • MPU Memory Protection Unit: Can divide memory into a maximum of 16 regions, each divisible into 8 subregions.

  • FPU Floating Point Unit: Adds a double-precision (double type) floating point unit compatible with the IEEE-754 standard (FPv5).
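
The region and subregion split of the MPU can be illustrated with a small address check. In the sketch below each region is divided into 8 equal subregions, and a set bit in a disable mask switches the corresponding subregion off. The type and function names (`mpu_region_t`, `region_covers`) are illustrative, not CMSIS definitions, and the size constraints in the comments are stated as assumptions about the ARMv7-M MPU rather than taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative description of one MPU region (not a CMSIS type). */
typedef struct {
    uint32_t base;  /* start address, aligned to size                      */
    uint32_t size;  /* power of two; assumed >= 256 bytes for subregions   */
    uint8_t  srd;   /* disable mask: bit n = 1 switches subregion n off    */
} mpu_region_t;

/* Return true if addr falls in an enabled subregion of r. */
static bool region_covers(const mpu_region_t *r, uint32_t addr)
{
    assert(r->size >= 256u && (r->size & (r->size - 1u)) == 0u);
    if (addr - r->base >= r->size) {  /* unsigned wrap also rejects addr < base */
        return false;
    }
    uint32_t sub = (addr - r->base) / (r->size / 8u);  /* index 0..7 */
    return ((r->srd >> sub) & 1u) == 0u;
}
```

For an 8KB region at 0x20000000 with a mask of 0x80, the first 7KB are covered and the last 1KB (from 0x20001C00) is excluded, which is how subregions let one region describe a size that is not a power of two.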

So what additional features does CM7 have?

  • I/D-Cache: This is what we typically understand as L1 Cache, with each cache size ranging from 4-64KB.

  • I/D-TCM: RAM that is tightly coupled to the processor core, providing performance comparable to Cache but with greater determinism, with a maximum size of 16MB each.

  • ECC Features: Provides error correction and recovery functions for L1 Cache, enhancing system reliability.

  • AXI-M Bus: A 64-bit AXI bus based on AMBA 4, used to support L2 memory attached to the system.
