Which ARM Cortex Core is Best for My Application: A Series, R Series, or M Series?

The ARM Cortex core series offers a wide range of scalable performance options, giving designers the opportunity to choose the core that best fits their applications, rather than adopting a one-size-fits-all solution. However, with ARM’s rich product line and numerous varieties, how should one choose the right chip type for their products? Let’s follow the editor of “Microcontrollers and Embedded Systems Applications” to find out!

The Cortex series can be broadly divided into three categories:

● Cortex-A—Application processor cores for performance-intensive systems

● Cortex-R—High-performance cores for real-time applications

● Cortex-M—Microcontroller cores for various embedded applications
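The three categories above can be read as a first-pass triage. The following is a minimal sketch of that triage in C; the type names, field names, and the function `suggest_family` are hypothetical and only illustrate the decision order, they are not part of any ARM tool.

```c
#include <stdbool.h>

/* Hypothetical requirement flags; the names are illustrative only. */
typedef struct {
    bool needs_rich_os;            /* must run Linux, Android, or similar */
    bool needs_high_perf_realtime; /* real-time deadlines beyond MCU-class performance */
} app_requirements;

typedef enum { CORTEX_A, CORTEX_R, CORTEX_M } cortex_family;

/* First-pass triage: a rich OS points to Cortex-A, high-performance
 * real-time work points to Cortex-R, everything else starts at Cortex-M. */
cortex_family suggest_family(app_requirements req)
{
    if (req.needs_rich_os)
        return CORTEX_A;
    if (req.needs_high_perf_realtime)
        return CORTEX_R;
    return CORTEX_M;
}
```

A real selection would also weigh cost, power budget, and vendor peripherals; this sketch only captures the top-level split.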

Cortex-A Series

Cortex-A processors provide a range of solutions for devices that run operating systems such as Linux or Android, in applications ranging from low-cost handheld devices to smartphones, tablets, set-top boxes, and enterprise network equipment. The earlier Cortex-A series processors (A5, A7, A8, A9, A12, A15, and A17) are based on the ARMv7-A architecture. All of these cores share a common feature set, including the NEON media processing engine, TrustZone security extensions, single-precision and double-precision floating-point support, and support for multiple instruction sets (ARM, Thumb-2, Thumb, Jazelle, and DSP). At the same time, these processors offer a high degree of design flexibility, so designers can reach the required performance and the desired power efficiency.

Although the Cortex-A5 core is the smallest and lowest-power member of the Cortex-A series, it can be used in multicore configurations and is compatible with the higher-end members of the series (A9 and A15). For designers who previously used ARM926EJ-S or ARM1176JZ-S processors, the A5 is a natural upgrade thanks to its higher performance and lower chip cost.

The Cortex-A7 is similar to the Cortex-A5 in terms of power consumption and size, but its performance is about 20% better, and it is fully architecture compatible with the Cortex-A15 and Cortex-A17. The Cortex-A7 is an ideal choice for cost-sensitive smartphones and tablets, and it can also be combined with Cortex-A15 or Cortex-A17 in what ARM calls a “big.LITTLE” processing structure. The big.LITTLE structure is essentially a power optimization technique; the combination of a high-performance CPU (e.g., Cortex-A17) and a high-efficiency CPU (e.g., Cortex-A7) can provide higher sustained performance while significantly saving overall power consumption, reducing CPU energy usage by up to 75%, and extending battery life.

The performance demands of smartphones and tablets have grown far faster than battery capacity, so this configuration gives developers a significant advantage. As part of an overall system design strategy, approaches like big.LITTLE can significantly narrow the gap left by slow progress in battery technology.
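To make the big.LITTLE idea concrete, here is a toy placement policy in C. It is only a sketch under stated assumptions: the thresholds, the enum, and the function `next_cluster` are hypothetical, and real schedulers use far richer load metrics than a single percentage.

```c
#include <assert.h>

typedef enum { CLUSTER_LITTLE, CLUSTER_BIG } cluster;

/* Hypothetical thresholds in percent CPU load; real platforms tune these. */
#define UP_THRESHOLD   80  /* move to the big cluster above this load */
#define DOWN_THRESHOLD 30  /* return to the LITTLE cluster below this load */

/* Decide where a task should run next, given where it runs now.
 * The gap between the two thresholds (hysteresis) keeps the task from
 * bouncing between clusters when the load hovers around one value. */
cluster next_cluster(cluster current, int load_percent)
{
    assert(load_percent >= 0 && load_percent <= 100);
    if (current == CLUSTER_LITTLE && load_percent > UP_THRESHOLD)
        return CLUSTER_BIG;
    if (current == CLUSTER_BIG && load_percent < DOWN_THRESHOLD)
        return CLUSTER_LITTLE;
    return current;
}
```

Light work stays on the high-efficiency core, and only sustained heavy load moves to the high-performance core, which is where the power saving comes from.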

Next, let’s look at the high-end products in the Cortex-A series: the Cortex-A15 and Cortex-A17 cores. Both are high-performance processors that can be used in various configurations. The Cortex-A17 is the most efficient “mid-range” processor, aimed squarely at high-end smartphones and tablets. The Cortex-A9 was widely used in this market, but the Cortex-A17 offers over 60% more performance than the Cortex-A9 (cycle for cycle) while also improving overall efficiency. The Cortex-A17 can be configured with up to four cores, each with a complete out-of-order pipeline. As mentioned above, the Cortex-A17 can be combined with the Cortex-A7 in an efficient big.LITTLE configuration, and it can also be paired with high-end mobile graphics processors (such as ARM’s Mali) to form a very efficient overall design.

The Cortex-A15 is the highest performing member of the series, offering twice the performance of the Cortex-A9 (in mobile configuration). It is well suited to applications such as high-end smartphones and tablets, and multicore Cortex-A15 processors can run at up to 2.5GHz, supporting applications like low-power servers and wireless infrastructure. The Cortex-A15 is ARM’s first processor to provide hardware support for data management and arbitration in virtualized software environments. Applications in these environments can access system resources simultaneously, while devices continue to operate reliably and remain isolated from one another.

The Cortex-A50 series extends the Cortex-A range into the low-power server domain. These processors are based on the ARMv8 architecture and support AArch64, an efficient 64-bit execution state that can coexist with the existing 32-bit execution state. One obvious reason to move to 64-bit is support for more than 4GB of physical memory, although the Cortex-A15 and Cortex-A7 already have this capability. The larger benefit is better support for server applications, as more and more server operating systems and applications adopt 64-bit, and the Cortex-A50 series provides a power-optimized solution for them. The same is true for the desktop market: 64-bit support allows the Cortex-A50 series to be applied more widely in this segment, and it suggests that 64-bit operating systems will eventually migrate to mobile applications as well.
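The 4GB figure follows directly from the width of a 32-bit address. The short C sketch below works through the arithmetic; the function name `addressable_bytes` is hypothetical, and the 40-bit case is an assumption added for comparison (it corresponds to the extended physical addressing available on cores such as the Cortex-A15 and Cortex-A7, a figure the article itself does not state).

```c
#include <assert.h>
#include <stdint.h>

/* Bytes addressable with the given number of address bits (1..63). */
uint64_t addressable_bytes(unsigned address_bits)
{
    assert(address_bits >= 1 && address_bits <= 63);
    return (uint64_t)1 << address_bits;
}
```

With 32 address bits this yields 4,294,967,296 bytes (4GB); with 40 bits it yields 1,099,511,627,776 bytes (1TB). A 64-bit architecture removes the need for such extensions by widening the addresses that software itself works with.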

Cortex-R

Having introduced the Cortex-A, we now turn to the Cortex-R series, the smallest and least known of the ARM processor families. Cortex-R processors are designed for high-performance real-time applications, such as hard disk controllers (or solid-state drive controllers), enterprise networking devices and printers, consumer electronics (such as Blu-ray players and media players), and automotive applications (such as airbags, braking systems, and engine management). The Cortex-R series resembles high-end microcontrollers (MCUs) in some respects, but it targets larger systems than those that typically use standard MCUs. For example, the Cortex-R4 is very suitable for automotive applications.

The Cortex-R4 can run at clock speeds up to 600MHz (with 2.45 DMIPS/MHz) and features an 8-stage pipeline, dual issue, prefetch, branch prediction, and a low-latency interrupt system that can interrupt multi-cycle operations and quickly enter the interrupt service routine. Two Cortex-R4 cores can also be paired in a redundant lock-step configuration with fault detection logic, which makes the design very suitable for safety-critical systems.
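The principle behind a redundant lock-step pair can be sketched in software, with the caveat that real lock-step is implemented in hardware and compares the two cores cycle by cycle. In the C sketch below, all names (`lockstep_step`, `scale_ok`, `scale_faulty`) are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*compute_fn)(uint32_t input);

/* Run the same step on two redundant "cores" and compare the results.
 * Returns true and stores the result when they agree; returns false
 * (a detected fault) when they diverge, leaving *result untouched. */
bool lockstep_step(compute_fn core_a, compute_fn core_b,
                   uint32_t input, uint32_t *result)
{
    uint32_t a = core_a(input);
    uint32_t b = core_b(input);
    if (a != b)
        return false;   /* mismatch: signal a fault instead of using the value */
    *result = a;
    return true;
}

/* Hypothetical workloads used only for demonstration. */
uint32_t scale_ok(uint32_t x)     { return x * 3u; }
uint32_t scale_faulty(uint32_t x) { return (x * 3u) ^ 1u; } /* simulated bit flip */
```

Calling `lockstep_step(scale_ok, scale_ok, 7, &r)` succeeds and stores 21, while pairing `scale_ok` with `scale_faulty` reports a fault. The comparison detects that something went wrong but not which core was at fault, which is why such systems pair it with a safe-state or recovery mechanism.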

The Cortex-R5 serves network and data storage applications well. It extends the functionality of the Cortex-R4 to improve efficiency and reliability, and it enhances error management in dependable real-time systems. One of its system features is the low-latency peripheral port (LLPP), which enables fast peripheral reads and writes (without needing to perform a “read-modify-write” operation on the entire port). The Cortex-R5 can also be implemented as a dual-core system that runs either in lock-step or as two independent processors, each executing its own program through its own bus interfaces and interrupts. This dual-core implementation can form the basis of a very powerful and flexible real-time system.

The Cortex-R7 greatly expands the performance range of the R series cores, with clock speeds exceeding 1GHz and performance reaching 3.77 DMIPS/MHz. Its 11-stage pipeline brings enhanced error management and improved branch prediction. There are several options for multicore configurations: lock-step, symmetric multiprocessing, and asymmetric multiprocessing. The Cortex-R7 also features a fully integrated Generic Interrupt Controller (GIC) to support complex priority-based interrupt handling. It is worth noting, however, that despite its high performance the Cortex-R7 is not suitable for applications running feature-rich operating systems (such as Linux and Android); the Cortex-A series is better suited to such applications.
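The DMIPS/MHz ratings quoted for these cores translate into peak Dhrystone throughput once multiplied by the clock rate. Here is a small C sketch of that arithmetic; the function name `peak_dmips` and the scaled-integer convention are assumptions made for the example.

```c
#include <stdint.h>

/* Peak Dhrystone throughput in DMIPS.
 * dmips_per_mhz_x100 is the DMIPS/MHz rating times 100 (2.45 -> 245),
 * which keeps the arithmetic in integers. */
uint32_t peak_dmips(uint32_t dmips_per_mhz_x100, uint32_t clock_mhz)
{
    return (dmips_per_mhz_x100 * clock_mhz) / 100u;
}
```

Using the figures above, `peak_dmips(245, 600)` gives 1470 DMIPS for a Cortex-R4 at 600MHz, and `peak_dmips(377, 1000)` gives 3770 DMIPS for a Cortex-R7 at 1GHz. These are peak benchmark numbers; real throughput depends on the memory system and the workload.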

Cortex-M

Finally, let’s discuss the Cortex-M series, designed specifically for the highly competitive MCU market. The Cortex-M3 and Cortex-M4 are built on the ARMv7-M architecture, while the lower-end Cortex-M0+ is based on the ARMv6-M architecture. The first Cortex-M processor was released in 2004, and once mainstream MCU suppliers adopted the core and began producing MCU devices, the Cortex-M quickly gained market favor. It can be confidently said that the Cortex-M is to 32-bit MCUs what the 8051 is to 8-bit MCUs: an industry-standard core supported by numerous suppliers, each of which combines the core with its own special developments to offer differentiated products. For example, the Cortex-M series can be implemented as a soft core in FPGAs, but its more common use is in an MCU that integrates memory, clocks, and peripherals. Within this series, some products focus on optimal energy efficiency, some on maximum performance, and some target niche markets such as smart meters.

The Cortex-M3 and Cortex-M4 are very similar cores. Both deliver 1.25 DMIPS/MHz and feature a 3-stage pipeline, multiple 32-bit bus interfaces, clock rates up to 200MHz, and very efficient debugging options. The biggest difference is that the Cortex-M4 core is optimized for DSP. The two cores share the same architecture and instruction set (Thumb-2), but the Cortex-M4 adds a range of instructions optimized for DSP algorithms, including saturating arithmetic and SIMD instructions. For instance, when running a 512-point FFT every 0.5 seconds on comparable Cortex-M3 and Cortex-M4 MCUs, the Cortex-M3 consumes about three times the power of the Cortex-M4. The Cortex-M4 can also be implemented with an optional single-precision floating-point unit (FPU); if the application involves floating-point calculations, they complete significantly faster on the Cortex-M4 than on the Cortex-M3. Conversely, for applications that use neither the DSP nor the FPU features, the Cortex-M4 offers the same performance and power consumption as the Cortex-M3. In short, if DSP functionality is needed, choose the Cortex-M4; otherwise, the Cortex-M3 will get the job done.
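To show what saturating and SIMD operations compute, the following C sketch emulates both in portable code. The function names `sat_add32` and `dual_add16` are hypothetical; on an actual Cortex-M4 toolchain, operations of this kind are typically reached through CMSIS intrinsics such as `__QADD` and `__SADD16` and complete in a single instruction, which is where the speed and power advantage comes from.

```c
#include <stdint.h>

/* Saturating 32-bit signed add: clamps instead of wrapping on overflow. */
int32_t sat_add32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* SIMD-style dual 16-bit add: treats each 32-bit word as two independent
 * 16-bit lanes and adds them lane by lane (each lane wraps modulo 2^16,
 * and no carry crosses from the low lane into the high lane). */
uint32_t dual_add16(uint32_t x, uint32_t y)
{
    uint32_t lo = (x + y) & 0x0000FFFFu;                 /* low lanes */
    uint32_t hi = ((x >> 16) + (y >> 16)) & 0x0000FFFFu; /* high lanes */
    return (hi << 16) | lo;
}
```

Saturation matters in signal processing because a clipped sample is a far smaller error than one that wraps from the maximum positive value to a large negative one. Lane-wise adds let one instruction process two 16-bit samples at once.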

For cost-sensitive applications or those migrating from 8-bit to 32-bit, the lowest-end products in the Cortex-M series may be the best choice. Although the Cortex-M0+ delivers 0.95 DMIPS/MHz, slightly less than the Cortex-M3 and Cortex-M4, it remains compatible with the higher-end members of the series. The Cortex-M0+ uses a subset of the Thumb-2 instruction set, most of which are 16-bit instructions (though all data operations are 32-bit), which suits the 2-stage pipeline of the Cortex-M0+ well. The shorter pipeline reduces the branch shadow, which saves some overall power, and in most cases the pipeline holds the next four instructions. The Cortex-M0+ also features a dedicated bus for single-cycle GPIO, meaning you can implement deterministic interfaces with bit-controlled GPIO, just as on an 8-bit MCU, but process the data with the performance of a 32-bit core.
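Bit-controlled GPIO boils down to setting, clearing, toggling, and reading individual bits of a port register. The C sketch below shows those operations against an ordinary variable standing in for the register; the function names are hypothetical, and on real hardware the pointer would refer to a vendor-specific register address. The single-cycle timing itself is a property of the hardware bus and cannot be shown in portable code.

```c
#include <assert.h>
#include <stdint.h>

/* Each helper operates on one pin (bit 0..31) of a 32-bit port register. */
void gpio_set(volatile uint32_t *port, unsigned pin)
{
    assert(pin < 32);
    *port |= (1u << pin);
}

void gpio_clear(volatile uint32_t *port, unsigned pin)
{
    assert(pin < 32);
    *port &= ~(1u << pin);
}

void gpio_toggle(volatile uint32_t *port, unsigned pin)
{
    assert(pin < 32);
    *port ^= (1u << pin);
}

int gpio_read(const volatile uint32_t *port, unsigned pin)
{
    assert(pin < 32);
    return (int)((*port >> pin) & 1u);
}
```

For example, starting from a port value of 0, `gpio_set(&port, 3)` leaves the port at 8, and a following `gpio_toggle(&port, 0)` leaves it at 9.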

Another important distinguishing feature of the Cortex-M0+ is the addition of a Micro Trace Buffer (MTB). This peripheral allows designers to use some on-chip RAM to store program branches during debugging. These branches can then be sent back to the integrated development environment, where the program flow can be reconstructed. The feature provides a rudimentary form of instruction trace, which is meaningful because the Cortex-M0+ lacks the Embedded Trace Macrocell (ETM) capability available on the Cortex-M3 and Cortex-M4. The level of debugging information that can be extracted from the Cortex-M0+ is significantly higher than that of 8-bit MCUs, making difficult debugging problems easier to resolve.
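The idea of storing branches in a slice of RAM and reading them back later can be modeled as a ring buffer of source/destination address pairs. This C sketch is an illustrative model only: the buffer size, struct layout, and function names are hypothetical and do not reproduce the actual MTB register interface or packet format.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 4u   /* hypothetical size; a real buffer uses a slice of SRAM */

typedef struct {
    uint32_t source;       /* address of the branch instruction */
    uint32_t destination;  /* address where execution continued */
} branch_record;

typedef struct {
    branch_record records[TRACE_CAPACITY];
    size_t next;    /* slot the next record will overwrite */
    size_t count;   /* number of valid records, at most TRACE_CAPACITY */
} trace_buffer;

/* Record one taken branch, overwriting the oldest entry when full. */
void trace_record(trace_buffer *tb, uint32_t source, uint32_t destination)
{
    tb->records[tb->next].source = source;
    tb->records[tb->next].destination = destination;
    tb->next = (tb->next + 1u) % TRACE_CAPACITY;
    if (tb->count < TRACE_CAPACITY)
        tb->count++;
}

/* Copy the stored branches, oldest first, into out[]; returns how many
 * were copied. out must have room for TRACE_CAPACITY records. */
size_t trace_read(const trace_buffer *tb, branch_record *out)
{
    size_t start = (tb->next + TRACE_CAPACITY - tb->count) % TRACE_CAPACITY;
    for (size_t i = 0; i < tb->count; i++)
        out[i] = tb->records[(start + i) % TRACE_CAPACITY];
    return tb->count;
}
```

Because code between two branches executes sequentially, a debugger that knows the program image can rebuild the executed path from these pairs alone. The trade-off is that only the most recent branches survive once the buffer wraps.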

In summary, the Cortex processor series offers a variety of options to meet your application performance needs. Whether for high-end tablets or ultra-low-cost wireless sensor nodes in IoT, you can find a processor that fits your application requirements.
