1. Introduction
This article briefly outlines the development history of ARM processors and the evolution of the ARM architecture, including the application areas of the different processor families. The focus is on the Cortex-A series: the inheritance relationships between its microarchitectures and the design teams they originated from. This will likely become a series covering an overview of ARM processor and architecture development, ARMv7-A architecture and microarchitecture analysis, ARMv8-A architecture and microarchitecture analysis, key technologies such as TrustZone, big.LITTLE, NEON, and AMBA, and an overall analysis of the ARM software ecosystem.
2. Development History and Business Model of ARM Processors
ARM grew out of Acorn Computers in Cambridge, UK, and was officially established as a separate company in 1990. After the milestone release of ARM9 in 1997, it entered a period of rapid development.
ARM’s business model is IP licensing: it charges a one-time technology licensing fee plus ongoing royalties. ARM focuses only on designing IP such as CPUs and GPUs, while manufacturing is handled by the licensed customers. ARM’s revenue therefore consists of initial licensing fees and royalties, where the royalties are charged in proportion to the shipments of chips that use ARM technology.
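The fee-plus-royalty model can be written as simple arithmetic. The sketch below is only an illustration: the function name, the assumption that the royalty is a percentage of the chip price, and all numbers are hypothetical and not taken from ARM's actual contracts.

```python
def license_revenue(upfront_fee, chips_shipped, chip_price, royalty_percent):
    """Total revenue from one licensee: one-time fee plus per-chip royalties.

    All parameters are hypothetical; the royalty is assumed here to be a
    fixed percentage of the price of every chip shipped.
    """
    royalties = chips_shipped * chip_price * royalty_percent / 100
    return upfront_fee + royalties


# Hypothetical licensee: 1,000,000 upfront, 10 million chips at 5 each, 2% royalty.
total = license_revenue(1_000_000, 10_000_000, 5, 2)
print(total)
```

The point of the model is that the upfront fee is fixed, while the royalty term grows with the licensee's shipment volume.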
ARM offers several types of licensing:
- Processor licensing: The lowest level of licensing. Partner manufacturers use ARM-designed processors without altering the original design, but can adjust parameters such as clock frequency and power consumption as needed.
- POP (Processor Optimization Pack) licensing: A higher form of processor licensing, in which ARM sells processors already optimized for a specific manufacturing process, making it easier for partners to design and produce processors with guaranteed performance on that process. Samsung, Texas Instruments, Broadcom, Freescale, Fujitsu, and others have launched their own chips based on ARM processors.
- Architecture/instruction set level licensing: Allows licensees to modify the ARM architecture or ARM instruction set to design their own processors, such as Qualcomm’s Krait architecture and Apple’s Swift architecture.
3. Overview of ARM Processor Architecture Development
Main Features of ARMv6
- Thumb-2: An extension of the 16-bit Thumb instruction set that adds 32-bit instructions, aimed at keeping Thumb's high code density while approaching the performance of the 32-bit ARM instruction set.
- TrustZone: A security extension that partitions physical resources into a Secure world and a Non-secure world. The processor switches between the two worlds using the SMC instruction. The extension requires support from the bus and the MMU, and additional IP blocks are needed to control DDR, SRAM, peripherals, etc., to achieve security isolation.
- SIMD: This generation of the SIMD instruction set reuses ARM’s 32-bit general-purpose registers as vector registers. It supports 8-bit and 16-bit integers and can perform parallel calculations on four 8-bit integers or two 16-bit integers.
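To make the "four 8-bit integers in one general-purpose register" idea concrete, here is a minimal Python sketch that emulates a lane-wise unsigned byte add in the style of the ARMv6 `UADD8` instruction. The function name `uadd8` and the masks are illustrative; the real instruction also updates the GE flags, which this sketch does not model.

```python
LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH = 0x80808080   # top bit of every 8-bit lane


def uadd8(a, b):
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits of each lane cannot carry into the next lane.
    partial = (a & LOW7) + (b & LOW7)
    # The top bit of each lane is added without carry-out, which is an XOR.
    return (partial ^ ((a ^ b) & HIGH)) & 0xFFFFFFFF


# Lanes (high to low): 0x01+0x01, 0xFF+0x01, 0x7F+0x01, 0x80+0x01
print(hex(uadd8(0x01FF7F80, 0x01010101)))  # 0x2008081
```

Each lane wraps around independently (0xFF + 0x01 gives 0x00 without disturbing its neighbour), which is what distinguishes a packed SIMD add from an ordinary 32-bit add.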
Main Features of ARMv7
- Advanced SIMD: In the ARMv7-A architecture, ARM further developed its SIMD instruction set and named it NEON. This generation has 32 64-bit NEON vector registers and also supports single-precision floating point.
- VFPv3/v4: The floating-point architecture (VFP) provides hardware support for half-precision, single-precision, and double-precision floating-point operations, compliant with the IEEE-754 standard. Compared to VFPv3, VFPv4 adds the half-precision extension and fused multiply-add instructions. VFP can be implemented with either 32 or 16 doubleword registers, denoted VFPv3-D32 and VFPv3-D16 respectively. When NEON and VFP are both implemented, VFP must be the D32 variant.
- LPAE (Large Physical Address Extension): A 40-bit address extension that expands the physical addressing range from 2^32 bytes (4 GB) to 2^40 bytes (1 TB), with some later processors expanding to 44 bits.
- Virtualization: A new CPU mode called Hyp mode is added to the Non-secure (Normal) world. It requires support from the MMU and the GIC (interrupt controller) to provide IPA (Intermediate Physical Address) translation and virtual interrupt forwarding.
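The role of the IPA in virtualization can be sketched as two chained lookups: the guest OS maps virtual addresses to intermediate physical addresses, and the hypervisor maps those to real physical addresses. The following is a toy model only: the dictionaries stand in for page tables, and `PAGE_SIZE`, `translate`, and `guest_access` are hypothetical names, not the hardware table format.

```python
PAGE_SIZE = 4096  # assumed 4 KB pages


def translate(addr, table):
    """Translate an address through a {page number: page number} mapping."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"translation fault at {addr:#x}")
    return table[page] * PAGE_SIZE + offset


def guest_access(va, stage1, stage2):
    """Two-stage translation: VA -> IPA (guest OS), then IPA -> PA (hypervisor)."""
    ipa = translate(va, stage1)
    pa = translate(ipa, stage2)
    return ipa, pa


stage1 = {0x10: 0x80}       # owned by the guest OS
stage2 = {0x80: 0x123456}   # owned by the hypervisor
ipa, pa = guest_access(0x1002A, stage1, stage2)
print(hex(ipa), hex(pa))    # 0x8002a 0x12345602a
```

The resulting physical address lies above 2^32, so it is only reachable with the wider physical addressing that LPAE provides. The guest never sees it; it only ever works with the IPA.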
Main Features of ARMv8
- Secure EL2: Introduced in Armv8.4-A, adding virtualization support to the Secure world.
- PA (Pointer Authentication): Introduced in v8.3. Enhances security by signing pointers with a MAC and checking that signature when the pointer is used, for example on function returns and indirect jumps, so that modified pointers are detected.
- BTI (Branch Target Identification): Introduced in v8.5. Limits the valid targets of indirect jumps. Used together with PA, it significantly reduces control-flow attacks.
- MTE (Memory Tagging Extension): Introduced in v8.5. Tags memory regions and requires a pointer to carry the matching tag in order to access a protected region. Can detect overflow and use-after-free (UAF) vulnerabilities.
- Scalar Floating Point: AArch64 provides 32 128-bit registers for SIMD vector and scalar floating-point support; AArch32 provides 32 64-bit registers for SIMD vector and scalar floating-point support.
- Enhanced Crypto: v8 introduced cryptography instructions for AES and SHA-1/SHA-256; v8.4 later added support for SHA3, SHA512, SM3, and SM4.
- FP16 and BFloat16: v8.2 adds half-precision (FP16) data-processing instructions. BFloat16 instructions came later, in v8.6, and may optionally be implemented from v8.2.
- Vector Extensions: Introduced in v8.2, the Scalable Vector Extension (SVE) is the next-generation SIMD instruction set for the ARM AArch64 architecture, aimed at accelerating high-performance computing and allowing vector lengths from 128 to 2048 bits.
- Improved virtualization support: Introduced in v8.4.
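The pointer authentication idea from the list above (a MAC stored in the unused upper bits of a pointer and checked before use) can be sketched in software. This is a conceptual model under stated assumptions: a 48-bit virtual address, a 16-bit code, and HMAC-SHA256 standing in for the hardware's own MAC algorithm. The names `sign_pointer` and `authenticate_pointer` are illustrative, not ARM instructions.

```python
import hashlib
import hmac

VA_BITS = 48                      # assumed virtual-address width
ADDR_MASK = (1 << VA_BITS) - 1    # the upper 16 bits are free to hold the code


def compute_pac(addr, modifier, key):
    """16-bit authentication code over the address and a context modifier."""
    msg = addr.to_bytes(8, "little") + modifier.to_bytes(8, "little")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:2], "little")


def sign_pointer(ptr, modifier, key):
    """Place the code in the unused upper bits of the pointer."""
    addr = ptr & ADDR_MASK
    return addr | (compute_pac(addr, modifier, key) << VA_BITS)


def authenticate_pointer(signed, modifier, key):
    """Return the plain address, or raise if the embedded code does not match."""
    addr = signed & ADDR_MASK
    if (signed >> VA_BITS) != compute_pac(addr, modifier, key):
        raise ValueError("pointer authentication failed")
    return addr


key = b"k" * 16                   # hypothetical per-process key
signed = sign_pointer(0x00007FFF12345678, 42, key)
assert authenticate_pointer(signed, 42, key) == 0x00007FFF12345678
```

An attacker who overwrites a signed return address or function pointer without knowing the key is very unlikely to produce a matching code, so the check fails before the corrupted pointer is used.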
Main Features of ARMv9
Improved Security: The main addition is the new CCA (Confidential Compute Architecture). Confidential computing creates a hardware-based secure runtime environment for executing computations, protecting sensitive data and code even from privileged software, so that not even the highest-privilege OS can interfere. The OS can still decide when the application runs, but the application resides in an isolated, hardware-protected memory area, separated from everything else in the system. This means that even if the application is infected with malware, the infection cannot spread to other parts of the device.
Digital Signal Processing & Machine Learning: SVE was introduced in ARMv8.2, but the first iteration of this variable-length vector SIMD instruction set had a limited scope: it mainly targeted HPC workloads and lacked many of the more general instructions still covered by NEON. SVE2 addresses this by supplementing the scalable SIMD instruction set with the instructions needed for workloads such as DSP and ML that still use NEON. Beyond various modern SIMD features, the advantage of SVE and SVE2 lies in their variable vector size, ranging from 128 bits to 2048 bits, which lets the same code run on any hardware vector width. Software developers only need to compile their code once: if a future CPU has native 512-bit SIMD execution pipelines, that code can use the full width of the unit, and the same code also runs on conservative designs with narrower execution hardware. This matters for Arm designs that span IoT, mobile, and data center CPUs. All of this is achieved within the 32-bit encoding space of the Arm architecture, whereas architectures like x86 need new instructions and extensions for each vector size.
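The "compile once, run on any vector width" property comes from writing loops that never hard-code the vector length. The Python sketch below imitates such a vector-length-agnostic loop: the hardware width is a runtime parameter, and a per-iteration count of active lanes plays the role of an SVE predicate. The function `vla_add` and its parameters are hypothetical illustrations, not SVE intrinsics.

```python
def vla_add(a, b, vector_bits, elem_bits=32):
    """Element-wise add written without assuming a fixed vector length."""
    if not 128 <= vector_bits <= 2048:
        raise ValueError("vector length must be between 128 and 2048 bits")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    lanes = vector_bits // elem_bits      # elements processed per "instruction"
    out = []
    i = 0
    while i < len(a):
        # Predicate: only lanes that still point inside the array are active,
        # so the final partial vector needs no separate scalar tail loop.
        active = min(lanes, len(a) - i)
        out.extend(x + y for x, y in zip(a[i:i + active], b[i:i + active]))
        i += lanes
    return out


a, b = list(range(10)), [10] * 10
# The same routine gives the same result for every hardware width.
assert vla_add(a, b, 128) == vla_add(a, b, 512) == vla_add(a, b, 2048)
```

With 128-bit vectors the loop takes three iterations for ten 32-bit elements. With 512-bit or wider vectors it takes one, without any change to the code.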
4. Classification and Application Areas of ARM Processors
The current product line of ARM processors mainly includes Cortex-A, Cortex-R, Cortex-M, SecureCore, Neoverse, and several other series.
Cortex-A Series Processors
Application processors, primarily aimed at mobile computing and smartphones. big.LITTLE was introduced in 2011 and evolved into DynamIQ by 2017. This series has gone through ARMv7, ARMv8, and ARMv9 (not all ARMv8 and ARMv9 processors are listed in the above image; further descriptions follow). The series supports a virtual memory system architecture (VMSA) based on a memory management unit (MMU). ARMv8 supports the A64, A32, and T32 instruction sets, while the latest ARMv9 cores no longer support the 32-bit instruction sets.
Cortex-R Series Processors
Real-time processors: a high-performance series aimed at real-time applications such as hard disk controllers, automotive drive systems, and modem basebands. This series mainly supports a protected memory system architecture (PMSA) based on a memory protection unit (MPU), and supports the A32 and T32 instruction sets. The latest, Cortex-R82, is a 64-bit processor with an MMU that can run a rich OS and supports NEON.
Cortex-M Series Processors
SecureCore Series Processors
Neoverse Series Processors
The development history of ARM processors is as follows:
Other Application Areas
ARM’s Automotive Enhanced (AE) IP series launched the Cortex-A76AE processor in 2018, primarily for automotive ADAS (Advanced Driver Assistance Systems). It supports Split-Lock technology, which lets CPU cores operate in two modes: split mode runs the cores independently for maximum performance, while lock mode runs two cores/threads in lockstep to ensure safety. Beyond its safety features, the Cortex-A65AE processor is also ARM’s first to support SMT (simultaneous multithreading), which improves data throughput. Since ADAS must handle a large computational load and high-throughput data from many sensors, ARM claims that the Cortex-A65AE achieves 3.5 times the throughput of its predecessor (Cortex-A53) while maintaining higher energy efficiency. In 2020, ARM launched the new Cortex-A78AE, which brought higher-performance CPU cores together with the first AE-class GPU, Mali-G78AE, and ISP, Mali-C71AE. The Cortex-A78AE is based on the Cortex-A78 microarchitecture and achieves a 30% IPC improvement over the previous-generation Cortex-A76AE.
In November 2020, Apple released Mac laptops based on ARM processors, built around the M1 SoC. Additionally, according to online information, Qualcomm’s ARM-based PC chip is expected to launch within the next two years.
5. ARM Cortex-A Series Processors
Subcategories of ARM Cortex-A Series Processors
The ARM Cortex-A series processors currently include ultra-low power cores, small cores, large cores, and ultra-large cores, spanning the ARMv7, ARMv8, and ARMv9 architectures.
- Ultra-low power cores: A5 and A7 (ARMv7); A35, A32, and A34 (ARMv8).
- Small cores: A8 and A9 (ARMv7); A53 and A55 (ARMv8); A510 (ARMv9).
- Large cores: A15 and A17 (ARMv7); A57, A72, A73, A75, A76, A77, and A78 (ARMv8); A710 and A715 (ARMv9).
- Ultra-large cores: X1 (ARMv8); X2 and X3 (ARMv9).
Since the introduction of big.LITTLE technology in 2011, the ARMv7 architecture A7 can be paired with A15/A17 as large cores. From 2012, the ARMv8 architecture A53 was paired with A57/A72/A73; with the introduction of A35, A53/A55 can also serve as large cores with A35 as the small core. In 2017, big.LITTLE evolved into DynamIQ, allowing more flexible combinations of large and small cores, such as 1+3+4 (1 ultra-large core, 3 large cores, and 4 small cores), where the ultra-large core was usually just a large core run at a higher clock frequency.
It wasn’t until the release of Cortex-X1 in 2020 that a true ultra-large core appeared. It was positioned as a “customizable” mobile platform: chip manufacturers propose requirements to ARM based on budget and needs, and ARM adjusts the specifications of each module for the target application scenario. Cortex-X2/X3 appear to be public versions (this is uncertain). The latest combinations can have 1 X3 (ultra-large core) + 3 A715 (large cores) + 4 A510 (small cores).
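A layout string such as 1+3+4 is just a count per core tier, ordered from the largest core to the smallest. As a small illustration, the hypothetical helper below (`describe_cluster` is not an ARM tool) expands such a string into a per-core-type table for the X3/A715/A510 combination mentioned above.

```python
def describe_cluster(layout, core_types):
    """Expand a layout string like "1+3+4" into {core type: count}.

    core_types must be ordered from the largest tier to the smallest,
    matching the order of the numbers in the layout string.
    """
    counts = [int(n) for n in layout.split("+")]
    if len(counts) != len(core_types):
        raise ValueError("layout and core_types must have the same number of tiers")
    return dict(zip(core_types, counts))


cluster = describe_cluster("1+3+4", ["Cortex-X3", "Cortex-A715", "Cortex-A510"])
print(cluster)                 # {'Cortex-X3': 1, 'Cortex-A715': 3, 'Cortex-A510': 4}
print(sum(cluster.values()))   # 8
```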
Evolution Relationships of ARM Cortex-A Series Processors
The ultra-large cores branch off from A77: A77->X1->X2->X3
The evolution path of small cores is: A9->A7->A53->A55->A510
The evolution path of ultra-low power cores is: A9->A5->A35->A32/A34
ARM Processor Design Teams
ARM processors are mainly developed by the Austin, Sophia, and Cambridge teams, with the main processors as follows:
- Austin (Texas)
  - Cortex-A8, Cortex-A15, Cortex-A57, Cortex-A72, Cortex-A76, Cortex-A77, Cortex-A78
  - Cortex-X1, Cortex-X2, Cortex-X3
  - Neoverse N1, Neoverse N2, Neoverse V1
- Sophia-Antipolis (France)
  - ARM11, Cortex-A9, Cortex-A12, Cortex-A17, Cortex-A73, Cortex-A75
- Cambridge (UK)
  - Cortex-A5, Cortex-A7, Cortex-A53, Cortex-A35, Cortex-A55
References
- https://en.wikichip.org/wiki/arm_holdings
- https://en.wikipedia.org/wiki/AArch64
- https://developer.arm.com/documentation/102378/0201/Armv8-x-and-Armv9-x-extensions-and-features
- https://www.arm.com/zh-TW/architecture/security-features/arm-confidential-compute-architecture
- https://broadgeek.com/2021/12/12/c8bf/
- https://en.wikipedia.org/wiki/List_of_ARM_processors
- https://www.anandtech.com/show/13727/arm-announces-cortex65ae-for-automotive-first-smt-cpu-core
- https://www.anandtech.com/show/13398/arm-unveils-arm-safety-ready-initiative-cortexa76ae-processor
- https://www.anandtech.com/show/16114/arm-announces-cortexa78ae-malig78ae-and-malic71ae-autonomous-system-ips
- http://www.anandtech.com/show/10347/arm-cortex-a73-artemis-unveiled
- ARM Industry Research Framework, Pacific Securities