Overview of ARM v8 Processors: Architecture and Technology

Overview of ARMv8 Architecture

The ARMv8 architecture includes 32-bit and 64-bit execution states, introducing the ability to execute with 64-bit wide registers while providing backward compatibility mechanisms to allow existing ARMv7 software to run.
  • AArch64: The 64-bit execution state in ARMv8.
  • AArch32: The 32-bit execution state in ARMv8, which is nearly identical to ARMv7.
In GNU and Linux documentation (except for Red Hat and Fedora), AArch64 is sometimes referred to as ARM64.
The Cortex-A series now includes both ARMv7-A and ARMv8-A implementations:
  • Cortex-A5, Cortex-A7, Cortex-A8, Cortex-A9, Cortex-A15, and Cortex-A17 processors implement the ARMv7-A architecture.
  • Cortex-A53, Cortex-A57, and Cortex-A73 processors implement the ARMv8-A architecture.
ARMv8 processors still support software written for ARMv7-A processors (with some exceptions). For example, 32-bit code written for ARMv7 Cortex-A series processors can also run on ARMv8 processors such as the Cortex-A57, but only while the processor is in the AArch32 execution state.

Conversely, the 64-bit A64 instruction set cannot run on ARMv7 processors; it runs only on ARMv8 processors.

The changes from 32 bits to 64 bits

Moving from 32 bits to 64 bits brings several changes that can significantly improve performance:

1. Larger register pool

The A64 instruction set provides several significant performance advantages, including a larger register pool. Together, the 31 64-bit general-purpose registers and the ARM Architecture Procedure Call Standard (AAPCS) improve performance when a function call must pass more than four register-sized parameters. On ARMv7 such a call might have to use the stack, whereas in AArch64 up to eight parameters can be passed in registers, which reduces stack usage.
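As a rough illustration of the difference, the following Python sketch models only the simplest case: every argument is an integer or pointer that fits in one general-purpose register. The function name `split_arguments` and the two constants are hypothetical, and the real calling conventions have further rules (structs, floating-point values, 64-bit values on ARMv7) that this sketch ignores.

```python
# Simplified model (an assumption, not the full procedure call standard):
# every argument is an integer or pointer that fits in exactly one register.
ARMV7_ARG_REGS = 4    # r0-r3 carry the first four arguments
AARCH64_ARG_REGS = 8  # x0-x7 carry the first eight arguments


def split_arguments(arg_count, arg_regs):
    """Return (passed_in_registers, passed_on_stack) for one call."""
    in_regs = min(arg_count, arg_regs)
    return in_regs, arg_count - in_regs


if __name__ == "__main__":
    for n in (2, 6, 10):
        print(n, split_arguments(n, ARMV7_ARG_REGS),
              split_arguments(n, AARCH64_ARG_REGS))
```

Under this model, a six-argument call spills two arguments to the stack on ARMv7 but none on AArch64.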

2. Wider integer registers

Wider integer registers allow code that operates on 64-bit data to work more efficiently. A 32-bit processor may require multiple operations to perform arithmetic on 64-bit data. A 64-bit processor may be able to perform the same task in a single operation, typically at the same speed as a 32-bit operation on the same processor. Therefore, code that performs many 64-bit operations can run significantly faster.
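To make the "multiple operations" point concrete, here is a minimal Python sketch of how a 64-bit addition decomposes into two 32-bit additions with a carry, similar in spirit to an ADDS/ADC instruction pair on a 32-bit ARM core. The function name `add64_with_32bit_ops` is hypothetical, and Python integers only simulate the fixed-width registers.

```python
MASK32 = 0xFFFFFFFF


def add64_with_32bit_ops(a, b):
    """Add two 64-bit values using only 32-bit additions plus a carry."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    lo_sum = a_lo + b_lo          # first 32-bit add (sets the carry)
    carry = lo_sum >> 32          # carry out of the low half: 0 or 1
    lo = lo_sum & MASK32

    hi = (a_hi + b_hi + carry) & MASK32   # second add, consuming the carry
    return (hi << 32) | lo                # result wraps modulo 2**64
```

A 64-bit processor performs the same addition with a single instruction on one pair of registers.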

3. Larger virtual address space

64-bit operation allows applications to use a larger virtual address space. While the Large Physical Address Extension (LPAE) extends the physical address space of 32-bit processors to 40 bits, it does not extend the virtual address space. This means that even with LPAE, a single application is still limited to a 32-bit (4GB) virtual address space, and in practice to less than that, because part of this address space is reserved for the operating system.
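The sizes involved follow directly from the address widths. The short Python sketch below works through the arithmetic; the 1GB kernel reservation is an assumed example value, not a figure from this article, and the helper name is hypothetical.

```python
GIB = 1 << 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


virtual_32 = addressable_bytes(32)      # per-application virtual space
physical_lpae = addressable_bytes(40)   # physical space with LPAE

# Assumed example: the operating system reserves 1 GiB of each
# application's 32-bit virtual address space for itself.
kernel_reserved = 1 * GIB
usable_by_app = virtual_32 - kernel_reserved

print(virtual_32 // GIB, physical_lpae // GIB, usable_by_app // GIB)
```

So LPAE lets the system as a whole hold far more RAM than any single 32-bit application can address at once.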
A larger virtual address space also permits memory-mapping larger files, that is, mapping the contents of a file into the memory map of a thread. This works even when physical RAM is not large enough to hold the entire file.
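For readers unfamiliar with memory-mapped files, the following is a minimal sketch using Python's standard `mmap` module. The helper names and the tiny demo file are hypothetical; the point is only that the file is accessed through the address space rather than copied in with explicit reads, and that the mapping size is bounded by the available virtual address space.

```python
import mmap
import os
import tempfile


def read_slice_via_mmap(path, offset, length):
    """Map a whole file read-only and return one slice of it."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; pages are loaded on demand.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


def demo():
    # Hypothetical small file; real use cases map files of many gigabytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"A" * 4096 + b"hello" + b"B" * 4096)
        path = tmp.name
    try:
        return read_slice_via_mmap(path, 4096, 5)
    finally:
        os.remove(path)
```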

32-bit address space

As a 32-bit architecture, ARM supports a maximum addressable space of 4GB (2^32 bytes). This address space can be viewed as 2^32 8-bit bytes, each addressed by an unsigned 32-bit number in the range 0 to 2^32-1. It can also be viewed as 2^30 32-bit words (1 word = 4 bytes). Word addresses must be divisible by 4, meaning the lowest two bits of the address are 00. The word at address A consists of the four bytes at addresses A, A+1, A+2, and A+3.
In ARM state, where every instruction is 4 bytes long, the program counter advances by 4 bytes for each instruction executed.
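The alignment rule and the byte-to-word relationship can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are hypothetical, and `next_pc` assumes sequential execution of fixed 4-byte instructions with no branches.

```python
def is_word_aligned(address):
    """A word address is valid when its lowest two bits are 00."""
    return address & 0b11 == 0


def word_byte_addresses(address):
    """Byte addresses A, A+1, A+2, A+3 that make up the word at A."""
    if not is_word_aligned(address):
        raise ValueError("word address must be divisible by 4")
    return [address + i for i in range(4)]


def next_pc(pc):
    """Sequential program counter step for 4-byte instructions."""
    return (pc + 4) & 0xFFFFFFFF  # stay within the 32-bit address space


WORD_COUNT = (1 << 32) // 4  # number of words in the 4GB space
```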

4. Larger physical address space

Software running on a 32-bit architecture may need to map some data into or out of memory while it executes. A larger address space, with 64-bit pointers, avoids this problem.
However, 64-bit pointers come at a cost: the same piece of code typically uses more memory with 64-bit pointers than with 32-bit pointers.
Each pointer stored in memory requires 8 bytes instead of 4. This may seem trivial, but it can add up to a significant burden. Moreover, the larger memory footprint may lower cache hit rates, which in turn reduces performance.
  • 64-bit pointers: 8 bytes
  • 32-bit pointers: 4 bytes
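A quick calculation shows how the pointer overhead scales. The sketch below uses a hypothetical doubly linked list with one million nodes (two pointers per node); the numbers are illustrative only and ignore payload, padding, and allocator overhead.

```python
def pointer_bytes(node_count, pointers_per_node, pointer_size):
    """Memory spent on pointers alone in a linked data structure."""
    return node_count * pointers_per_node * pointer_size


NODES = 1_000_000      # hypothetical list length
LINKS_PER_NODE = 2     # prev and next pointers

cost_32 = pointer_bytes(NODES, LINKS_PER_NODE, 4)
cost_64 = pointer_bytes(NODES, LINKS_PER_NODE, 8)

print(cost_32, cost_64, cost_64 - cost_32)
```

The pointer storage doubles from 8,000,000 to 16,000,000 bytes, so fewer nodes fit in a cache of a given size.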

ARMv8-A Architecture

The ARM architecture dates back to 1985 and has developed significantly since the early ARM cores, adding features and capabilities at each step.

ARMv4 and earlier

These early processors only used the ARM 32-bit instruction set.

ARMv4T

The ARMv4T architecture added the Thumb 16-bit instruction set to the ARM 32-bit instruction set. This was the first widely licensed architecture. It was implemented by the ARM7TDMI® and ARM9TDMI® processors.

ARMv5TE

The ARMv5TE architecture added improvements for DSP-type operations, saturating arithmetic, and interworking between ARM and Thumb. This architecture was implemented by the ARM926EJ-S®.

ARMv6

ARMv6 included several enhancements, such as support for unaligned memory access, significant changes to the memory architecture, and support for multiprocessors. It also included some support for SIMD operations on bytes or halfwords within 32-bit registers. This architecture was implemented by the ARM1136JF-S®. The ARMv6 architecture also provided some optional extensions, notably Thumb-2 and the Security Extensions (TrustZone®). Thumb-2 extended Thumb into a mixed-length 16-bit and 32-bit instruction set.

ARMv7-A

The ARMv7-A architecture made the Thumb-2 extension mandatory and added the Advanced SIMD extensions (NEON). Before ARMv7, all cores followed a similar architecture or feature set. To help address the increasing variety of applications, ARM introduced a set of architecture profiles:
  • ARMv7-A provides all the features necessary for supporting operating systems such as Linux.
  • ARMv7-R provides predictable real-time high performance.
  • ARMv7-M is aimed at deeply embedded microcontrollers. The M profile was also retrofitted to the ARMv6 architecture to bring these features to the older architecture. The resulting ARMv6-M profile is used by low-power, low-cost processors.

ARMv8-A

The ARMv8 architecture includes both 32-bit and 64-bit execution. It introduces the use of 64-bit wide registers while maintaining backward compatibility with existing ARMv7 software.
Figure: Development of the ARMv8 architecture
The ARMv8-A architecture introduces many changes that allow for the design of higher-performing processor implementations:

Larger physical address

This allows processors to access more than 4GB of physical memory.

64-bit virtual addressing

This allows virtual memory beyond the 4GB limit. This is crucial for modern desktop and server software that uses memory-mapped file I/O or sparse addressing.

Automatic event signaling

This enables energy-efficient, high-performance spin locks.

Larger register file

31 64-bit general-purpose registers enhance performance and reduce stack usage.

Efficient 64-bit immediate generation

This reduces the need for literal pools.
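One way A64 builds a 64-bit constant directly in a register, instead of loading it from a pool of constants in memory, is a MOVZ instruction followed by up to three MOVK instructions, each supplying one 16-bit chunk. The Python sketch below generates such a sequence. The function name `mov_imm64` is hypothetical, and this is a simplified strategy: real assemblers and compilers also consider MOVN and bitmask immediates to find shorter sequences.

```python
def mov_imm64(reg, value):
    """Return a MOVZ/MOVK sequence that loads a 64-bit constant into reg."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")

    # Split the constant into four 16-bit chunks, lowest chunk first.
    chunks = [(shift, (value >> shift) & 0xFFFF) for shift in (0, 16, 32, 48)]
    nonzero = [(shift, chunk) for shift, chunk in chunks if chunk != 0]
    if not nonzero:
        return [f"movz {reg}, #0x0"]

    lines = []
    for i, (shift, chunk) in enumerate(nonzero):
        # MOVZ writes one chunk and zeroes the rest of the register;
        # MOVK writes one chunk and keeps the other bits unchanged.
        op = "movz" if i == 0 else "movk"
        suffix = f", lsl #{shift}" if shift else ""
        lines.append(f"{op} {reg}, #0x{chunk:x}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(mov_imm64("x0", 0x123456789ABCDEF0)))
```

Chunks that are zero need no instruction, so many common constants take fewer than four instructions.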

Larger PC relative addressing range

A +/-4GB addressing range enables efficient data addressing in shared libraries and position-independent executables.
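Assuming the ±4GB figure refers to the A64 ADRP instruction (an assumption, since the article does not name the instruction), the range follows from its encoding: a signed 21-bit immediate counts 4KB pages relative to the page containing the PC. The Python sketch below reproduces that arithmetic; the function names are hypothetical.

```python
PAGE = 4096
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def adrp_target(pc, imm21):
    """Page address formed from the PC's 4KB page plus a signed page count."""
    page_offset = sign_extend(imm21, 21) * PAGE
    return ((pc & ~(PAGE - 1)) + page_offset) & MASK64


def adrp_range():
    """Smallest and largest byte offsets reachable from the PC's page."""
    lowest = sign_extend(1 << 20, 21) * PAGE          # most negative imm21
    highest = sign_extend((1 << 20) - 1, 21) * PAGE   # most positive imm21
    return lowest, highest
```

A following ADD or load/store offset then supplies the low 12 bits within the selected page, so any address within roughly 4GB of the code can be formed in two instructions without a relocation-heavy absolute address.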

Additional 16KB and 64KB translation granules

This reduces Translation Lookaside Buffer (TLB) miss rates and the depth of page walks.
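The effect of the granule size on page walk depth and TLB coverage can be estimated with simple arithmetic. The sketch below assumes 8-byte translation table entries, a 48-bit virtual address, and a 512-entry TLB; the last two are illustrative assumptions rather than figures from this article, and the function names are hypothetical.

```python
def translation_levels(va_bits, granule_bytes):
    """Number of table lookup levels needed to translate a virtual address."""
    offset_bits = granule_bytes.bit_length() - 1  # log2(granule size)
    index_bits = offset_bits - 3                  # entries per table = granule / 8
    remaining = va_bits - offset_bits             # bits resolved by table lookups
    return -(-remaining // index_bits)            # ceiling division


def tlb_reach(entries, granule_bytes):
    """Memory covered by a TLB when every entry maps one page."""
    return entries * granule_bytes


if __name__ == "__main__":
    for granule in (4096, 16384, 65536):
        print(granule, translation_levels(48, granule), tlb_reach(512, granule))
```

With these assumptions, a 64KB granule needs three levels instead of four, and the same TLB covers sixteen times as much memory as with 4KB pages.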

New exception model

This reduces the complexity of operating system and hypervisor software.

Efficient cache management

User-space cache operations improve the efficiency of dynamic code generation. The Data Cache Zero instruction (DC ZVA) provides fast data cache clearing.

Hardware-accelerated encryption

Provides 3× to 10× better software encryption performance. This is useful for fine-grained encryption and decryption workloads that are too small to offload efficiently to a hardware accelerator, such as HTTPS.

Load-Acquire, Store-Release instructions

Designed for C++11, C11, and Java memory models. They improve the performance of thread-safe code by eliminating explicit memory barrier instructions.

NEON double-precision floating-point advanced SIMD

This allows SIMD vectorization to apply to a broader set of algorithms, such as scientific computing, high-performance computing (HPC), and supercomputing.

ARMv8-A Processors: A53, A57, and A73

Figure: Comparison of A53 and A73

Figure: Cortex-A73 processor implementation options

Cortex-A53 Processor

The Cortex-A53 processor is a mid-range, low-power processor with one to four cores in a single cluster, each with an L1 cache subsystem, an optional integrated GICv3/4 interface, and an optional L2 cache controller. It is highly energy-efficient and supports both 32-bit and 64-bit code, and it delivers significantly higher performance than the successful Cortex-A7 processor. It can be deployed as a standalone application processor or paired with the Cortex-A57 processor in a big.LITTLE configuration for optimal performance, scalability, and energy efficiency.

Figure: Cortex-A53 processor
The Cortex-A53 processor has the following features:
  • In-order execution with an eight-stage pipeline.
  • Reduced power consumption through the use of hierarchical clock gating, power domains, and advanced sleep modes.
  • Enhanced dual-issue capability through duplicated execution resources and dual instruction decoders.
  • Power-optimized L2 cache design provides lower latency and balances performance with efficiency.

Cortex-A57 Processor

The Cortex-A57 processor is aimed at mobile and enterprise computing applications, including compute-intensive 64-bit applications such as high-end computers, tablets, and server products. It can be used with the Cortex-A53 processor in an ARM big.LITTLE configuration to achieve scalable performance and more efficient energy usage.
The Cortex-A57 processor features cache coherency interoperability with other processors (including ARM Mali™ series GPUs for GPU computing) and provides optional reliability and scalability features for high-performance enterprise applications. It offers higher performance than the ARMv7 Cortex-A15 processor with greater energy efficiency. It also includes cryptographic extensions that improve the performance of cryptographic algorithms by 10 times compared to the previous generation of processors.
Figure: Cortex-A57 processor core
The Cortex-A57 processor fully implements the ARMv8-A architecture. It supports multi-core operation with one to four cores in a single cluster. Multiple coherent SMP clusters are possible through AMBA 5 CHI or AMBA 4 ACE technology. Debug and trace are available through CoreSight technology.
The Cortex-A57 processor has the following features:
  • Out-of-order execution with more than 15 pipeline stages.
  • Energy-saving features including way prediction, tag reduction, and cache lookup suppression.
  • Increased peak instruction throughput through duplicated execution resources. Power-optimized instruction decoding with localized decoding and 3-wide decode bandwidth.
  • Performance-optimized L2 cache design allows multiple cores in the cluster to access L2 simultaneously.

Cortex-A73 Processor

The Cortex-A73 is a Cortex-A series processor that ARM released in 2016. It supports the full ARMv8-A architecture and includes a 128-bit AMBA 4 ACE interface and ARM's big.LITTLE system integration interface. It is built for 10nm process technology and can provide 30% higher sustained performance than the Cortex-A72, making it well suited to mobile devices and consumer applications.
Figure: Cortex-A73

Figure: Example Cortex-A73 processor configuration
