Exploring the Components of Arm’s Most Powerful Super Large Core Processor: What’s Inside the Arm Core?

Table of Contents

  • L1 Instruction Memory System

  • Instruction Decode

  • Register Rename

  • Instruction Issue

  • Integer Execute

  • Vector Execute

  • Advanced SIMD and Floating-Point Support

  • Cryptographic Extension

  • Scalable Vector Extension

  • L1 Data Memory System

  • Memory Management Unit

  • L2 Memory System

  • Embedded Trace Extension and Trace Buffer Extension

  • Statistical Profiling Extension

  • Performance Monitoring Unit

  • Activity Monitoring Unit

  • GIC CPU Interface

  • CPU Bridge

The components in the Cortex-X925 core are designed to make it a high-performance core.

Main Modules Include:

  • L1 Instruction and L1 Data Memory Systems

  • L2 Memory System

  • Register Rename

  • Instruction Decode

  • Instruction Issue

  • Execution Pipeline

  • Memory Management Unit (MMU)

  • Trace Unit and Trace Buffer

  • Performance Monitoring Unit (PMU)

  • Activity Monitoring Unit (AMU)

  • Generic Interrupt Controller (GIC) CPU Interface

The Cortex-X925 core connects to the DynamIQ Shared Unit-120 via a CPU bridge.

The Cortex-X925 core implements the Armv9.2-A architecture, which extends the architecture defined in Armv8-A up to and including Armv8.7-A.

L1 Instruction Memory System

The L1 instruction memory system fetches instructions from the instruction cache and delivers the instruction stream to the instruction decode unit.

The L1 instruction memory system includes:

  • A 64KB, 4-way set associative L1 instruction cache with 64-byte cache lines.

  • A fully associative L1 instruction Translation Lookaside Buffer (TLB) that natively supports 4KB, 16KB, 64KB, and 2MB page sizes.

  • A dynamic branch predictor.
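The cache parameters listed above fix how many sets the L1 instruction cache has. The following is a minimal sketch of that arithmetic; the function name `cache_geometry` is hypothetical, and the sketch says nothing about which address bits the real hardware uses for indexing.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and field widths for a set-associative cache.

    Assumes all three parameters are powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # log2(line size)
        "index_bits": sets.bit_length() - 1,         # log2(number of sets)
    }


# 64KB, 4-way set associative, 64-byte lines (values from the list above)
l1i = cache_geometry(64 * 1024, ways=4, line_bytes=64)
print(l1i)
```

With these inputs the cache has 256 sets, so a line is located by 6 offset bits and 8 index bits.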

Instruction Decode

The instruction decode unit decodes AArch64 instructions into an internal format.

Register Rename

The register rename unit performs register renaming to facilitate out-of-order execution and dispatches decoded instructions to various issue queues.
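To illustrate why renaming helps out-of-order execution, here is a toy model, not the Cortex-X925 implementation. It assumes an unlimited pool of physical registers and a hypothetical `(dest, src1, src2)` instruction format. Each result gets a fresh physical register, so a later write to the same architectural register no longer conflicts with earlier readers or writers.

```python
from itertools import count


def rename(instructions, num_arch_regs=4):
    """Map (dest, src1, src2) architectural registers to physical registers."""
    fresh = count(num_arch_regs)                     # next unused physical register
    mapping = {r: r for r in range(num_arch_regs)}   # architectural -> physical
    renamed = []
    for dest, src1, src2 in instructions:
        p1, p2 = mapping[src1], mapping[src2]        # sources use the current mapping
        mapping[dest] = next(fresh)                  # destination gets a fresh register
        renamed.append((mapping[dest], p1, p2))
    return renamed


program = [
    (1, 2, 3),  # r1 = r2 op r3
    (0, 1, 2),  # r0 = r1 op r2  (true dependency on the first result)
    (1, 3, 3),  # r1 = r3 op r3  (reuses r1: a false dependency before renaming)
]
renamed = rename(program)
print(renamed)
```

After renaming, the third instruction writes a different physical register from the first, so it no longer has to wait for the second instruction to read the old value of r1.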

Instruction Issue

The instruction issue unit controls when decoded instructions are dispatched to the execution pipeline. It includes an issue queue that stores instructions waiting to be issued to the execution pipeline.
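The role of an issue queue can be sketched with a small simulation. This is an illustrative model only: the issue width of 2, the single-cycle latency, and the `(name, dest, sources)` tuple format are all assumptions, not Cortex-X925 parameters.

```python
def issue_order(queue, ready, width=2):
    """Return the names of the instructions issued in each cycle.

    queue: list of (name, dest, sources) in program order.
    ready: registers whose values are already available.
    """
    ready = set(ready)
    waiting = list(queue)
    cycles = []
    while waiting:
        # Select the oldest instructions whose source operands are all ready.
        picked = [i for i in waiting if all(s in ready for s in i[2])][:width]
        if not picked:
            raise ValueError("no instruction has all of its operands ready")
        for instr in picked:
            waiting.remove(instr)
        # Assumed single-cycle latency: results are usable from the next cycle.
        ready.update(instr[1] for instr in picked)
        cycles.append([instr[0] for instr in picked])
    return cycles


queue = [
    ("A", "p4", ["p2", "p3"]),
    ("B", "p5", ["p4", "p2"]),  # must wait for A
    ("C", "p6", ["p3", "p3"]),  # independent of A and B
]
schedule = issue_order(queue, ready={"p0", "p1", "p2", "p3"})
print(schedule)
```

Instruction C issues in the first cycle alongside A, ahead of the older instruction B, which stays in the queue until its operand p4 is available.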

Integer Execute

The integer execution pipeline is part of the overall execution pipeline and contains integer execution units that perform arithmetic and logical data processing operations.

Vector Execute

The vector execution unit is part of the execution pipeline. It executes Advanced SIMD and floating-point (FP) operations and Scalable Vector Extension (SVE) and SVE2 instructions, and it can optionally execute cryptographic instructions.

Advanced SIMD and Floating-Point Support

Advanced SIMD is a media and signal processing architecture that primarily adds instructions for audio, video, 3D graphics, image, and speech processing. The floating-point architecture provides support for single-precision and double-precision floating-point operations.

Cryptographic Extension

The cryptographic extension in the Cortex-X925 core is optional. The cryptographic extension adds new instructions to the advanced SIMD and SVE instruction sets to accelerate the following operations:

  • Advanced Encryption Standard (AES) encryption and decryption.

  • Secure Hash Algorithm (SHA) functions SHA1, SHA2, and SHA3.

    • Even when cryptographic support is not configured, the SVE2 versions of the SHA3 instructions EOR3, XAR, and BCAX are supported.

  • Armv8.2-SM SM3 hash function and SM4 encryption and decryption instructions.

  • Finite field operations used in algorithms such as Galois/Counter Mode and Elliptic Curve Cryptography.

Note

The optional cryptographic extension is not included in the base product. Arm provides the cryptographic extension under additional licensing available with the Cortex-X925 core.
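For readers unfamiliar with the three-operand bitwise instructions used for SHA3 (EOR3, BCAX, and XAR), the sketch below models what each one computes on a single 64-bit lane. It is plain Python for illustration, not code that uses the hardware instructions, and the function names simply mirror the mnemonics.

```python
MASK64 = (1 << 64) - 1  # model one 64-bit vector lane


def eor3(n: int, m: int, a: int) -> int:
    """Three-way exclusive OR: n ^ m ^ a."""
    return (n ^ m ^ a) & MASK64


def bcax(n: int, m: int, a: int) -> int:
    """Bit clear and exclusive OR: n ^ (m & ~a)."""
    return (n ^ (m & ~a)) & MASK64


def xar(n: int, m: int, rot: int) -> int:
    """Exclusive OR, then rotate right by rot bits."""
    x = (n ^ m) & MASK64
    rot %= 64
    return ((x >> rot) | (x << (64 - rot))) & MASK64


print(hex(eor3(0b1100, 0b1010, 0b0110)))
print(hex(bcax(0b1100, 0b1010, 0b0110)))
print(hex(xar(0x1, 0x0, 1)))
```

Each function folds two dependent bitwise operations into one step, which is the kind of combination the Keccak round function behind SHA3 performs repeatedly.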

Scalable Vector Extension

The Scalable Vector Extension (SVE) and Scalable Vector Extension 2 (SVE2) are extensions of the Armv8-A architecture.

They complement but do not replace AArch64 advanced SIMD and floating-point capabilities. The advanced SIMD architecture, its related implementations, and supporting software are also referred to as NEON technology.

L1 Data Memory System

The L1 data memory system executes load and store instructions and encompasses the L1 data-side memory system. It also services memory coherency requests.

The L1 data memory system includes:

  • A 64KB, 4-way set associative L1 data cache with 64-byte cache lines.

  • A fully associative L1 data TLB that natively supports 4KB, 16KB, and 64KB page sizes, as well as 2MB and 512MB block sizes.
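The page and block sizes above determine how many low-order address bits pass through translation unchanged. The helper below is a hypothetical sketch (`split_va` is not an Arm API) showing how a virtual address divides into a page or block number and an offset for a few of those sizes.

```python
KB, MB = 1024, 1024 * 1024


def split_va(va: int, size_bytes: int) -> tuple:
    """Split a virtual address into (page or block number, offset).

    Assumes size_bytes is a power of two.
    """
    offset_bits = size_bytes.bit_length() - 1
    return va >> offset_bits, va & (size_bytes - 1)


va = 0x12345678
for label, size in (("4KB", 4 * KB), ("64KB", 64 * KB), ("2MB", 2 * MB)):
    number, offset = split_va(va, size)
    print(f"{label}: number={number:#x} offset={offset:#x}")
```

A larger page or block leaves more bits in the offset, so a single TLB entry covers a larger range of addresses.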

Memory Management Unit

The Memory Management Unit (MMU) provides fine-grained control of the memory system through a set of virtual-to-physical address mappings and memory attributes that are held in translation tables.

When an address is translated, the result is saved in the TLB. Each TLB entry contains a global indicator and an address space identifier (ASID), so the TLB does not need to be invalidated on a context switch. Each entry also contains a virtual machine identifier (VMID), so the TLB does not need to be invalidated when the hypervisor switches between virtual machines.
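The effect of tagging TLB entries can be shown with a toy model. The class below is an illustration under simplifying assumptions (a dictionary with no capacity limit, one page size, hypothetical method names), not a description of the Cortex-X925 TLB organization.

```python
class TaggedTLB:
    """Toy TLB keyed by (VMID, ASID, virtual page); global entries ignore the ASID."""

    def __init__(self):
        self.entries = {}

    def insert(self, vmid, asid, vpage, ppage, is_global=False):
        key = (vmid, None if is_global else asid, vpage)
        self.entries[key] = ppage

    def lookup(self, vmid, asid, vpage):
        # Try the ASID-specific entry first, then a global entry for this VMID.
        for key in ((vmid, asid, vpage), (vmid, None, vpage)):
            if key in self.entries:
                return self.entries[key]
        return None  # miss: a real MMU would start a translation table walk


tlb = TaggedTLB()
tlb.insert(vmid=1, asid=10, vpage=0x400, ppage=0x9000)   # process A
tlb.insert(vmid=1, asid=11, vpage=0x400, ppage=0x7000)   # process B, same virtual page
tlb.insert(vmid=1, asid=0, vpage=0xFFFF0, ppage=0x100, is_global=True)

print(tlb.lookup(1, 10, 0x400), tlb.lookup(1, 11, 0x400))
```

Both processes map the same virtual page to different physical pages, and both entries stay in the TLB across a context switch because the ASID tells them apart. The global entry hits for any ASID within the same VMID, and a lookup from a different VMID misses.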

L2 Memory System

The L2 memory system includes the L2 cache. The L2 cache is a private cache of the core and can be configured as an 8-way set associative cache of 2MB or a 12-way set associative cache of 3MB. The L2 memory system connects to the DSU-120 via an asynchronous CPU bridge.
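One way to read the two L2 options is to compute the number of sets each implies. The sketch below assumes a 64-byte L2 cache line, which is an assumption carried over from the L1 caches and is not stated for the L2 cache here.

```python
MB = 1024 * 1024


def num_sets(size_bytes: int, ways: int, line_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache (64-byte line is an assumption)."""
    return size_bytes // (ways * line_bytes)


sets_2mb = num_sets(2 * MB, ways=8)
sets_3mb = num_sets(3 * MB, ways=12)
print(sets_2mb, sets_3mb)
```

Under that assumption both configurations have 4096 sets, so the larger option adds capacity by adding ways rather than sets.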

Embedded Trace Extension and Trace Buffer Extension

The Cortex-X925 core supports a range of debugging, testing, and tracing options, including trace units and trace buffers.

The Cortex-X925 core also includes a ROM table that contains a list of components in the system. Debuggers can use this ROM table to determine which CoreSight components are implemented.

All debug and trace components of the Cortex-X925 core are described in the Cortex-X925 Technical Reference Manual. For more information on the Embedded Logic Analyzer (ELA), refer to the Arm CoreSight ELA-600 Embedded Logic Analyzer Technical Reference Manual.

Statistical Profiling Extension

The Cortex-X925 core implements the Statistical Profiling Extension (SPE) of the Armv8.7-A architecture. SPE provides a statistical view of the performance characteristics of executed instructions, which software developers can use to optimize their code.
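The idea behind statistical profiling can be sketched in software: instead of recording every operation, record one in every N and aggregate the samples. The model below is a simplified illustration with a made-up trace of `(pc, latency)` pairs and a fixed sampling interval. It does not reproduce the SPE record format or its programming interface.

```python
from collections import defaultdict


def sample_profile(trace, interval):
    """Sample every interval-th operation and total the sampled latency per PC."""
    totals = defaultdict(lambda: [0, 0])  # pc -> [sample count, summed latency]
    for n, (pc, latency) in enumerate(trace, start=1):
        if n % interval == 0:
            totals[pc][0] += 1
            totals[pc][1] += latency
    return dict(totals)


# Hypothetical loop of four operations, one of which is slow (latency 40).
trace = [(0x1000, 1), (0x1004, 40), (0x1008, 1), (0x100C, 2)] * 250
profile = sample_profile(trace, interval=3)
hottest = max(profile, key=lambda pc: profile[pc][1])
print(hex(hottest), sum(count for count, _ in profile.values()))
```

Only about a third of the operations are recorded, yet the slow operation at 0x1004 still dominates the sampled latency. An interval that is a multiple of the loop length would hit the same operation every time, which is one reason a profiler may vary the interval between samples.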

Performance Monitoring Unit

The Performance Monitoring Unit (PMU) provides 6 or 31 performance monitors based on configuration. Performance monitors can be configured to collect operational statistics for each core and the memory system. This information can be used for debugging and code analysis.

Activity Monitoring Unit

The Cortex-X925 core implements the Activity Monitors Extension of the Armv8.4-A architecture. The Activity Monitoring Unit (AMU) provides information that is useful for system power management and for continuous monitoring.

GIC CPU Interface

The Generic Interrupt Controller (GIC) CPU interface integrates with external distributor components to support and manage interrupts in cluster systems.

CPU Bridge

In a cluster, there is a CPU bridge between each Cortex-X925 core and the DSU-120.

The CPU bridge controls buffering and synchronization between the core and the DSU-120.

The CPU bridge is asynchronous, so each core can be implemented at a different frequency, power, and area point. You can also configure the CPU bridge to run synchronously; this does not affect the other interfaces, such as debug and trace, which are always asynchronous.
