ARM CPU Cortex-X3, Cortex-A715, Cortex-A510 and GPU Immortalis-G715 Overview

In May 2021, Arm released the first generation of processors based on the 64-bit Armv9 instruction set: the ultra-large core Cortex-X2, the high-performance large core Cortex-A710, and the high-efficiency small core Cortex-A510. At the same time, Arm also launched four Mali GPU IPs: Mali-G710, G610, G510, and G310.

A year later, on June 28, 2022, Arm introduced a new IP lineup that includes the second-generation Armv9 CPU cores Cortex-X3 and Cortex-A715, along with significant updates to Cortex-A510 and the DSU-110 (DynamIQ Shared Unit): the refreshed Cortex-A510 is more energy efficient, and DSU-110 now supports up to 12 cores.

Arm stated that the new Armv9 CPU demonstrates its commitment to unleashing computing performance, aiming to break through the limits of peak performance and provide excellent sustained performance and efficiency.

At the same time, the new Armv9 CPU and the updates to Arm Cortex-A510 and DSU-110 form the basis of Arm’s new Total Compute Solutions (TCS22).

Arm’s Total Compute strategy is rooted in developer accessibility, security, and computing performance, aiming to provide excellent performance for all consumer device markets. Through optimized system design and implementation, it helps partners continuously push boundaries.

Meanwhile, Arm also launched its brand new flagship GPU product, Arm Immortalis. This is Arm’s first GPU to offer hardware-based ray tracing on mobile devices, providing a more realistic and immersive gaming experience.

Cortex-X3: Performance Boost of Up to 34%

As Arm’s third-generation Cortex-X series CPU IP aimed at the ultra-high-performance market, Cortex-X3 is designed for flagship smartphones, tablets, and laptops. Compared to the previous generation Cortex-X2, Cortex-X3 delivers a double-digit IPC gain (approximately 11% at the same process node).

Specifically, if applied to flagship Android smartphones/tablets, Cortex-X3 can bring a 25% performance increase compared to the latest flagship devices; if applied to Windows on Arm laptops, Cortex-X3 can bring a 34% performance increase compared to the latest mainstream devices.

Arm did not disclose specific power figures for Cortex-X3. However, according to a chart provided by Arm for the SPECint_base2006 test, Cortex-X3 generally draws less power than Cortex-X2 at the same performance level. At peak performance, Cortex-X3’s power consumption is higher, but the performance gain exceeds the increase in power, which means that Cortex-X3 is more energy efficient than Cortex-X2.
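The reasoning that a performance gain larger than the power increase implies better energy efficiency can be checked with a short calculation. The sketch below uses hypothetical, normalized numbers (they are assumptions for illustration, not Arm's data), and the helper name `perf_per_watt` is likewise made up for this example.

```python
def perf_per_watt(performance: float, power_watts: float) -> float:
    """Energy efficiency expressed as performance per watt."""
    return performance / power_watts

# Hypothetical, normalized figures (NOT Arm's data): the older core is the baseline.
old_perf, old_power = 1.00, 1.00
# Assume the newer core gains 25% peak performance for 15% more peak power.
new_perf, new_power = 1.25, 1.15

old_eff = perf_per_watt(old_perf, old_power)
new_eff = perf_per_watt(new_perf, new_power)
efficiency_gain = new_eff / old_eff - 1.0

# Efficiency improves whenever the performance ratio exceeds the power ratio.
print(f"efficiency change: {efficiency_gain:+.1%}")
```

With these assumed inputs, efficiency rises even though peak power goes up, because performance grows faster than power.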

According to data disclosed by Arm to the media, the peak performance of the Cortex-X3 core (simulated at a clock frequency equivalent to 3.6GHz, with 1MB L2 and 16MB L3 cache) is 34% higher than that of this year’s Intel mid-range Core i7-1260P processor’s P-core (performance core). This data is based on the SPECRate2017_int_base single-thread benchmark test. However, there still seems to be a gap compared to Apple’s M-series high-performance cores.

The significant IPC improvement of the Cortex-X3 core comes mainly from extensive optimization of its front end. Branch prediction is more accurate and has lower latency, thanks in part to a new dedicated structure for indirect branches (branches whose target address is held in a register). The L1/L2 branch target buffer (BTB) capacity has grown by 50%, and the L0 BTB is 10 times larger than before; the larger BTBs let the predictor run further ahead and fetch more instructions in advance.

In addition, the micro-operation (decoded instruction) cache of Cortex-X3 is 50% smaller than that of Cortex-X2 (1.5K entries, down from 3K), but it is more efficient thanks to an improved fill algorithm that reduces thrashing. The smaller mop cache also allowed Arm to reduce the total pipeline depth from 10 cycles to 9 cycles, which lowers the penalty paid on branch mispredictions and pipeline flushes.
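To see why a shorter pipeline reduces the cost of mispredictions, a first-order textbook model can help: effective CPI is the base CPI plus the average cycles lost per instruction to mispredicted branches. The sketch below is not Arm's model; it simply treats the pipeline depth (10 versus 9 cycles) as the flush penalty, and all workload parameters are hypothetical.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, flush_penalty: int) -> float:
    """Base CPI plus the average stall cycles lost to mispredicted branches."""
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty

# Hypothetical workload parameters (illustrative only, not measured data).
BASE_CPI = 0.25          # cycles per instruction with perfect prediction
BRANCH_FRACTION = 0.20   # one in five instructions is a branch
MISPREDICT_RATE = 0.02   # 2% of branches are mispredicted

# Assumption: the flush penalty equals the pipeline depth quoted in the text.
cpi_10 = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, flush_penalty=10)
cpi_9 = effective_cpi(BASE_CPI, BRANCH_FRACTION, MISPREDICT_RATE, flush_penalty=9)

print(f"CPI with 10-cycle penalty: {cpi_10:.4f}")
print(f"CPI with  9-cycle penalty: {cpi_9:.4f}")
print(f"speedup from the shorter pipeline: {cpi_10 / cpi_9 - 1:.2%}")
```

In this simplified model the gain from one fewer cycle is small on its own; better prediction accuracy (a lower `MISPREDICT_RATE`) acts on the same term and compounds with it.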

However, to compete with Intel in the laptop market, Arm chip design companies need to integrate more Cortex-X3 cores and other efficiency cores. For example, Intel’s new 28-watt processor aimed at thin and light laptops has four performance cores and eight efficiency cores.

To that end, Arm has also upgraded the previously launched DynamIQ Shared Unit DSU-110, allowing chip designers to integrate up to 12 Cortex-X3 or other cores into a processor (up from a previous maximum of 8) with up to 16MB of L3 cache. It also supports the latest ISA features.

Arm’s Senior Director of Product Management, Saurabh Pradhan, stated, “These changes enhance our partners’ flexibility and provide the resources to fully unleash the potential of Arm CPUs, thereby improving the user experience. Our partners can now unlock a new generation of consumer devices, such as high-end laptops configured with 8 Cortex-X3 CPU cores and 4 Cortex-A715 CPU cores.”

Cortex-A715: Balancing High Performance and High Efficiency

Cortex-A715 is the successor to Arm’s high-performance large core Cortex-A710 launched last year, mainly aimed at mobile devices that need to balance high performance and efficiency. It is worth noting that Cortex-A715 only supports AArch64 64-bit instructions and is no longer compatible with 32-bit, while the previous Cortex-A710 retained support for 32-bit compatibility.

In terms of performance and efficiency, Arm stated that at the same frequency and manufacturing process, Cortex-A715 is 5% faster than Cortex-A710. This gain is much smaller than the 10% improvement that Cortex-A710 delivered over Cortex-A78, but the energy efficiency of Cortex-A715 is 20% better than that of Cortex-A710. In other words, power consumption drops significantly while performance still improves, which extends the device’s battery life.

Arm stated that this significant improvement in energy efficiency will make Cortex-A715 the workhorse CPU core in big.LITTLE CPU clusters.

New Version of Cortex-A510: Further Reduced Power Consumption

In addition to the new CPU cores, Arm has also updated Cortex-A510, the small core launched last year, which is designed primarily for high efficiency and low power consumption. Compared with the 2021 version, the refreshed Cortex-A510 maintains the same performance while reducing power consumption by a further 5%.

Arm stated that it has pushed the efficiency of the small core to a new level, and that lower power consumption allows end devices to achieve longer battery life.

Second Generation Armv9’s Security Evolution

With the second generation of Armv9 CPU, Arm introduces a new asymmetric memory tagging extension (MTE) and enhanced privileged access never (EPAN) to improve access control.

MTE can detect and help prevent memory safety vulnerabilities across the entire system, giving application developers a time-to-market advantage. Devices supporting MTE can quickly and effectively identify buffer overflows and heap corruption in code.

Asymmetric MTE offers more flexibility in trading off speed against precision when locating these vulnerabilities. This benefits software development and application stability while allowing MTE to be deployed more widely across the ecosystem.
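The idea behind memory tagging can be illustrated with a small software simulation. MTE associates a 4-bit tag with each 16-byte granule of memory and compares it with a tag carried by the pointer; in asymmetric mode, reads are checked synchronously (the fault is raised at the access) while writes are checked asynchronously (the mismatch is recorded and reported later). The sketch below is only a conceptual model: the class and method names are hypothetical, the pointer tag is passed as a separate argument rather than stored in the pointer's upper bits, and real hardware behaves in more detail than shown here.

```python
GRANULE = 16   # bytes of memory covered by one tag
TAG_MASK = 0xF # tags are 4 bits wide


class TagCheckFault(Exception):
    """Raised when a synchronously checked access uses a mismatched tag."""


class TaggedMemory:
    """Toy model of tagged memory with asymmetric checking (hypothetical API)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)
        self.pending_store_faults: list[int] = []  # reported later, not at the access

    def set_tag(self, addr: int, length: int, tag: int) -> None:
        """Tag every granule that overlaps [addr, addr + length)."""
        first = addr // GRANULE
        last = (addr + length - 1) // GRANULE
        for granule in range(first, last + 1):
            self.tags[granule] = tag & TAG_MASK

    def _matches(self, addr: int, ptr_tag: int) -> bool:
        return self.tags[addr // GRANULE] == (ptr_tag & TAG_MASK)

    def load(self, addr: int, ptr_tag: int) -> int:
        # Reads are checked synchronously: fault immediately on mismatch.
        if not self._matches(addr, ptr_tag):
            raise TagCheckFault(f"load at {addr:#x} with tag {ptr_tag:#x}")
        return self.data[addr]

    def store(self, addr: int, ptr_tag: int, value: int) -> None:
        # Writes are checked asynchronously: record the mismatch and continue.
        if not self._matches(addr, ptr_tag):
            self.pending_store_faults.append(addr)
        self.data[addr] = value


mem = TaggedMemory(64)
mem.set_tag(0, 16, 0x3)    # allocation A occupies bytes 0..15
mem.set_tag(16, 16, 0x7)   # allocation B occupies bytes 16..31

mem.store(15, 0x3, 0xAA)   # in bounds: tags match
mem.store(16, 0x3, 0xBB)   # overflow from A into B: recorded for later reporting

try:
    mem.load(16, 0x3)      # overflowing read: faults at the access
    load_faulted = False
except TagCheckFault:
    load_faulted = True

print("deferred store faults:", mem.pending_store_faults)
print("load faulted immediately:", load_faulted)
```

The asymmetric split reflects the trade-off described above: immediate faults on reads give precise diagnostics, while deferred reporting on writes keeps the cost of checking lower.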

New Flagship GPU Series: Immortalis-G715, Supports Ray Tracing

Arm updates its Mali GPU series every year, and Mali is currently the GPU with the largest shipments worldwide, at 8 billion units. This year, however, Arm unexpectedly launched a new flagship GPU series called “Immortalis,” aimed at delivering the highest graphics performance for flagship smartphones and an unparalleled gaming experience.

The first product in the “Immortalis” series, Immortalis-G715, brings numerous improvements and new features over the previous Mali-G710 GPU. These include variable rate shading for significant energy savings and higher gaming performance, an improved execution engine, and hardware-level support for ray tracing. Immortalis-G715 supports configurations of 10 or more cores, whereas the new Mali-G715 supports 7 to 9 cores and Mali-G615 supports a maximum of 6 cores.

Specifically, variable rate shading is a new graphics feature that delivers significant energy savings and performance improvements by optimizing rendering. Essentially, it takes a scene and concentrates rendering effort on the parts that need it, shading them at fine pixel granularity; typically, this is where the action in the game occurs. Areas that require less focus (such as background scenery) are shaded at a coarser pixel granularity. The game scene keeps its perceived visual quality while saving energy. Arm reports a 40% increase in frames per second (FPS) when variable rate shading is enabled on game content.
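The saving from coarser shading can be estimated by counting shader invocations. In the sketch below, a 1920x1080 frame is split into three horizontal bands: the middle band (where the action is assumed to be) is shaded once per pixel, and the top and bottom bands are shaded once per 2x2 pixel block. The frame layout, the band sizes, and the function name are all hypothetical, and fewer invocations do not translate one-to-one into the FPS gain quoted by Arm.

```python
def shading_invocations(width: int, height: int, rate_w: int, rate_h: int) -> int:
    """Number of shader invocations when one result covers rate_w x rate_h pixels."""
    blocks_x = (width + rate_w - 1) // rate_w    # ceiling division
    blocks_y = (height + rate_h - 1) // rate_h
    return blocks_x * blocks_y


# Hypothetical frame layout: (width, height, rate_w, rate_h) per band.
bands = [
    (1920, 270, 2, 2),   # top band, e.g. sky: coarse 2x2 shading
    (1920, 540, 1, 1),   # middle band, the action: full-rate 1x1 shading
    (1920, 270, 2, 2),   # bottom band, e.g. ground: coarse 2x2 shading
]

full_rate = shading_invocations(1920, 1080, 1, 1)
with_vrs = sum(shading_invocations(w, h, rw, rh) for w, h, rw, rh in bands)
saved = 1.0 - with_vrs / full_rate

print(f"full-rate invocations: {full_rate}")
print(f"invocations with VRS:  {with_vrs}")
print(f"shading work saved:    {saved:.1%}")
```

For this assumed layout the shading work falls from 2,073,600 to 1,296,000 invocations, a 37.5% reduction.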

Arm redesigned key elements of the execution engine to improve computing power and energy efficiency. Compared to the previous generation Mali GPU, the conversion block of Immortalis-G715 was redesigned to significantly reduce area. Arm also revisited the fused multiply-add (FMA) unit that had been retuned in Mali-G710 to further improve power efficiency. In addition, Arm doubled the FMA block to provide the computing power needed for higher-level visual effects. Finally, Arm added support for matrix multiplication instructions, which are crucial for mobile use cases such as computational photography and image enhancement, to help achieve a twofold architectural improvement in ML performance.

Taken together, these changes double FMA compute throughput while increasing area by only 27%. In other words, Arm has doubled computing power with only a modest increase in silicon area.
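A quick calculation shows what these two figures imply for compute density (throughput per unit of area). The sketch assumes that the 2x throughput and the 27% area increase refer to the same block, which the text does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the text, normalized to the previous design.
throughput_ratio = 2.00   # FMA throughput doubled
area_ratio = 1.27         # area increased by 27%

# Assumption: both ratios describe the same block, so they can be divided.
density_gain = throughput_ratio / area_ratio

print(f"compute per unit area: {density_gain:.2f}x the previous design")
```

Under that assumption, each unit of silicon area delivers roughly 1.57 times the FMA throughput of the previous design.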

In addition to the execution engine, Arm has also made PPA (performance, power, and area) improvements in other areas of the new GPU.

The command stream front end (a feature launched last year with Mali-G710) has become faster. This is achieved by adding hardware-based cross-stream synchronization, adding more native commands, and increasing the number of scoreboards. The peak triangle throughput has doubled. Arm optimized explicit LOD (level of detail) lookups in the texture mapper to double throughput and added a coordinate preprocessor unit to improve the efficiency of cube map lookups. Finally, Arm added Arm Fixed Rate Compression (AFRC) technology (first introduced in last year’s mainstream Arm Mali-G510 GPU) to the new GPU to save bandwidth.

In terms of specific performance, Immortalis-G715 has improved microarchitecture performance by 15% compared to the previous generation high-end and mid-range Mali GPUs. In addition, Immortalis-G715 will deliver a twofold improvement in machine learning performance and a 15% improvement in energy efficiency.

Arm stated that the energy efficiency improvement of the Immortalis-G715 GPU is built on the high energy efficiency foundation of the Arm Mali-G710 GPU. According to Arm’s published data, the Mali-G710 offers excellent GPU efficiency on flagship and high-end Android smartphones, outperforming competitors in terms of FPS/W across various content, including high-end AAA games, benchmark tests, and light workloads, both at peak and sustained workloads.

In recent years, with the continuous growth of the mobile gaming market, more complex and immersive AAA gaming experiences are now increasingly common on mobile devices. Leading AAA PC and console games have mobile versions, including Genshin Impact, PUBG, Fortnite, Call of Duty, and Honor of Kings, among others. Additionally, a new generation of users is increasingly choosing mobile devices as their preferred gaming platform, primarily due to the convenience and functionality of gaming on mobile devices.

Last year, the Mali-G710 GPU launched by Arm already supported ray tracing effects based on software implementation. MediaTek has also utilized this feature in its flagship Dimensity 9000, introducing ray tracing technology to mobile devices through a mobile ray tracing SDK. However, this was achieved through software, which not only incurs significant power consumption but also offers relatively limited improvements in ray tracing experience.

In contrast, the Immortalis-G715 GPU adds hardware support for ray tracing directly, significantly enhancing the gaming experience while keeping power consumption under control.

According to Arm’s published data, ray tracing on the Immortalis-G715 uses only 4% of the shader core area, achieving a performance improvement of over 300% through hardware acceleration.

Arm believes that ray tracing represents a paradigm shift in mobile gaming content. Therefore, it has decided to introduce hardware-based ray tracing support on the Immortalis-G715, as partners are already prepared, hardware is ready, and the developer ecosystem is already or soon will be ready.

Arm stated that when the Immortalis-G715 appears in flagship smartphones in early 2023, it will be the foundation for the ecosystem to begin exploring its ray tracing technology for gaming content. As technology continues to evolve in the coming years, this will help prepare for a comprehensive transition to ray tracing for games running on mobile devices.

Arm Total Compute Solutions (TCS22)

Building on the CPU and GPU IPs described above, Arm announced the 2022 Total Compute Solutions (TCS22), which combines these IPs to achieve closer collaborative computing among CPU cores and between CPUs and GPUs, providing different levels of performance, efficiency, and scalability to enhance the user experience across various end markets.

As part of TCS22, Cortex-X3, Cortex-A715, Cortex-A510 CPU cores, as well as Mali GPU and Immortalis GPU can be paired together to meet different terminal demands.

Arm plans to offer a range of “dedicated” chip design configurations to customers through its TCS22 program, combining various technologies, including its expanding CPU and GPU design combinations.

It is reported that the Arm IP combination of TCS22 can achieve a 28% performance improvement across a range of workloads while reducing energy consumption by 16%.
