In March of this year, ARM announced ARMv9, its first new chip architecture in a decade, claiming better security, stronger AI performance, and faster overall performance, and saying it will power 300 billion chips in the future. ARM has now officially unveiled three CPU and three GPU core designs based on ARMv9. The three new CPUs are the flagship Cortex-X2, the high-performance Cortex-A710, and the high-efficiency Cortex-A510; the three new GPUs are Mali-G710, Mali-G510, and Mali-G310, covering different performance tiers.

Because ARM is the most influential IP supplier in the mobile sector, its new core designs can essentially be taken as a preview of CPU performance in 2022 Android phones, and of GPU performance for some chip makers. In other words, how Cortex-X2, Cortex-A710, and Cortex-A510 actually perform will directly determine the performance of future SoCs from Qualcomm, MediaTek, Samsung, and even Huawei.

The revolutionary step for ARM processors came as early as 2011, when ARMv8 introduced the 64-bit AArch64 architecture. ARMv9, by comparison, mainly adds the SVE2 SIMD vector instruction extension and a trusted computing architecture, which means Cortex-X2, Cortex-A710, and Cortex-A510 may not be a generational leap. According to ARM, Cortex-X2 offers a 16% improvement in integer performance over the previous-generation Cortex-X1, while machine learning performance can double; Cortex-A710 shows a 10% performance improvement and a 30% efficiency improvement over Cortex-A78; and Cortex-A510, the first small-core update in four years, offers a 35% performance improvement and a 20% efficiency improvement over the aging Cortex-A55.

Of course, ARM is known as a “master of wordplay” and has a history of misleading statements in its presentations. For example, it once showcased a 3.0GHz Cortex-A76, a configuration that downstream manufacturers never shipped during the core’s lifecycle, and it has made comparisons without controlling the variables. This time, although ARM claims a 16% improvement in integer performance for Cortex-X2, the presentation itself shows that the Cortex-X1 used as the baseline, which was released last year and can be configured with up to 8MB of L3 cache, has only 4MB here. In addition, compared with Cortex-X1, Cortex-X2 shortens the pipeline from 11 cycles to 10 and the dispatch stage from 2 cycles to 1, enlarges the ROB (Reorder Buffer) by up to 30%, and enhances the L2 TLB (Translation Lookaside Buffer) and data prefetching.

Thus, it is not difficult to see that Cortex-X2’s performance gains follow the familiar path: more cache, better memory access performance, and reduced pipeline latency. More importantly, the power curve released by ARM shows that Cortex-X2 behaves much like Cortex-X1, in that reaching peak performance requires correspondingly higher power consumption. Unless SoCs built on Cortex-X2 move to an improved manufacturing process, they may once again become a headache for phone makers.

Having covered Cortex-X2, which will set the performance ceiling for future Android flagships, let’s look at Cortex-A710. Since ARM’s CPU designs shifted to a big.LITTLE tri-cluster architecture, the middle core has pursued energy efficiency more aggressively, aiming to keep power consumption stable while improving performance.

However, it is important to note that the “10% performance improvement and 30% efficiency improvement” presented by ARM do not occur at the same time. With 4MB L2 and 8MB L3 caches, Cortex-A710 delivers either 10% more performance than Cortex-A78 at the same power consumption, or 30% lower power consumption at the same performance. Moreover, in this presentation the Cortex-X2 + Cortex-A710 combination shares the 8MB L3 cache, leading some industry insiders to believe that the biggest change in Cortex-A710 compared with its predecessor may be the name.
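
To make the either/or nature of these figures concrete, here is a minimal arithmetic sketch in Python. It assumes a Cortex-A78 baseline normalized to 1.0 for both performance and power (an arbitrary unit chosen for illustration, not ARM data) and treats ARM's two claims as two separate operating points.

```python
# Illustrative arithmetic only. The 10% and 30% figures are ARM's claims as
# quoted in the article; the baseline of 1.0 is an assumed, normalized unit.

def perf_per_watt(performance: float, power: float) -> float:
    """Return performance per unit of power (higher is better)."""
    return performance / power

# Baseline: Cortex-A78 normalized to 1.0 performance at 1.0 power.
a78 = perf_per_watt(1.0, 1.0)

# Operating point A (same power as A78): 10% more performance.
iso_power = perf_per_watt(1.10, 1.0)

# Operating point B (same performance as A78): 30% less power.
iso_perf = perf_per_watt(1.0, 0.70)

print(f"same power:       {iso_power / a78 - 1:.0%} better perf/W")  # 10%
print(f"same performance: {iso_perf / a78 - 1:.0%} better perf/W")   # 43%
```

Under these assumptions a chip gets either the extra speed or the lower power draw, depending on where the vendor sets the clock; the two headline numbers cannot be added together.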

If the Cortex-X2 and Cortex-A710 from ARM’s Austin team achieve only “micro-innovation” through incremental updates, the Cortex-A510 from ARM’s Cambridge team looks like a more earnest effort. Cortex-A510 reportedly uses a new merged-core microarchitecture that combines two Cortex-A510 cores into a “core complex,” in which the two cores share the L2 cache and the FP/NEON pipelines while each keeps its own integer section. Some may find this design familiar, and indeed there are opinions that this scheme of two cores sharing a floating-point unit drew inspiration from AMD’s Bulldozer architecture.

For Cortex-A510, the benefit of this design is that it suits both everyday software, which leans on integer performance, and games, which lean on floating-point performance. Small cores are now a key focus for major chip makers, because they handle most of a user’s daily tasks. Until now, the small cores in Android SoCs have generally been mediocre, so any slightly heavier task wakes a big core. This is considered one reason why Qualcomm Snapdragon, MediaTek Dimensity, and Huawei Kirin chips manage power consumption poorly. Apple’s A-series chips, by contrast, have strong small cores: the small core of the A14 can already rival Cortex-A76, which is also a significant reason why iPhones can get by with smaller batteries.

Regrettably, even with the more innovative Cortex-A510, ARM keeps to a “Tian Ji’s horse racing” promotional strategy, pitting its strongest configuration against the rival’s weakest. It claims that Cortex-A510 performs close to Cortex-A73 and is 35% faster than Cortex-A55, but it does not spell out that the comparison pits a Cortex-A510 with 256KB L2 and 8MB L3 against a Cortex-A55 with 128KB L2 and 4MB L3. More damaging still, the power curve shows that at low and middle frequencies Cortex-A510 actually performs worse than Cortex-A55, and its peak power consumption is significantly higher.

In summary, judging by the paper data alone, ARM is once again “squeezing the toothpaste,” releasing improvements a little at a time; apart from Cortex-A510, the gains in Cortex-X2 and Cortex-A710 rely mainly on larger caches. Yet Qualcomm, the largest SoC supplier in the Android camp, is known for cutting cache sizes when it “modifies” ARM’s IP cores, so the real-world performance of new SoCs built on the ARMv9 instruction set is hard to predict for now. The future is also uncertain for Chinese app developers. In its official materials, ARM specifically noted that Cortex-X2 and Cortex-A510 support only AArch64, while Cortex-A710 alone retains AArch32 compatibility to meet the needs of Chinese customers. In effect, ARM is using a hardware restriction to push Chinese developers into the 64-bit era quickly.
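
What this split means for where an app can run is sketched below in Python. The 1+3+4 core layout and the names `Core`, `CLUSTER`, and `eligible_cores` are hypothetical, chosen only for illustration; the AArch32 flags follow ARM's statement as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    supports_aarch32: bool


# Hypothetical 1+3+4 tri-cluster layout (an assumption, not a real SoC).
CLUSTER = (
    [Core("Cortex-X2", supports_aarch32=False)]
    + [Core("Cortex-A710", supports_aarch32=True)] * 3
    + [Core("Cortex-A510", supports_aarch32=False)] * 4
)


def eligible_cores(cores, app_is_32bit):
    """Return the cores that could execute the app at all."""
    if not app_is_32bit:
        # AArch64 code can run on every core in this layout.
        return list(cores)
    # 32-bit code needs a core that still implements AArch32.
    return [core for core in cores if core.supports_aarch32]


print(len(eligible_cores(CLUSTER, app_is_32bit=False)))       # 8
print({c.name for c in eligible_cores(CLUSTER, app_is_32bit=True)})
```

In this sketch a 32-bit app is confined to the three middle cores, with neither the flagship core nor the efficiency cores available to it.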

If you use the appchecker application to check the apps on your phone today, you will find that many mainstream apps are still 32-bit. If nothing changes, phone makers and chip makers may face serious headaches with SoC scheduling next year, because neither the big core that provides peak performance nor the small cores that handle daily tasks will be able to run those apps. Users of 32-bit apps would get none of the efficiency and performance gains of Cortex-A510 and none of the performance of Cortex-X2, and running on Cortex-A710 alone would undoubtedly mean poor battery life.
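
For readers who have an APK file on hand, a rough check can also be done off the phone. An APK is a ZIP archive, and native libraries sit under `lib/<ABI>/`, where `armeabi` and `armeabi-v7a` are the 32-bit ARM ABIs and `arm64-v8a` is the 64-bit one. The Python sketch below relies only on that layout; it is not a description of how appchecker works, and the function names and the file path in the usage comment are illustrative.

```python
import zipfile

ABIS_32 = {"armeabi", "armeabi-v7a"}  # 32-bit ARM ABI directory names
ABIS_64 = {"arm64-v8a"}               # 64-bit ARM ABI directory name


def native_abis(apk):
    """Return the ABI directory names found under lib/ in an APK.

    `apk` may be a file path or a binary file-like object.
    """
    abis = set()
    with zipfile.ZipFile(apk) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Native libraries are stored as lib/<abi>/<file>.so
            if len(parts) >= 3 and parts[0] == "lib":
                abis.add(parts[1])
    return abis


def classify(abis):
    """Give a coarse label based on which ARM ABIs ship native code."""
    if abis & ABIS_64:
        return "64-bit ARM libraries present"
    if abis & ABIS_32:
        return "32-bit ARM only"
    return "no ARM native libraries found"


# Usage (placeholder path, not a real file):
# print(classify(native_abis("example.apk")))
```

An app labeled “32-bit ARM only” by this check is the kind that would be unable to run on AArch64-only cores.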

In China’s current Android ecosystem, phone makers, grouped in the “Hardcore Alliance,” carry significant weight and control substantial distribution channels, so it is natural to expect app developers to speed up their 64-bit upgrades under pressure from them. Therefore, even if ARM’s current generation of IP cores lacks sincerity, the industry will still hope for rapid adoption of ARMv9: moving the Chinese Android ecosystem to 64-bit means that existing ARMv8-based SoCs will also be able to deliver more of their performance, ultimately benefiting all Android users.