The End of Self-Research? How Powerful is the ARM Cortex-X1 Architecture?

With the release of the Kirin 9000 and Samsung Exynos 1080, Android smartphone chips have officially entered the 5nm era. Unfortunately, the Kirin 9000’s CPU cores are still based on the Cortex-A77 that ARM released last year, while the Exynos 1080, although it uses ARM’s latest Cortex-A78, leaves out the more powerful Cortex-X1 announced alongside it because of the chip’s market positioning.
If everything goes well, the next generation of flagship SoCs, such as the Snapdragon 875, Exynos 2100, and Dimensity 2000, will combine a Cortex-X1 super core with Cortex-A78 big cores and Cortex-A55 small cores. In other words, in 2021 only SoCs equipped with a Cortex-X1 core will qualify as “super flagships”.
Much of the Cortex-X1’s appeal is summed up by its nickname: the end of self-research.
The Charm of Self-Research
In benchmark software such as GeekBench, Apple’s iPhones consistently beat flagship Android phones of the same period by a wide margin. The reason is that Apple has been designing its own CPU cores for the A-series processors since the iPhone 5 era. Backed by strong R&D and deep pockets, these cores leave ARM’s public Cortex-A cores unable to compete. ARM’s new Cortex-X1 core IP gives the Android camp a glimmer of hope of challenging the A-series processors of the same period.
Self-research means purchasing the highest tier of ARM license, the instruction set (architecture) license, and then designing one’s own ARM-compatible core to suit one’s needs. How far such a core can lead ARM’s public Cortex-A designs depends on the technical strength of the chip maker.
Self-Research Hits Obstacles, Modification Prevails
Besides Apple, Qualcomm used its self-researched Krait architecture in the Snapdragon 600/800 era, and the later Snapdragon 820 also used a self-researched core, Kryo. However, Qualcomm found it hard for a self-researched architecture to beat the energy efficiency of the public Cortex-A7x cores, which made the effort uneconomical. Starting with the Snapdragon 835, it therefore switched to the BoC (Built on ARM Cortex Technology) strategy, commonly known as “modification”: customizing and optimizing an existing public Cortex-A design.
Huawei has taken a similar approach since the Kirin 980, whose big cores are described as “based” on a Cortex-A design, which is also a form of modification. Note that the public Cortex-A designs leave little room for “modification”, and vendors mostly adjust the cache configuration. As a result, whether the chip is a Snapdragon or a Kirin, the performance gap between the modified cores and the public design is small, and the deciding factor remains clock frequency.
Samsung also joined the self-research camp starting with the Exynos 8890, launching a core architecture named Mongoose.
However, after four generations of in-house development, Samsung decided at the end of 2019 to abandon the self-researched Mongoose core and disband the entire R&D team in Austin, Texas. Going forward, it will rely entirely on ARM’s public designs. Evidently, for every chip maker other than Apple, the self-research path is thorny and thankless.
The good news is that the Cortex-X1, one of the “three swordsmen” ARM released together, is in fact a highly customizable IP core that lets chip makers avoid the arduous path of self-research.
Super Core Arrives, Solving Problems
If the Cortex-A78 is positioned as the “big core”, the Cortex-X1 is the “super big core”. Its architectural design is almost identical to that of the Cortex-A78, but it has been widened in almost every respect.
For example, the decode width grows from 4 instructions per cycle on the Cortex-A78 to 5 on the Cortex-X1, an increase of 25%. The NEON floating-point units grow from 2 x 128b to 4 x 128b, which doubles theoretical floating-point throughput. As for cache, the Cortex-X1 has a 64KB L1 cache and a 1MB L2 cache, and its L3 cache can reach 8MB, twice that of the Cortex-A78.
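The ratios quoted above are easy to check with a few lines of arithmetic. The sketch below is purely illustrative: the `SPECS` table and the helper names are made up for this example, the figures are the ones given in this article, and the Cortex-A78’s 4MB L3 limit is inferred from the “twice” comparison rather than stated directly.

```python
# Figures as quoted in this article. The 4 MB L3 limit for the A78 is an
# inference from "8 MB is twice that of Cortex-A78", not a quoted number.
SPECS = {
    "Cortex-A78": {"decode_width": 4, "neon_pipes": 2, "neon_bits": 128, "max_l3_mb": 4},
    "Cortex-X1": {"decode_width": 5, "neon_pipes": 4, "neon_bits": 128, "max_l3_mb": 8},
}


def neon_bits_per_cycle(core):
    """Total NEON datapath width: number of pipes times bits per pipe."""
    spec = SPECS[core]
    return spec["neon_pipes"] * spec["neon_bits"]


def x1_over_a78(metric):
    """Ratio of a metric on the Cortex-X1 to the same metric on the Cortex-A78."""
    return metric("Cortex-X1") / metric("Cortex-A78")


decode_ratio = x1_over_a78(lambda core: SPECS[core]["decode_width"])
neon_ratio = x1_over_a78(neon_bits_per_cycle)
l3_ratio = x1_over_a78(lambda core: SPECS[core]["max_l3_mb"])

print(f"Decode width: +{(decode_ratio - 1) * 100:.0f}%")
print(f"NEON width:   x{neon_ratio:.1f}")
print(f"Max L3:       x{l3_ratio:.1f}")
```

These are ratios of theoretical widths and capacities only. Real-world gains depend on how well a workload can keep the wider pipeline fed.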
According to ARM’s official data, at the same frequency (that is, in terms of IPC), the peak performance of the Cortex-X1 is 30% higher than that of the Cortex-A77 and 22% higher than that of the latest Cortex-A78, while its machine learning performance is twice that of the Cortex-A77 and A78.
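The two IPC figures also imply how far the Cortex-A78 itself is ahead of the Cortex-A77. The sketch below assumes that both percentages were measured on the same workload, so the ratios can simply be divided. The article does not say this, so treat the result as a rough estimate. The names are illustrative.

```python
# Same-frequency performance ratios as quoted in the article.
X1_OVER_A77 = 1.30  # Cortex-X1 is 30% ahead of Cortex-A77
X1_OVER_A78 = 1.22  # Cortex-X1 is 22% ahead of Cortex-A78


def implied_ratio(a_over_c, a_over_b):
    """Given the ratios A/C and A/B, return B/C.

    Only valid if both ratios come from the same workload (an assumption here).
    """
    return a_over_c / a_over_b


a78_over_a77 = implied_ratio(X1_OVER_A77, X1_OVER_A78)
print(f"Implied Cortex-A78 gain over Cortex-A77: {(a78_over_a77 - 1) * 100:.1f}%")
```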
In the future, chip makers can combine 1 Cortex-X1 “super core”, 3 Cortex-A78 “big cores”, and 4 Cortex-A55 “small cores” into a three-tier DynamIQ cluster to strike a balance between performance and power consumption.
The one drawback is that the Cortex-X1 core takes up more die area.
In theory, 4 Cortex-A78 cores with a 4MB L3 cache deliver 20% more performance than the previous-generation Cortex-A77 while reducing core area by 15%. By contrast, 1 Cortex-X1 plus 3 Cortex-A78 cores with an 8MB L3 cache increases core area by 15% but raises peak performance by 30%.
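To put the trade-off in one place, the sketch below compares performance per unit of area for the two configurations. It assumes that both sets of percentages are measured against the same Cortex-A77 baseline, which the article does not state explicitly. The `Config` class and the variable names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    perf: float  # performance relative to the assumed A77 baseline (1.0)
    area: float  # core area relative to the assumed A77 baseline (1.0)

    @property
    def perf_per_area(self):
        return self.perf / self.area


# Assumption: both configurations share the same Cortex-A77 baseline.
A78_QUAD = Config("4x Cortex-A78, 4MB L3", perf=1.20, area=0.85)
X1_MIX = Config("1x Cortex-X1 + 3x Cortex-A78, 8MB L3", perf=1.30, area=1.15)

for cfg in (A78_QUAD, X1_MIX):
    print(f"{cfg.name}: perf/area = {cfg.perf_per_area:.2f}")

# Moving from the all-A78 design to the X1 design.
perf_step = X1_MIX.perf / A78_QUAD.perf
area_step = X1_MIX.area / A78_QUAD.area
print(f"Extra performance: +{(perf_step - 1) * 100:.0f}%")
print(f"Extra area:        +{(area_step - 1) * 100:.0f}%")
```

Under that assumption, the Cortex-X1 configuration buys its extra peak performance with a much larger increase in area, which is the trade-off described above.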
This explains why ARM made the L1 instruction cache of the Cortex-A78 selectable between 32KB and 64KB: choosing the 32KB option frees up die area for the Cortex-X1, whose performance can compensate for the cache the Cortex-A78 gives up. As for why the Cortex-X1’s IPC is 22% higher than the Cortex-A78’s while the 1+3+4 combination improves performance by only 10% over 4+4, the reason is that the Cortex-X1 is a single core, and its advantage is averaged out by the other three Cortex-A78 cores.
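The averaging effect can be illustrated with a toy model. The sketch below assumes a perfectly parallel workload in which the throughput of the big-core group is simply the sum of its cores’ per-core performance at the same frequency. The four Cortex-A55 small cores are identical in both designs and are left out. This is a simplification for illustration, not ARM’s methodology.

```python
def cluster_throughput(per_core_perf):
    """Toy model: total throughput is the sum of per-core performance."""
    return sum(per_core_perf)


A78 = 1.00  # Cortex-A78 per-core performance, used as the unit
X1 = 1.22   # Cortex-X1 is quoted as 22% faster at the same frequency

quad_a78 = cluster_throughput([A78] * 4)      # 4 big cores
x1_mix = cluster_throughput([X1] + [A78] * 3)  # 1 super core + 3 big cores

gain = x1_mix / quad_a78 - 1
print(f"Big-core group gain with one Cortex-X1: {gain * 100:.1f}%")
```

In this model the single Cortex-X1’s 22% advantage shrinks to about 5.5% across the four cores. The model ignores the larger L3 cache and any clock differences, so it is not expected to reproduce the 10% figure exactly. It only shows why one faster core cannot lift the whole cluster by 22%.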
Same Super Core, Different Performance
Note that ARM defines the Cortex-X1 as a “customizable” design: chip makers can submit requirements to ARM based on their budget and needs, and ARM will adjust the specifications and design of the Cortex-X1’s modules for different application scenarios.
Therefore, even if the next-generation super flagship SoCs from Qualcomm Snapdragon, Samsung Exynos, and MediaTek Dimensity all use the Cortex-X1 core, their performance will still differ because of details such as instructions decoded per clock cycle, L0 BTB capacity, macro-op cache capacity, out-of-order window size, instructions dispatched per clock cycle, and cache capacity.
In other words, chips whose makers are willing to spend the money, or can optimize the SoC design to tame the Cortex-X1’s heat and power consumption, will perform better, while chips that merely use the Cortex-X1 as a selling point without deeper tuning will perform worse.
The Cortex-X1 described above as 30% stronger than the Cortex-A77 is the “full-blooded” version, that is, the full-specification configuration. It remains to be seen which of Qualcomm, Samsung, and MediaTek will be the first to ship it.
