Is ARM Cortex-X1 Architecture Really That Powerful?

With the release of the Kirin 9000 and Samsung Exynos 1080, Android smartphone chips have officially entered the 5nm era. Unfortunately, the CPU of the Kirin 9000 is still based on the ARM Cortex-A77 released last year, while the Exynos 1080, although equipped with the latest ARM Cortex-A78, does not include the most powerful Cortex-X1 because of its market positioning.
If all goes well, the new generation of 5nm flagship SoCs such as Snapdragon 875, Exynos 2100, and Dimensity 2000 will adopt a combination of Cortex-X1 super core + Cortex-A78 big core + Cortex-A55 small core. In other words, in 2021, only SoCs equipped with the Cortex-X1 core will be eligible to be called “super flagship”.
The reason Cortex-X1 has such great appeal is that it has earned another nickname: the terminator of self-research.
The Charm of Self-Research
In benchmarking software such as Geekbench, Apple’s iPhones consistently rank above contemporaneous flagship Android phones. The reason is that Apple has followed a “self-research” path for its A-series processors since the iPhone 5 era, and with its strong R&D capabilities and financial resources it has built cores that ARM’s stock Cortex-A series cannot match. The arrival of ARM’s new Cortex-X1 core IP gives the Android camp a glimmer of hope of challenging the A-series processors.
So-called “self-research” means buying ARM’s highest-level instruction set license and then developing an ARM-compatible architecture to suit one’s own needs. How far such a design can lead ARM’s public version Cortex-A architecture depends entirely on the chip maker’s technical ability.
Self-Research Encounters Obstacles, Modifications Dominate
Besides Apple, Qualcomm used its self-researched Krait architecture in the Snapdragon 600/800 era, and its most recent fully self-researched core was the Kryo in the Snapdragon 820. However, Qualcomm found that its self-researched architectures struggled to beat the energy efficiency of the contemporaneous public version Cortex-A7x architectures, which made the effort uneconomical. Starting with the Snapdragon 835, it therefore switched to the BoC (Built on ARM Cortex Technology) strategy, commonly known as “modification”: customizing and optimizing an existing public version Cortex-A architecture.
Huawei has taken a similar approach since the Kirin 980, whose big core is based on a Cortex-A architecture and is therefore also a form of modification. Note that there are not many places where the public version Cortex-A architecture can be modified, and manufacturers generally concentrate on the cache. As a result, for both Qualcomm and Kirin the performance gap between the modified cores and the public version architecture is small, and the deciding factor is clock frequency.
Samsung joined the self-research army starting from Exynos 8890 and launched a core architecture named Mongoose.
However, after four generations of independent development, Samsung decided at the end of 2019 to abandon its self-researched Mongoose core and disbanded the entire R&D team based in Austin, Texas, opting to rely fully on ARM’s public version solutions in the future. Evidently, for every chip manufacturer other than Apple, the self-research path is difficult and often thankless.
The good news is that ARM’s recently released “three swordsmen,” which include the Cortex-X1, are IP cores that chip manufacturers can customize extensively, offering a replacement for the arduous path of “self-research.”
Super Core Arrives to Alleviate Problems
If the positioning of Cortex-A78 is the “big core,” then Cortex-X1 is equivalent to the “super big core.” It has the same architectural design as Cortex-A78 but has been expanded in nearly every aspect.
For example, the decode width of Cortex-X1 increases from 4 instructions per cycle on Cortex-A78 to 5, a 25% increase; the NEON floating-point units go from 2 × 128-bit pipelines to 4 × 128-bit pipelines, doubling floating-point throughput; and in terms of cache, Cortex-X1 has 64KB of L1 cache, 1MB of L2 cache, and up to 8MB of L3 cache, twice the maximum L3 of Cortex-A78.
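The percentages in the comparison above can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the names `SPECS`, `percent_increase`, and `compare` are illustrative, and the 4MB L3 figure for Cortex-A78 is inferred from the statement that 8MB is twice its maximum.

```python
# Figures taken from the paragraph above; names and layout are illustrative.
SPECS = {
    "decode_width": {"A78": 4, "X1": 5},      # instructions decoded per cycle
    "neon_pipes_128b": {"A78": 2, "X1": 4},   # number of 128-bit NEON pipelines
    "max_l3_mb": {"A78": 4, "X1": 8},         # A78 value inferred from "twice"
}


def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


def compare(specs):
    """Map each spec name to the X1-over-A78 increase in percent."""
    return {name: percent_increase(v["A78"], v["X1"]) for name, v in specs.items()}


for name, pct in compare(SPECS).items():
    print(f"{name}: +{pct:.0f}%")
```

Wider units raise the ceiling on throughput; how much of that shows up in real applications depends on the workload.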
According to ARM’s official figures, at the same clock frequency (that is, in terms of IPC), the peak performance of Cortex-X1 is 30% higher than that of Cortex-A77 and 22% higher than that of the latest Cortex-A78; its machine learning performance is twice that of both Cortex-A77 and Cortex-A78.
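As a rough cross-check, the two percentages above also imply how far Cortex-A78 sits ahead of Cortex-A77 at the same frequency. The sketch below assumes both figures were measured on the same workload, which the article does not state; the function name `implied_gain` is illustrative.

```python
def implied_gain(gain_over_a, gain_over_b):
    """If X is gain_over_a ahead of A and gain_over_b ahead of B
    (both as fractions), return how far B is ahead of A."""
    return (1 + gain_over_a) / (1 + gain_over_b) - 1


x1_over_a77 = 0.30  # Cortex-X1 vs Cortex-A77, from the text
x1_over_a78 = 0.22  # Cortex-X1 vs Cortex-A78, from the text

a78_over_a77 = implied_gain(x1_over_a77, x1_over_a78)
print(f"Implied Cortex-A78 gain over Cortex-A77: {a78_over_a77:.1%}")
```

Under that assumption the implied same-frequency gain of Cortex-A78 over Cortex-A77 is about 6.6%, far smaller than the step from Cortex-A78 to Cortex-X1.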
According to ARM’s plan, chip manufacturers can combine 1 Cortex-X1 “super core” + 3 Cortex-A78 “big cores” + 4 Cortex-A55 “small cores” into a three-tier DynamIQ cluster that balances performance and power consumption.
The only pity is that the Cortex-X1 core occupies a larger die area.
In terms of theoretical performance, when 4 Cortex-A78 cores are paired with 4MB L3 cache, their performance can be improved by 20% compared to the previous generation Cortex-A77, while the core area is reduced by 15%; whereas a combination of 1 Cortex-X1 + 3 Cortex-A78 cores with 8MB L3 cache, although the core area will increase by 15%, the peak performance will improve by 30%.
At this point, it should be clear why ARM made the L1 instruction cache of Cortex-A78 configurable as either 32KB or 64KB. Choosing 32KB frees die area that can be given to Cortex-X1, whose performance is more than enough to offset the smaller Cortex-A78 cache. As for why Cortex-X1’s IPC is 22% higher than that of Cortex-A78 while the 1+3+4 combination is only about 10 percentage points ahead of the 4+4 configuration (30% versus 20% over Cortex-A77), the reason is that only one Cortex-X1 core is at work, and its advantage is averaged out across the other three Cortex-A78 cores.
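The trade-off between the two configurations can be worked out directly from the figures quoted earlier, all expressed relative to the previous-generation Cortex-A77 configuration. This is a back-of-the-envelope sketch: the `Config` class and the variable names are illustrative, and it assumes the performance and area percentages share the same Cortex-A77 baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    perf: float  # peak performance, Cortex-A77 baseline = 1.0
    area: float  # core area, Cortex-A77 baseline = 1.0


# Values from the text: +20% perf / -15% area, and +30% perf / +15% area.
A78_X4 = Config("4x Cortex-A78 + 4MB L3", perf=1.20, area=0.85)
X1_MIX = Config("1x Cortex-X1 + 3x Cortex-A78 + 8MB L3", perf=1.30, area=1.15)


def relative_gain(new, old):
    """Fractional increase of new over old."""
    return new / old - 1


perf_gain = relative_gain(X1_MIX.perf, A78_X4.perf)
area_gain = relative_gain(X1_MIX.area, A78_X4.area)

print(f"Performance of the X1 mix vs 4x A78: {perf_gain:+.1%}")
print(f"Core area of the X1 mix vs 4x A78:   {area_gain:+.1%}")
for cfg in (A78_X4, X1_MIX):
    print(f"{cfg.name}: performance per unit area = {cfg.perf / cfg.area:.2f}")
```

On these numbers the Cortex-X1 mix is roughly 8% faster at peak than four Cortex-A78 cores while taking roughly 35% more area, which is why the super core is a choice for flagship parts rather than a free upgrade.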
Same Super Core, Different Performance
It is important to note that ARM defines Cortex-X1 as a “customizable” mobile platform, allowing chip manufacturers to make requests to ARM based on their budget and needs, and ARM will then adjust the specifications of various modules of Cortex-X1 according to different application scenarios.
Therefore, even if Qualcomm Snapdragon, Samsung Exynos, and MediaTek Dimensity’s next-generation super flagship SoCs all use the Cortex-X1 core, they will still exhibit different performance due to differences in details such as the number of instructions decoded per clock cycle, L0-BTB capacity, macro operation cache capacity, out-of-order window size, number of instructions issued per clock cycle, and cache capacity.
In other words, a manufacturer that is willing to spend more, or that can optimize its SoC design to tame the heat and power draw of Cortex-X1 under full load, will get a stronger chip, while chips that merely use Cortex-X1 as a marketing point without fully exploiting its potential will perform worse.
The Cortex-X1 that is said to be 30% stronger than Cortex-A77 is the “full-blooded version.” Next, we will wait to see which of Qualcomm, Samsung, and MediaTek is first to ship the full-blooded version of the Cortex-X1 core.
