Discussing ARM's New Cortex X3, A715, and A510 v2

Significant Improvements in the Big Core

I had seen claims that the Cortex X3 comes from the Sophia team and is a newly developed architecture, distinct from the X1/X2. I was always skeptical of this, but it now seems to be true.

When the X2 was released, I wondered how IPC could improve significantly with such small architectural changes. Even allowing for a comparison skewed by different L3 configurations, a 16% gap shouldn't appear. Only after various tests came out did I realize I had been misled by ARM: the IPC improvement did not exceed 10%.

Now that the Cortex X3 is out, it looks like a genuinely new architecture rather than the small tweaks of the X2. It becomes a real 6-issue processor, catching up to mainstream levels. The main improvements are more efficient branch prediction, larger buffers, and more execution units.

Has a new indirect branch predictor been added?

The front end has gone from 5-wide to 6-wide decode, and the ROB has grown from the X2's 288 entries to 320.

However, the MOP cache capacity has been halved, and ARM advertises it as a more efficient MOP cache. Compared with the A77, which has the same 1.5K capacity, the improvement is significant. MOP throughput does not seem to be affected, and the algorithm is said to have been modified.

Two integer ALUs have been added, while the floating-point part seems to have no major changes.

Comparing with the previous Cortex X2 and X1:

Improvements in the memory subsystem:

Integer load bandwidth has increased from 24B/cycle to 32B/cycle, and two additional data prefetchers have been added, improving bandwidth by 25%.

Who set this bad precedent, bringing Geekbench into the IPC calculation?

According to official statements, the Cortex X3 achieves an 11% IPC improvement, but the graph shows that this 11% applies to SPECint2006. GB5 looks like about 10%, while SPEC17 is around 8%. Averaging these three gives an IPC improvement of approximately 9.7%.

If this GB5 figure is accurate, then with a big-core boost frequency of 3.2GHz on the Snapdragon 8 Gen 2, we can estimate a score of around 1460, which falls between the big cores of the A13 and A14.
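The averaging above can be checked in a few lines. This is a minimal sketch, assuming the three gains are the approximate values quoted above from ARM's chart (11%, 10%, 8%). The variable names are mine, and the geometric mean is added only as a cross-check; it is not something ARM reports.

```python
from statistics import mean, geometric_mean

# Approximate per-benchmark gains as quoted in the text (read off a chart,
# so treat them as rough values, not exact measurements).
ipc_gain_pct = {"SPECint2006": 11.0, "Geekbench 5": 10.0, "SPEC2017": 8.0}

# Simple arithmetic mean of the three percentages.
arith_avg = mean(ipc_gain_pct.values())

# Geometric mean of the speedup ratios, converted back to a percentage.
ratios = [1 + g / 100 for g in ipc_gain_pct.values()]
geo_avg = (geometric_mean(ratios) - 1) * 100

print(f"arithmetic mean: {arith_avg:.1f}%")
print(f"geometric mean:  {geo_avg:.1f}%")
```

Both means round to the same value here because the three gains are close together.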
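This kind of estimate is a simple scaling: take a baseline score, apply the claimed IPC gain, then scale by the frequency ratio. Here is a rough sketch of that reasoning, assuming the score scales linearly with clock speed, which is optimistic for memory-bound workloads. The function name and the baseline numbers are hypothetical placeholders, not measurements from the article.

```python
def estimate_score(baseline_score: float, baseline_ghz: float,
                   target_ghz: float, ipc_gain: float) -> float:
    """Scale a single-core score by an IPC gain and a frequency ratio.

    Assumes perfectly linear frequency scaling (an idealization).
    """
    return baseline_score * (1 + ipc_gain) * (target_ghz / baseline_ghz)


# Placeholder inputs for illustration only: a previous-generation core
# scoring 1000 at 3.0 GHz, with a 10% IPC gain and a 3.2 GHz target.
example = estimate_score(baseline_score=1000.0, baseline_ghz=3.0,
                         target_ghz=3.2, ipc_gain=0.10)
print(f"estimated score: {example:.0f}")
```

Substituting a measured X2 score and its real clock speed for the placeholders gives the kind of estimate quoted above.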

One last interesting point is that the L2 size is configurable, allowing downstream vendors to choose between 512KB and 1MB of L2.

ARM also specifically highlighted the difference between the two, stating that 1MB of L2 performs significantly better. I had assumed the 512KB option was newly opened up for downstream manufacturers, but a netizen reminded me that Samsung's Exynos 2100, which uses the X1, already shipped with 512KB of L2. Test results show it is significantly weaker than the X1 in the Snapdragon 888, so did ARM mention this specifically to warn Samsung not to choose 512KB of L2 again?

In summary, the improvements in the Cortex X3 are quite significant, and this time the data should not be misleading. With this many changes, it would be laughable if performance still failed to improve significantly.

Regarding power consumption, ARM did not mention it this time, so I guess X3 might still have high power consumption.

Brand New A715 Dropping 32-bit Support

ARM officially claims a 20% reduction in power consumption compared with the A710 at the same performance, and a 5% IPC improvement at the same power consumption (this is quite intriguing).

They also provided a power-performance graph for SPECint_base2006. If this graph is not fabricated, the A715 draws significantly less power at mid to high frequencies.

It looks interesting. In fact, the architectural improvements in the A715 are larger than the name bump suggests. In my opinion, it could have been called the A720 without any issues. Front-end throughput has increased, going from 4-wide to 5-wide decode, and the MOP cache has been removed, with its work handed to a more efficient instruction cache.
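The two claims describe different operating points, which is part of what makes them intriguing. As a rough sketch, assuming the 5% IPC gain translates directly into 5% more performance at the same power, the implied performance-per-watt gains can be compared as follows (the function name is mine, for illustration only):

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance-per-watt of the new core (1.0 means no change)."""
    return perf_ratio / power_ratio


# Same performance, 20% less power (the first claim).
iso_perf = perf_per_watt_gain(perf_ratio=1.00, power_ratio=0.80)

# Same power, 5% more performance (the second claim, under the
# assumption stated above that the IPC gain equals the performance gain).
iso_power = perf_per_watt_gain(perf_ratio=1.05, power_ratio=1.00)

print(f"iso-performance: {iso_perf:.2f}x perf/W")
print(f"iso-power:       {iso_power:.2f}x perf/W")
```

The iso-performance claim implies a much larger efficiency gain than the iso-power one. That is what you would expect if the savings come mostly from running at a lower point on the voltage-frequency curve rather than from raw performance.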

A710 supported both 32-bit and 64-bit, but A715 has become a purely 64-bit architecture.

Therefore, the decoder has undergone significant changes, improving efficiency, and it seems that dropping 32-bit support for the mid-core can yield some interesting results.

Branch prediction has seen significant improvements.

I would say it has undergone a complete makeover.

The performance of A710 has been somewhat poor, but this A715 seems promising. If ARM’s data is correct and power consumption is reduced by 20%, then it should be quite significant on mobile devices, especially since in the previous energy efficiency graph, A715 shows a significant drop in power consumption in the mid-high performance range.

No Significant Improvements in the Little Core

Still A510

Moreover, the claimed reduction in power consumption is only 5%. To be honest, even if it’s true, this reduction is not very noticeable, so we can consider this new A510 equivalent to the old A510.

However, this new A510 can support 32-bit applications (last year's A510 could not), meaning that 32-bit support has shifted from the mid-core to the little core. This is acceptable, since the mid-core consumes much more power than the little core. Previously, even a 32-bit application running in the background required a mid-core to do the work. Shifting this to the little core may result in a more efficient pairing.

DSU stands for DynamIQ Shared Unit and is designed around the big.LITTLE architecture. The DSU has also been upgraded this time and can now support 12-core CPUs. The current DSU tops out at 8 cores.

It now supports 12 cores, but according to the detailed introduction, the 12 cores refer to the new 8+4+0 design.

8+4+0 means 8 big cores, 4 mid-cores, and no little cores. This configuration is destined for computers only, and its stated positioning also covers all-in-one desktops.

For mobile phones, it should still be 1+3+4.

Whether 1+4+4 will be used remains to be seen. Both 1+4+4 and 2+2+4 are new arrivals, seemingly designed for larger mobile devices. However, tablets typically use phone SoCs directly, so producing a separate version just for tablets is clearly not practical. Let's see who will choose to do that.
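The big+mid+little notation used above is easy to model. Below is a small sketch that parses such strings and checks them against the 12-core DSU limit mentioned in the text. The class and function names are hypothetical, and the only constraint enforced is the total core count; the real DSU surely has further rules that are not covered here.

```python
from typing import NamedTuple

MAX_CORES = 12  # per the article: the upgraded DSU supports up to 12 cores


class ClusterConfig(NamedTuple):
    big: int
    mid: int
    little: int

    @property
    def total(self) -> int:
        return self.big + self.mid + self.little


def parse_config(text: str) -> ClusterConfig:
    """Parse a 'big+mid+little' string such as '8+4+0'."""
    parts = text.split("+")
    if len(parts) != 3:
        raise ValueError(f"expected big+mid+little, got {text!r}")
    cfg = ClusterConfig(*(int(p) for p in parts))
    if cfg.total > MAX_CORES:
        raise ValueError(f"{text!r} exceeds {MAX_CORES} cores")
    return cfg


# The configurations mentioned in the article.
for name in ("8+4+0", "1+3+4", "1+4+4", "2+2+4"):
    cfg = parse_config(name)
    print(f"{name}: {cfg.total} cores "
          f"({cfg.big} big, {cfg.mid} mid, {cfg.little} little)")
```

Only 8+4+0 actually reaches the 12-core ceiling. The phone-oriented layouts stay at 8 or 9 cores.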

ARM also announced several new members:

Hunter+Chaberton, which should be two mid-cores, but I don’t know if they are new architectures or just minor adjustments. I am not very optimistic.

The next generation little core is Hayes, which should be released next year, so a lot of work must have been done by this time. To be honest, I am more interested to see what level this Hayes can achieve.
