For mobile processors, one should not focus too much on the number of cores and clock speeds, but rather on the architecture and process technology.
It is relatively easy to judge the quality of the process technology: broadly speaking, a smaller node size indicates a more advanced process, and FinFET transistors are superior to traditional planar (2D) transistors. Judging the quality of an architecture is much harder, because there is a distinction between standard and self-developed architectures. Once you understand the naming conventions of the standard architectures you can get a rough idea, but the differences among manufacturers make it very difficult to assess self-developed architectures by name alone. These obstacles ultimately get in the way of judging how good an SoC really is. Is a self-developed architecture stronger than a standard one? How do different self-developed architectures compare with each other?
To address these questions, let’s start with instruction sets.
The Battle Between RISC and CISC
The hard-wired commands that a CPU uses internally to direct and optimize its operations are called its "instruction set". These are the lowest-level instructions the CPU can directly recognize, and they fall into two categories: Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC). CISC raises execution speed by implementing commonly used functions, originally performed in software, as complex hardware instructions. Intel's well-known x86 architecture is a typical product of the CISC approach. In the early days of computing, when components were expensive, clock speeds were low, and machines were slow, this significantly improved processing efficiency. However, as CISC grew ever more complex and its structure ballooned, its versatility and speed began to suffer, which led to the emergence of RISC, driven by a different philosophy.
The RISC approach simplifies the function of each instruction and shortens the average instruction execution cycle, implementing complex operations as subroutines; this raises the achievable clock frequency, while heavy use of general-purpose registers speeds up subroutine execution. ARM's architecture and Imagination Technologies' MIPS architecture both belong to this category.
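To make the contrast concrete, here is a minimal toy sketch in C; the memory/register model and the function names are invented for illustration and are not real x86 or ARM instructions. A CISC-style machine can express "add two values in memory and write the result back to memory" as a single instruction, while a RISC-style load/store machine builds the same operation from simple register-only instructions, in exactly the style described above.

/* Toy illustration of CISC vs RISC execution style. Everything here is
 * hypothetical: a tiny "memory", a tiny register file, and fake instructions. */
#include <stdio.h>
#include <stdint.h>

static uint32_t mem[16];   /* toy main memory               */
static uint32_t reg[8];    /* toy general-purpose registers */

/* CISC style: one complex "instruction" reads two memory operands and
 * writes the result straight back to memory. */
static void cisc_add_mem(int dst, int src1, int src2) {
    mem[dst] = mem[src1] + mem[src2];
}

/* RISC style: only simple instructions; memory is touched exclusively by
 * explicit loads and stores, and arithmetic works on registers. */
static void risc_load (int rd, int addr)         { reg[rd] = mem[addr]; }
static void risc_add  (int rd, int rs1, int rs2) { reg[rd] = reg[rs1] + reg[rs2]; }
static void risc_store(int rs, int addr)         { mem[addr] = reg[rs]; }

int main(void) {
    mem[0] = 20; mem[1] = 22;

    /* One complex instruction... */
    cisc_add_mem(2, 0, 1);

    /* ...versus the equivalent three-step load/operate/store sequence. */
    risc_load(0, 0);
    risc_load(1, 1);
    risc_add(2, 0, 1);
    risc_store(2, 3);

    printf("CISC result: %u, RISC result: %u\n", mem[2], mem[3]);
    return 0;
}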
Currently, almost all popular mobile processors use ARM architecture, which has four major advantages: 1) small size, low power consumption, low cost, and strong performance; 2) extensive use of registers with most data operations performed in registers, resulting in faster instruction execution; 3) flexible and simple addressing modes, leading to high execution efficiency; 4) fixed instruction length, which can improve processing efficiency through multiple pipelines.
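Advantage 4 deserves a small illustration. The sketch below uses a made-up variable-length encoding (the first byte of each instruction gives its length), purely for contrast: with a fixed 4-byte instruction length the position of any instruction can be computed directly, so several decoders can work on consecutive instructions in parallel, whereas with variable-length encodings each boundary is only known after walking through the earlier instructions.

/* Why fixed-length instructions help pipelined, multi-issue decode.
 * The variable-length scheme here is invented (first byte = length in bytes),
 * not a real x86 or ARM encoding. */
#include <stdio.h>
#include <stddef.h>

#define FIXED_LEN 4   /* ARM-style: every instruction occupies 4 bytes */

/* With a fixed length, instruction i starts at i * FIXED_LEN, known instantly. */
static size_t fixed_start(size_t i) { return i * FIXED_LEN; }

/* With a variable length, we must walk every earlier instruction to find
 * where instruction i begins. */
static size_t variable_start(const unsigned char *code, size_t i) {
    size_t pc = 0;
    for (size_t k = 0; k < i; k++)
        pc += code[pc];          /* length of instruction k is its first byte */
    return pc;
}

int main(void) {
    /* Toy variable-length stream with instruction lengths 2, 5, 3, 1. */
    const unsigned char code[] = {2, 0, 5, 0, 0, 0, 0, 3, 0, 0, 1};

    for (size_t i = 0; i < 4; i++)
        printf("insn %zu: fixed offset %zu, variable offset %zu\n",
               i, fixed_start(i), variable_start(code, i));
    return 0;
}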
The ARM architecture spans generations such as ARMv6, ARMv7, and ARMv8. The ARM11 core, designed around the ARMv6 instruction set, was widely used in early smartphones, especially Nokia's Symbian phones. ARMv7 is the most common architecture of the modern smartphone era; the well-known Cortex-A7/A8/A9/A15 cores are all products of it. The ARMv8 instruction set, released in November 2011, introduced 64-bit support for the first time and formed the foundation on which Apple launched the first 64-bit mobile processor, the A7, in 2013.
Today, the processor architectures we commonly encounter in smartphones, whether self-developed or not, are all based on the ARM instruction set (with a few exceptions, such as phones built on Intel cores using the x86 instruction set).
Why Are There Standard and Self-Developed Architectures for Mobile Processors?
Although ARM has established a unified position in the mobile field with its ARMvX architecture, this company from the British Isles does not operate like Intel, doing everything in-house. Instead, it offers open licensing, allowing manufacturers to design and produce final products according to their needs.
As part of the licensing, ARM divides its offerings into two categories. The mainstream option is to license ARM-designed IP cores, such as Cortex-A53/A72. Manufacturers that obtain such licenses only need to choose the number of cores, bus interconnects, and cache, and they can basically complete the CPU design. Thus, we refer to this direct use of ARM-designed core solutions as the standard architecture, which includes chips from MediaTek, Samsung, and HiSilicon.
The other option is to license ARM architectures, such as ARMv7/ARMv8. Manufacturers that obtain these instruction set architectures must design their own cores and then complete the entire CPU construction. This is what we refer to as self-developed architecture, which is the basis for most chips from Apple and Qualcomm.
The greatest advantage of the first standard option is that it saves time and effort, reducing costs. Chip manufacturers only need to follow ARM’s pace to ensure they are not left behind, guaranteeing performance while bringing products to market as quickly as possible. This also makes it easier for those with a basic understanding of smartphones to assess the strength and positioning of chips. The downsides are that it sacrifices differentiation and cannot create unique selling points, and that manufacturers must always keep up with ARM; falling behind can easily lead to a perception of being low-end and obsolete.
The second self-developed option offers flexibility and variability, allowing manufacturers to design single-core performance that exceeds standard IP and lower power consumption, while also having great freedom in bus interconnects, providing ample room for development. However, this option must be based on the premise that the chip manufacturer can afford to invest money and hire talent, and it may take a long time to achieve results that surpass standard architectures.
Essentially, which option to adopt is a decision each chip manufacturer makes based on its own capabilities, financial resources, time costs, and the needs of the final product. Some manufacturers may use standard architectures for years and then, once their technical groundwork matures, switch to self-development, such as Samsung. Others may favor self-developed architectures but fall back on standard ones when their schedule is disrupted or cost becomes the bigger concern, like Qualcomm. Nothing is set in stone, so we do not need to expend energy distinguishing standard from self-developed architectures; we should focus on actual performance instead.
Actual Performance Cannot Be Determined Solely by Architecture
Can we determine actual performance solely based on architecture? Not necessarily.
1. Standard
In terms of actual performance, comparisons between standard architectures are relatively straightforward and can generally be summarized as A72 > A57 > A15 > A17 > A9 > A53 > A35 > A8 > A7 > A5. The rule of thumb is that the cores with single-digit names are based on the 32-bit ARMv7 instruction set and those with two-digit names on the 64-bit ARMv8 instruction set, the exceptions being the A15 and A17, which are 32-bit ARMv7 designs (the A17 is an optimized A12 that approaches A15 performance at lower power consumption).
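For quick reference, that naming rule can be written down as a small helper. It covers only the cores mentioned in this article, and the model number passed in is simply the Cortex-A number.

/* Maps the Cortex-A cores discussed above to their instruction set.
 * Single-digit cores plus the A15/A17 are 32-bit ARMv7; the two-digit
 * cores listed here (A35/A53/A57/A72) are 64-bit ARMv8. */
#include <stdio.h>

static const char *cortex_isa(int model) {
    switch (model) {
    case 5: case 7: case 8: case 9:
    case 15: case 17:
        return "ARMv7 (32-bit)";
    case 35: case 53: case 57: case 72:
        return "ARMv8 (64-bit)";
    default:
        return "not covered in this article";
    }
}

int main(void) {
    int cores[] = {72, 57, 15, 17, 9, 53, 35, 8, 7, 5};
    for (int i = 0; i < 10; i++)
        printf("Cortex-A%-2d -> %s\n", cores[i], cortex_isa(cores[i]));
    return 0;
}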
2. Apple
However, once self-developed architectures enter the picture, comparisons become more complex. Let's start with Apple. Beginning with the A6 processor, Apple set out to develop its own cores, first launching the Swift core based on ARMv7; its performance sat between the standard A9 and A15 and surpassed Qualcomm's Krait 300 of the same period.
With the A7, Apple showed unprecedented design capability: only a year later it delivered the Cyclone core based on the 64-bit ARMv8 architecture, in a chip that integrated over one billion transistors, with its dual-core configuration performing on par with a quad-core A15 processor. By the time of the A8 in the iPhone 6, the improved Typhoon architecture had raised processor performance by 25%, with single-core performance exceeding the A57 and multi-core performance only slightly trailing the octa-core A57+A53 Exynos 7420 and Snapdragon 810.
As for the latest A9 chip, it uses the third-generation 64-bit Twister core, which boosts CPU performance by 70% over the A8; its single-core performance leads even the Kirin 950 with its latest A72 cores, making it the strongest single core in any commercial chip to date.
3. Qualcomm
Another company keen on developing its own architectures is Qualcomm. As early as the Snapdragon S1 era, Qualcomm used the Scorpion core, based on the ARMv7 architecture, in the QSD8250. Compared with the popular A8/A9 standard cores of the time, Scorpion added some out-of-order execution capability and supported asynchronous symmetric multiprocessing, excelling in high clock speeds, low power consumption, and stronger floating-point computation, although its outright performance was slightly weaker than the A9's. This architecture served in the Snapdragon S1, S2, and S3 processors, but eventually grew dated, prompting Qualcomm to introduce the Krait core.
The Krait cores span four generations: Krait 200, Krait 300, Krait 400, and Krait 450, all based on the ARMv7 architecture. The first generation was used in the Snapdragon S4 processors; it could fetch and decode three instructions per clock cycle, its back-end execution units grew from Scorpion's three to seven, and its pipeline lengthened from 10 to 11 stages, with actual performance slightly weaker than the 15-stage A15. The second-generation Krait 300 improved the branch predictor, added out-of-order execution hardware, and brought better floating-point capability; it was used in the first-generation Snapdragon 600, with performance close to the A15 but at lower power consumption.
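To get a rough feel for why a better branch predictor matters, the generic micro-benchmark below (not specific to Krait or any particular core) runs the same data-dependent branch over the same values twice. Once the data is sorted, the branch becomes highly predictable and the loop typically runs noticeably faster on any modern core, provided the compiler keeps the branch (build with low optimization such as -O1; at -O2/-O3 the branch may be turned into a conditional move and the gap disappears).

/* Illustrative branch-prediction micro-benchmark: same loop, same values,
 * only the predictability of the branch changes. Not a Krait benchmark. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static long long sum_large(const int *data, int n) {
    long long sum = 0;
    for (int i = 0; i < n; i++)
        if (data[i] >= 128)      /* the branch the predictor must guess */
            sum += data[i];
    return sum;
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    if (!data) return 1;
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;  /* random values: branch is hard to predict */

    clock_t t0 = clock();
    long long unsorted = sum_large(data, N);
    clock_t t1 = clock();

    qsort(data, N, sizeof *data, cmp_int);   /* same values, now predictable */
    clock_t t2 = clock();
    long long sorted = sum_large(data, N);
    clock_t t3 = clock();

    printf("unsorted: %lld (%.3fs), sorted: %lld (%.3fs)\n",
           unsorted, (double)(t1 - t0) / CLOCKS_PER_SEC,
           sorted,   (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(data);
    return 0;
}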
The third-generation Krait 400 was manufactured on a 28nm HPM process, gained a lower-latency memory controller, and adopted a higher-frequency L2 cache, outperforming the A15; the well-known Snapdragon 800/801 used this core. The final generation, Krait 450, appeared in the less common Snapdragon 805, with the main change being a clock speed pushed to a then-remarkable 2.5GHz.
After Krait 450, Qualcomm's next-generation Snapdragon 810 turned to standard A57 cores for competitive reasons, with relatively steady performance. In the upcoming Snapdragon 820, Qualcomm has finally introduced the long-awaited Kryo core, its first self-developed 64-bit core, whose single-threaded performance surpasses the latest standard A72.
4. Others
Additionally, Samsung and Nvidia have also ventured into self-developed core designs to some extent.
Samsung's Exynos chips have always used standard Cortex cores, and last year the Exynos 7420 made headlines with its 14nm process technology. But for a company that has always prided itself on research and development, merely watching others thrive with self-developed architectures is hardly satisfying. So the Exynos 8890, which will appear in the upcoming Samsung Galaxy S7, adopts the self-developed Mongoose core based on the ARMv8 architecture, outperforming the A72 and rivaling Qualcomm's Kryo.
It is worth mentioning that in the animal kingdom the mongoose is the natural enemy of the krait, the venomous snake after which Qualcomm's earlier cores were named.
Nvidia, for its part, used its self-developed 64-bit Denver core in one version of the Tegra K1 chip, but due to its late arrival, high power consumption, and lack of an integrated baseband, it was adopted in hardly any devices. The later Tegra X1 returned to a standard A57+A53 big.LITTLE design.
Conclusion
From the discussion above, we can generally conclude that self-developed architectures usually beat standard architectures on performance, but this advantage applies to single-core performance; in multi-core throughput, the octa-core standard designs are undoubtedly much stronger.
Although self-developed architectures have many advantages, not everyone can play in this field. Even Qualcomm, strong as it is, is rumored to be dropping its self-developed architecture in the next-generation Snapdragon 830. While I do not fully believe this is true, it does reflect the enormous resources that self-design demands.
For consumers, it is enough that the chips on sale deliver comparable performance; there is no need to nitpick over a few hundred points in benchmark scores, nor to attack one another over whether a product uses a self-developed architecture.