A study of the Ascend 910C’s technical architecture suggests that it may have grown out of Huawei’s accumulated work on Kunpeng ARM chips. On this view, the chip evolved gradually from a traditional multi-core CPU architecture into a dedicated inference chip, and early materials on the Ascend 910B seem to support the speculation. This article examines the question along three dimensions (evidence of technical inheritance, the nature of the architectural transition, and the characteristics of each stage) in order to clarify how the two chip lines are technically related.
1. Reasonableness of the Speculation: Evidence of Inherited Underlying Technology
The technical connection between the Ascend series (including the 910B/C) and Kunpeng ARM chips is not coincidental. Inheritance is visible at several levels, which makes the speculation above reasonable to a degree, mainly in the following three respects:
1. Homogeneity of the Basic Architecture: ARM as the “Control Hub” Foundation
Early Ascend chips (such as the 310 and 910B) did build their control unit on the ARM architecture. According to publicly available information from 2020, Huawei’s Da Vinci architecture is self-developed but still relies on ARM cores: the Scalar Unit (the scheduling and control unit) of the Ascend 910B is essentially a customized ARM Cortex-A series core responsible for instruction parsing, task scheduling, and other general control functions, and it shares a common origin with the ARMv8 core technology of the Kunpeng 920.
For example, the single-Die control module of the 910B adopts a multi-core interconnect design similar to Kunpeng’s and supports a NUMA (Non-Uniform Memory Access) architecture, which follows the same multi-core collaboration path as traditional CPUs. This characteristic has fed the perception that “the Ascend 910C evolved from CPU multi-core architecture” and is one of the main grounds for the speculation.
2. Reuse of Process and Packaging Technologies
The Kunpeng 920 is Huawei’s first 7nm general-purpose CPU, and the experience it built up in Die-level packaging, multi-core interconnect, and power control directly provides a foundation for the Ascend 910B/C. The “dual Die co-packaging” design of the Ascend 910C essentially reuses the multi-Die interconnect patents of Kunpeng chips (such as the UB plane interconnect protocol) and stacks computing power across the two Dies through a 392 GB/s inter-Die bandwidth, which follows the same logic as the 64-core interconnect technology of the Kunpeng 920. This reuse reinforces the impression of technical continuity between the two and makes the speculation more plausible.
3. Ecological Foundation for System-Level Collaboration
Since 2019, Huawei has clearly defined the heterogeneous route of “Kunpeng (general computing) + Ascend (AI computing)”. The early design of the 910B strengthened hardware collaboration with Kunpeng servers: the chip connects directly to the Kunpeng CPU over a PCIe 4.0 interface, so that data preprocessing (general computing) and model inference (AI computing) run on separate chips. This division of labor continues in the three-plane network architecture of the 910C, where the UB plane is responsible for interconnecting NPU and CPU. Because this system-level design ties the Ascend series closely to Kunpeng chips in real deployments, it is easy to infer that the Ascend 910C originates from the technical accumulation of Kunpeng ARM chips.
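The division of labor described above can be sketched as a two-stage pipeline. The following is a minimal illustration, not Huawei’s software: `preprocess`, `infer`, and `run_pipeline` are hypothetical names, both stages run as ordinary Python threads on the host, and the bounded queue merely stands in for the CPU-to-NPU link.

```python
import queue
import threading

_DONE = object()  # marker telling the second stage that no more data is coming


def preprocess(sample):
    """Stand-in for CPU-side work: scale raw 8-bit values into [0, 1]."""
    return [x / 255.0 for x in sample]


def infer(features):
    """Stand-in for NPU-side work: a trivial 'model' that sums its inputs."""
    return sum(features)


def run_pipeline(samples, link_depth=4):
    """Run preprocessing and inference as two stages joined by a bounded queue."""
    link = queue.Queue(maxsize=link_depth)  # hypothetical stand-in for the interconnect
    results = []

    def cpu_stage():
        for sample in samples:
            link.put(preprocess(sample))
        link.put(_DONE)

    def npu_stage():
        while True:
            item = link.get()
            if item is _DONE:
                break
            results.append(infer(item))

    stages = [threading.Thread(target=cpu_stage), threading.Thread(target=npu_stage)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results


print(run_pipeline([[255, 0], [51, 204]]))
```

The bounded queue makes the first stage wait when the second falls behind, which is the basic reason the two kinds of work can be tuned and scaled separately.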
2. Boundaries of the Speculation: The Essential Transition from “General CPU” to “Dedicated NPU”
Although this inheritance lends the speculation some plausibility, a closer look at the architecture shows that the Ascend 910C has left the realm of “CPU multi-core evolution” and become an independent, dedicated architecture. The essential differences between the two fall into three points:
1. Complete Reconstruction of the Computing Unit (Not Multi-Core Expansion)
The performance of the Kunpeng 920 comes from 64 ARM Cortex-A76 cores organized in multi-core clusters; it raises parallel computing power by adding cores, which is the optimization approach of traditional multi-core CPUs. The core of the Ascend 910C, by contrast, is the AI Core systolic array of the Da Vinci architecture: each AI Core is a 64×64 MAC matrix that can complete 4096 multiply-accumulate operations per clock cycle. This “matrix computing unit” differs fundamentally in hardware structure from the “general computing core” of a CPU.
For instance, the 800 TFLOPS of FP16 computing power of the 910C comes from 256 AI Cores computing in parallel across two Dies, not from simply adding more ARM cores. This complete reconstruction of the computing unit is the essential difference between a dedicated AI chip and a general-purpose CPU, and it indicates that the Ascend 910C did not simply evolve from a multi-core CPU architecture.
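To make the contrast with a general-purpose core concrete, the sketch below models a matrix unit in the simplest possible way: an n×n grid of accumulators that performs one outer-product update, that is n² multiply-accumulates, per simulated cycle. It is a simplified model under stated assumptions, not a description of the actual Da Vinci data path (a real systolic array also has skewed operand flow and pipeline fill latency), and `mac_array_matmul` is a hypothetical name.

```python
def mac_array_matmul(a, b, n):
    """Multiply an n x k matrix by a k x n matrix on a simulated n x n MAC array.

    Each simulated cycle, every cell (i, j) of the array performs one
    multiply-accumulate: acc[i][j] += a[i][step] * b[step][j].
    Returns (result, cycles, total_macs).
    """
    k = len(b)
    acc = [[0] * n for _ in range(n)]
    cycles = 0
    total_macs = 0
    for step in range(k):            # one simulated clock cycle per step
        b_row = b[step]
        for i in range(n):
            a_val = a[i][step]
            acc_row = acc[i]
            for j in range(n):
                acc_row[j] += a_val * b_row[j]
        cycles += 1
        total_macs += n * n          # all n*n cells worked in this cycle
    return acc, cycles, total_macs


# Illustrative run with the array size quoted in the text (64 x 64).
n = 64
a = [[1] * n for _ in range(n)]
b = [[2] * n for _ in range(n)]
_, cycles, total_macs = mac_array_matmul(a, b, n)
print(total_macs // cycles)  # 4096 MACs per simulated cycle
```

In this model the work done per cycle is fixed by the size of the array rather than by the number of instruction streams, which is the structural point the comparison with a multi-core CPU rests on.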
2. Coupling Optimization of Storage and Computing (Beyond CPU Architecture)
The Kunpeng 920 relies on external DDR4 memory with a bandwidth of only 200 GB/s, so storage and computing are loosely coupled, as is typical of traditional CPU architecture. The Ascend 910C instead co-packages HBM memory with the computing Die, reaching a memory bandwidth of 3.2 TB/s, 16 times that of Kunpeng, and uses hardware-level data prefetching, tensor compression, and related techniques to achieve “zero-latency” data exchange between computing units and storage. This optimization targets the “high bandwidth, low latency” demands of neural networks, which a CPU architecture cannot meet, and it further shows that the Ascend 910C has moved beyond the limits of traditional CPU architecture to become a dedicated NPU.
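The bandwidth figures above can be checked with a few lines of arithmetic. The sketch below reproduces the 16× ratio and, as an added illustration, applies standard roofline-style reasoning that is not part of the original analysis: it computes how many FLOPs must be performed per byte fetched before the quoted 800 TFLOPS peak, rather than memory bandwidth, becomes the limit. The 14 GB working set is a hypothetical example size, and all function names are illustrative.

```python
KUNPENG_DDR4_BW = 200e9     # bytes/s, figure quoted in the text
ASCEND_HBM_BW = 3.2e12      # bytes/s, figure quoted in the text
ASCEND_PEAK_FP16 = 800e12   # FLOP/s, figure quoted in the text


def bandwidth_ratio(fast_bw, slow_bw):
    """How many times larger fast_bw is than slow_bw."""
    return fast_bw / slow_bw


def ridge_point(peak_flops, bandwidth):
    """FLOPs per byte needed before compute, not memory, is the bottleneck."""
    return peak_flops / bandwidth


def seconds_to_stream(num_bytes, bandwidth):
    """Time to read num_bytes once at the given sustained bandwidth."""
    return num_bytes / bandwidth


WORKING_SET = 14e9  # hypothetical: 14 GB of weights, chosen only for illustration

print(f"bandwidth ratio: {bandwidth_ratio(ASCEND_HBM_BW, KUNPENG_DDR4_BW):.0f}x")
print(f"ridge point: {ridge_point(ASCEND_PEAK_FP16, ASCEND_HBM_BW):.0f} FLOP/byte")
print(f"HBM pass:  {seconds_to_stream(WORKING_SET, ASCEND_HBM_BW) * 1e3:.3f} ms")
print(f"DDR4 pass: {seconds_to_stream(WORKING_SET, KUNPENG_DDR4_BW) * 1e3:.3f} ms")
```

These are peak-rate estimates only; sustained bandwidth and achieved compute on real workloads will be lower than the quoted maxima.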
3. Independent Evolution of the Software Stack (Detachment from ARM Ecosystem)
Kunpeng chips depend heavily on the ARM instruction set ecosystem (such as the AArch64 instruction set and the GCC compiler), and their software stack is built around general-purpose CPU workloads. The Ascend line, by contrast, has built an independent software stack since the 910B stage: the CANN architecture optimizes operator compilation, the MindSpore framework adapts AI tasks, and the latest version even supports custom AI instruction sets. Its core functions are therefore no longer bound to the ARM ecosystem, and its technical route has separated completely from that of CPUs, which confirms at the software level that the Ascend 910C did not originate from CPU multi-core evolution.
3. The “Transitional Nature” of Early 910B and the “Independence” of 910C
Read against the early 910B materials, the speculation most likely stems from the “transitional characteristics” of the 910B, whereas the 910C has moved on to an independent architecture:
- 910B Stage (Around 2021): Limited by the maturity of the technology, it did adopt a hybrid architecture of “ARM control core + AI acceleration unit”. The control part relied on Kunpeng’s accumulated ARM technology, and the software stack was still partly tied to the ARM ecosystem. Early materials therefore easily gave the impression of “originating from CPU evolution,” which is important background for the speculation;
- 910C Stage (After 2023): The hardware was reconstructed through dual Die co-packaging. Although the control unit is still based on ARM cores, it is now deeply integrated with the AI Core (for example, unified memory management and instruction pipeline collaboration), and the software stack is completely independent. The result is a dedicated chip “centered on AI computing, with ARM as an auxiliary,” which no longer depends on a multi-core CPU architecture.
4. Conclusion: Inheritance Rather than Evolution, Collaboration Rather than Derivation
In summary, the Ascend 910C does reuse Kunpeng’s ARM control core, packaging processes, and system collaboration technologies, and the transitional characteristics of the early 910B lend the speculation some support; these are the “technical genes” the 910C carries over from that stage. In its architectural essence, however, the Ascend 910C is not a product of “CPU multi-core evolution.” Building on an ARM base, it has made the leap from “general computing” to “dedicated AI computing” by reconstructing the computing units, optimizing the storage architecture, and establishing an independent software ecosystem.
This relationship is more akin to “foundation and skyscraper”: the ARM technology of Kunpeng is the “foundation” (control and system collaboration) for Ascend, but the skyscraper itself (AI computing core) is a newly designed independent structure. The two represent a relationship of technical inheritance and collaboration, rather than simple evolution and derivation.