Huawei Ascend 910B Outperforms H20, 910C Approaches H100!

The performance comparison between Huawei’s Ascend 910 series (including 910, 910B, 910C) and NVIDIA’s AI chips (such as A100, H100, B200) is a complex issue that involves multiple dimensions, including computing power, energy efficiency, ecosystem, and application scenarios.

1. Performance Comparison of Huawei Ascend 910 Series and NVIDIA Chips

Overview of Huawei Ascend 910 Series

– Ascend 910 (released in 2019): Built on a 7nm process, it delivers 256 TFLOPS of FP16 half-precision compute and 512 TOPS of INT8 integer compute at 310W power consumption. Huawei officially claimed roughly twice the computing power of NVIDIA’s Tesla V100 (125 TFLOPS) and nearly double its performance when training ResNet-50.

– Ascend 910B (mass-production version): Built on SMIC’s N+1 process (roughly equivalent to 7nm), it delivers about 320 TFLOPS of FP16 compute, carries 64GB of HBM2e memory with 400GB/s of bandwidth, and consumes 310W. Huawei claims an efficiency of 80% when training large language models, and certain tests show it exceeding NVIDIA’s A100 by about 20%.

– Ascend 910C (sampled in 2024, mass production in 2025): Reportedly approaches NVIDIA’s H100 (989 TFLOPS FP16) by packaging two 910B dies together, and uses 3D-stacked HBM3 memory (128GB) to further raise bandwidth and performance.

Overview of NVIDIA Chips

– H100 (released in 2022): Built on a 4nm process, with 989 TFLOPS of FP16 compute, 80GB of HBM3 memory, 3TB/s of memory bandwidth, and 700W power consumption. It far outperforms the A100 and is suited to large-scale AI model training.

– B200 (Blackwell architecture): 20 PFLOPS of FP4 compute, 8TB/s of memory bandwidth, and 700W power consumption. Positioned at the high end of the market, it far outperforms the 910B. (The sketch below collects the figures quoted in this article side by side.)
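
To make these quoted numbers easier to compare, here is a minimal sketch (plain Python, nothing vendor-specific) that collects the headline figures quoted in this article into one table. All values are the article’s quoted figures rather than independent measurements, and the B200 entry is left blank where no comparable FP16 or memory figure is quoted.

```python
# Headline figures quoted in this article; none independently verified here.
# The B200's 20 PFLOPS is an FP4 figure, so it is not directly comparable to
# the FP16 numbers and is left out of that column.
chips = {
    # name:        (FP16 TFLOPS, memory GB, bandwidth TB/s, power W)
    "Ascend 910B": (320,  64,   0.4, 310),
    "A100":        (312,  80,   2.0, 400),
    "H100":        (989,  80,   3.0, 700),
    "B200":        (None, None, 8.0, 700),  # FP16 and memory size not quoted above
}

for name, (fp16, mem, bw, tdp) in chips.items():
    fp16_s = fp16 if fp16 is not None else "n/a"
    mem_s = mem if mem is not None else "n/a"
    print(f"{name:12s} FP16={fp16_s:>5} TFLOPS  mem={mem_s:>4} GB  BW={bw} TB/s  TDP={tdp} W")
```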

2. In What Aspects Does Huawei’s 910 Chip Surpass or Approach NVIDIA?

(1) Energy Efficiency Ratio

– The Ascend 910B consumes 310W and delivers about 5.2 TFLOPS/W, while the A100 consumes 400W and delivers about 4.7 TFLOPS/W. At the same computing power, the 910B draws roughly 23% less power than the A100, and its cost works out to about 0.8 yuan per TFLOPS versus roughly 1.2 yuan for the H20.

– Analysis: The Ascend 910B outperforms the A100 in energy efficiency, which matters most in power-hungry settings such as data centers. Against the 700W H100, the 910B’s low-power advantage is even more pronounced.

(2) Performance in Specific Scenarios

– Fact Basis: Huawei’s Ascend 910B achieves 80% efficiency when training large language models, and certain tests show it exceeding the A100 by about 20%. After algorithm optimization in collaboration with Baidu, its performance on tasks such as lane-line detection and obstacle recognition doubled while power consumption fell by 80%.

– Analysis: Through its self-developed Da Vinci 3.0 architecture and dynamic tensor slicing technology, the Ascend 910B outperforms the A100’s Ampere architecture in matrix-operation efficiency, especially on specific AI tasks such as training domestic large models; a generic illustration of tensor slicing follows below.
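
The "dynamic tensor slicing" mentioned above is, at its core, a tiling strategy: large matrix operands are cut into blocks small enough to stay in fast on-chip buffers so the matrix engines are kept busy. The sketch below is a generic NumPy illustration of blocked matrix multiplication, not Huawei’s implementation; the tile size of 64 is an arbitrary illustrative choice.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    """Blocked (tiled) matrix multiply.

    Slicing operands into tiles that fit in fast on-chip memory is the general
    idea behind tensor-slicing schemes on matrix engines; this NumPy version
    only illustrates the access pattern, not Ascend's actual implementation.
    """
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((m, n), dtype=np.result_type(a, b))
    for i in range(0, m, tile):          # rows of the output tile
        for j in range(0, n, tile):      # columns of the output tile
            for p in range(0, k, tile):  # walk along the shared inner dimension
                out[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
    return out

# Sanity check against NumPy's built-in matmul.
a = np.random.rand(256, 192).astype(np.float32)
b = np.random.rand(192, 128).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```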

(3) Cost Performance and Domestic Substitution

– Fact Basis: The Ascend 910B is priced at about 120,000 RMB, the H20 at about 110,000-140,000 RMB, and the A100 at about 250,000 RMB (with limited supply in the domestic market). Backed by policy support and localized services, the 910B is rapidly penetrating the domestic market in government, finance, energy, and other sectors.

– Analysis: Because of U.S. sanctions, supply of NVIDIA’s H100/A100 is limited in the domestic market, and the H20’s FP32 floating-point performance is only about 50% of the 910B’s. The 910B’s cost performance and availability make it the preferred choice for domestic substitution (a back-of-envelope cost comparison follows below).
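
As a rough illustration of the cost-performance argument, the sketch below divides the quoted prices by the article’s own relative-performance claims: the 910B is the baseline, the H20 is taken at roughly 50% of the 910B as stated above, and the A100 is approximated by the FP16 ratio cited elsewhere in this article. The H20 price uses the midpoint of the quoted range; all of these are the article’s figures, not market quotes.

```python
# Back-of-envelope cost per unit of 910B-equivalent performance, using the
# prices and relative-performance claims quoted in this article.
cards = {
    # name:        (price in RMB, performance relative to the 910B)
    "Ascend 910B": (120_000, 1.00),
    "H20":         (125_000, 0.50),       # midpoint of 110k-140k; "~50% of 910B"
    "A100":        (250_000, 312 / 320),  # FP16 TFLOPS ratio cited in this article
}

for name, (price, rel_perf) in cards.items():
    print(f"{name:12s} ~{price / rel_perf:>9,.0f} RMB per unit of 910B-equivalent performance")
```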

(4) Ascend 910C Approaches H100

– Fact Basis: The Ascend 910C, built by combining two 910B dies, is claimed to approach H100-level performance (the H100 delivers 989 TFLOPS FP16). Reuters reported that the 910C has been sampled and is expected to enter mass production in May 2025. In DeepSeek inference tasks, the 910C reportedly reaches about 60% of the H100’s performance.

– Analysis: Architectural optimizations in the 910C (such as 3D-stacked HBM3 memory) significantly narrow the gap with the H100, marking Huawei’s breakthrough into the high-end AI chip field; a rough sanity check on the dual-die figure follows below.
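
A quick sanity check on the dual-die claim, assuming naive additive scaling of two 910B dies with no packaging or interconnect overhead (real multi-die parts lose some efficiency, so this is an optimistic upper bound on the simple "two 910Bs" view):

```python
# Naive additive estimate for a dual-die 910C, using the FP16 figures quoted above.
fp16_910b = 320   # TFLOPS, quoted for the 910B
fp16_h100 = 989   # TFLOPS, quoted for the H100

naive_910c = 2 * fp16_910b
print(f"Naive dual-die estimate: {naive_910c} TFLOPS "
      f"(~{naive_910c / fp16_h100:.0%} of the H100)")
```

The naive estimate lands at roughly two-thirds of the H100’s quoted FP16 throughput, which is broadly consistent with the ~60% DeepSeek inference figure cited above.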

3. Limitations and Gaps of Huawei’s 910 Chip

(1) Absolute Computing Power

– Fact Basis: The Ascend 910B’s FP16 compute is 320 TFLOPS, close to the A100 (312 TFLOPS) but far below the H100 (989 TFLOPS) and the B200 (20 PFLOPS, albeit at FP4 precision).

– Analysis: NVIDIA’s H100 and B200 lead in single-card computing power and transistor density (H100 integrates 80 billion transistors), making them suitable for ultra-large-scale AI training.

(2) Memory and Bandwidth

– Fact Basis: Ascend 910B has 64GB HBM2e memory with a bandwidth of 400GB/s; A100 has 80GB HBM2e with a bandwidth of 2TB/s; H100 has 80GB HBM3 with a bandwidth of 3TB/s; B200 has a bandwidth of up to 8TB/s.

– Analysis: NVIDIA holds the advantage in memory capacity and bandwidth, which matters for data-intensive tasks. The Ascend 910C’s 128GB of HBM3 should narrow the gap, but this still needs verification; a rough compute-to-bandwidth comparison follows below.
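
One way to see why bandwidth matters is to compare how many FLOPs each chip can issue per byte it can move from HBM, using the FP16 and bandwidth figures quoted above. A higher ratio means the chip is more easily starved by memory on data-intensive workloads. This is a rough roofline-style illustration, not a benchmark.

```python
# Rough compute-to-bandwidth ratio (peak FP16 FLOPs per byte of HBM traffic),
# computed from the figures quoted in this article.
specs = {
    # name:        (FP16 TFLOPS, bandwidth in GB/s)
    "Ascend 910B": (320, 400),
    "A100":        (312, 2000),
    "H100":        (989, 3000),
}

for name, (tflops, bw_gbps) in specs.items():
    flops_per_byte = (tflops * 1e12) / (bw_gbps * 1e9)
    print(f"{name:12s} {flops_per_byte:6.0f} peak FLOPs per byte of HBM bandwidth")
```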

(3) Software Ecosystem

– Fact Basis: NVIDIA’s CUDA ecosystem is mature: roughly 90% of AI frameworks worldwide are developed on top of it, and its multi-card scaling is highly efficient. Huawei’s CANN platform has adapted more than 5,000 mainstream AI models, but the ecosystem is still less complete.

– Analysis: The CUDA developer toolchain and community support are NVIDIA’s core advantages, and the Ascend ecosystem will need time to catch up; a minimal device-selection sketch follows below.
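
In practice, "adaptation" largely means that the same framework-level code has to run on either backend. The sketch below assumes PyTorch plus Huawei’s torch_npu adapter (the PyTorch plugin that sits on top of CANN); the "npu" device string and the torch_npu import follow that project’s documented usage, but exact APIs can vary by version, so treat this as a sketch rather than verified Ascend code.

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, fall back to an Ascend NPU via torch_npu, then to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda:0")
    try:
        import torch_npu  # noqa: F401 -- registers the "npu" backend with PyTorch
        if torch.npu.is_available():
            return torch.device("npu:0")
    except ImportError:
        pass
    return torch.device("cpu")

device = pick_device()
x = torch.randn(1024, 1024, device=device)
y = x @ x.t()  # the same high-level code runs on CUDA, NPU, or CPU
print(device, y.shape)
```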

(4) Process Technology

– Fact Basis: Ascend 910B is based on SMIC’s N+1 process (equivalent to 7nm), while A100/H100 use TSMC’s 7nm/4nm processes, which have higher transistor density.

– Analysis: TSMC’s advanced processes give NVIDIA chips superior performance and power control, while Huawei is limited by domestic process technology.

4. Sources and Credibility of Fact Basis

– Huawei’s official statement: Ascend 910B exceeds A100 in training efficiency and specific scenario performance, and 910C is comparable to H100.

– Third-party reports: Reuters, South China Morning Post, etc., confirm that 910B’s performance exceeds H20, and 910C approaches H100.

– Industry validation: Companies like iFlytek, Baidu, and DeepSeek have adopted 910B, verifying its reliability in practical applications.

Credibility analysis: Huawei’s claims need to be checked against third-party tests (such as the DeepSeek 60% figure). NVIDIA’s public specifications and widespread deployment give its figures higher credibility. Overall, the Ascend 910B does hold advantages in specific scenarios and in energy efficiency, but a performance gap to the H100/B200 remains.

5. Conclusion

The Huawei Ascend 910 series surpasses or approaches NVIDIA in the following aspects:

– Energy Efficiency Ratio: 910B outperforms A100, with 23% lower power consumption and higher performance per watt.

– Specific Scenarios: 910B outperforms A100 by about 20% in training domestic large models and optimization tasks.

– Cost Performance: The 910B’s price and availability are superior to the supply-limited A100/H100 and better than the H20.

– Progress of 910C: Approaching H100 performance, narrowing the gap in the high-end market.

Aspects Not Surpassed:

– Absolute computing power and memory bandwidth lag behind H100/B200.

– Software ecosystem (CANN vs. CUDA) still needs improvement.

– Process technology limitations affect performance enhancement.

Fact basis: Based on Huawei’s official data, third-party reports (such as CSDN and Reuters), industry application cases, and NVIDIA’s public specifications. The Ascend 910 series is rising rapidly in the domestic market thanks to policy support and cost performance, but fully surpassing NVIDIA will require further breakthroughs in computing power, ecosystem, and process technology.
