Loongson’s First GPGPU Chip, Q3 Tape-out

Development of Loongson’s first GPGPU chip, the 9A1000, is nearly complete, and tape-out is expected within the third quarter.

Recently, an investor asked Loongson Technology on an investor interaction platform whether the 9A1000 had successfully taped out.

On September 15, Loongson Technology responded that development of the 9A1000, its first GPGPU chip, is nearly complete and that tape-out will be delivered within the third quarter. Whether it succeeds will depend on the test results once the taped-out chips come back.

The Loongson 9A1000 graphics card is reportedly aimed at the entry-level market and at AI inference acceleration, with AMD’s RX 550 as its performance target.

The RX 500 series is a milestone product line for AMD. The RX 580 in particular is still popular with many gamers, and even the RX 550, a budget model, remains in active use.

As a reminder, the RX 550 is built on GF’s 14nm process and the GCN 4.0 architecture. It has 512 stream processors, a 128-bit memory bus with 2 GB or 4 GB of GDDR5, a PCIe 3.0 x8 interface, and a TDP of only 50W. Its overall performance is roughly comparable to NVIDIA’s GTX 650 Ti Boost, which made it a classic example of good performance at low power.

According to official information, the Loongson 9A1000 supports the PCIe 4.0 system bus and uses 128-bit LPDDR4X video memory. The number of compute cores, memory size, operating frequency, and power consumption have not been disclosed. The published block diagram does show eight compute clusters, connected by an on-chip interconnect network and backed by a second-level cache.
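The 128-bit bus width alone does not fix the memory bandwidth, because Loongson has not disclosed the memory data rate. As a rough sketch of how the two relate, the snippet below applies the standard peak-bandwidth formula (bus width in bytes times transfers per second). The 4266 MT/s LPDDR4X rate and the 7000 MT/s GDDR5 rate used for the RX 550 are assumptions chosen for illustration, not figures from Loongson or from this article.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s: bytes per transfer x transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mtps * 1e6 / 1e9


# The 128-bit widths come from the article; both data rates are ASSUMED.
ASSUMED_LPDDR4X_MTPS = 4266  # hypothetical rate for the 9A1000's memory
ASSUMED_GDDR5_MTPS = 7000    # hypothetical rate for the RX 550's memory

lpddr4x_gbs = peak_bandwidth_gbs(128, ASSUMED_LPDDR4X_MTPS)
gddr5_gbs = peak_bandwidth_gbs(128, ASSUMED_GDDR5_MTPS)

print(f"128-bit LPDDR4X at {ASSUMED_LPDDR4X_MTPS} MT/s: {lpddr4x_gbs:.1f} GB/s")
print(f"128-bit GDDR5 at {ASSUMED_GDDR5_MTPS} MT/s: {gddr5_gbs:.1f} GB/s")
```

Under these assumed rates, two cards with the same bus width would differ noticeably in peak bandwidth, which is why the undisclosed memory speed matters for any comparison with the RX 550.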

On the software side, the Loongson 9A1000 supports OpenGL 4.0, OpenCL 3.0, and other mainstream graphics and compute APIs. It has a built-in video processing unit with hardware decoding for the H.264 and H.265 codecs, and it supports display outputs including HDMI 2.1, DisplayPort 1.4, and legacy VGA.

The published performance figures give the Loongson 9A1000 a pixel fill rate of up to 16 GP/s (16 billion pixels per second) and a texture fill rate of 32 GT/s (32 billion texels per second). Floating-point throughput reaches 1 TFLOPS (1 trillion operations per second) at FP32 precision and 64 GFLOPS (64 billion operations per second) at FP64 precision, while INT8 integer throughput rises to 32 TOPS (32 trillion operations per second). Taken together, these figures describe a chip intended to balance graphics rendering with deep learning acceleration.
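To put the prefixes on a common footing, here is a small sketch that converts the quoted peak figures to plain operations per second and derives a few ratios between them. It reads 64 GFLOPS as 64 × 10^9 operations per second, and the dictionary keys are names made up for this example.

```python
# Peak figures quoted for the 9A1000, expressed in operations per second.
# The key names are illustrative and do not come from Loongson.
GIGA = 1e9
TERA = 1e12

PEAK_9A1000 = {
    "pixel_fill": 16 * GIGA,    # 16 GP/s
    "texture_fill": 32 * GIGA,  # 32 GT/s
    "fp32": 1 * TERA,           # 1 TFLOPS
    "fp64": 64 * GIGA,          # 64 GFLOPS
    "int8": 32 * TERA,          # 32 trillion INT8 operations per second
}


def ratio(numerator: str, denominator: str) -> float:
    """Return how many times larger one quoted peak rate is than another."""
    return PEAK_9A1000[numerator] / PEAK_9A1000[denominator]


print(f"texels per pixel:  {ratio('texture_fill', 'pixel_fill'):.1f}")
print(f"FP32 : FP64 ratio: {ratio('fp32', 'fp64'):.3f}")
print(f"INT8 : FP32 ratio: {ratio('int8', 'fp32'):.1f}")
```

The FP64 rate works out to 1/15.625 of the FP32 rate, and the INT8 rate to 32 times the FP32 rate, which fits the stated focus on graphics and AI inference rather than double-precision compute.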

In the past, the GPU (Graphics Processing Unit) was used mainly for graphics rendering. Today it has taken on a more versatile role. In the current boom in large AI models, for example, developers training a model have to process trillions of pieces of data. If handling this mass of data is compared to “plowing” in the digital age, a GPU is like starting hundreds or thousands of high-performance automated tractors at once to work these “fields” in a very short time, which makes running an AI model far more efficient.

Loongson has been doing GPU research since 2016, initially to provide supporting capabilities for its CPUs. At that time the GPU industry was not as hot as it is now, and many of the problems that came up while promoting Loongson CPU applications were caused by the GPU. For example, supply channels for imported GPU chips were unstable, and embedding a GPU in desktop scenarios was not feasible. These factors significantly affected the functionality, performance, and cost-effectiveness of Loongson computers. Loongson therefore concluded that any company that makes CPUs must have its own GPU.

The Loongson team started almost from scratch in the GPU field, setting out with the belief that “it won’t be harder than a CPU.” In-depth research showed otherwise. As an acceleration system for graphics applications, a GPU involves a great deal of application-layer background knowledge and lacks the clear documentation available for CPUs, which makes it extremely difficult to learn. The team therefore began with graphics algorithms and worked through simulator architecture design, simulator validation, logic design, and functional verification, taking 5 years to launch its first-generation graphics GPU architecture. It then spent another 2 years on two minor iterations, upgrading the design for the Loongson 7A2000 and Loongson 2K2000 and bringing them to market.

Once the first-generation GPU entered product iteration, Loongson quickly began work on its second-generation GPU architecture, aiming to move the GPU into its fourth stage of development: from graphics processor (GPU) to general-purpose graphics processor (GPGPU).

The 9A1000 is not Loongson’s only graphics card project. The company is also developing the 9A2000, which targets mid-range to high-end graphics cards for desktop and server use. Its GPU core moves to the third-generation architecture, with further gains in computing power per unit area. Graphics API support rises to OpenGL 4.6, virtualization support is added, and the tensor units support more data types. The GPU scale is 4x that of the 9A1000, with single-precision floating-point throughput of 5 TFLOPS, INT8 AI throughput of 160 TOPS, and memory bandwidth of 256 GB/s. The chip also supports dual-chip interconnection, which doubles overall performance and, according to Loongson, reaches an internationally advanced level for its process generation. Loongson also plans to launch the 9A3000 as a successor to the 9A2000, but no specifications are available yet.
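The generational jump can be checked with simple arithmetic on the figures quoted in this article (1 TFLOPS FP32 and 32 trillion INT8 operations per second for the 9A1000, against 5 TFLOPS and 160 TOPS for the 9A2000). The sketch below is illustrative only: the `GpuSpec` class and its field names are hypothetical, and dividing by the "x4" GPU scale assumes that scale means proportionally more compute hardware, which the article does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Hypothetical container for the peak figures quoted in the article."""
    name: str
    fp32_tflops: float
    int8_tops: float


A1000 = GpuSpec("9A1000", fp32_tflops=1.0, int8_tops=32.0)
A2000 = GpuSpec("9A2000", fp32_tflops=5.0, int8_tops=160.0)


def speedup(new: GpuSpec, old: GpuSpec) -> dict:
    """Ratio of quoted peak throughput between two chips."""
    return {
        "fp32": new.fp32_tflops / old.fp32_tflops,
        "int8": new.int8_tops / old.int8_tops,
    }


gains = speedup(A2000, A1000)

# ASSUMPTION: "GPU scale x4" is read here as 4x the compute hardware.
SCALE_FACTOR = 4
per_scale_gain = gains["fp32"] / SCALE_FACTOR

print(f"FP32 gain: {gains['fp32']:.1f}x, INT8 gain: {gains['int8']:.1f}x")
print(f"FP32 gain per unit of scale (assumed): {per_scale_gain:.2f}x")
```

On these numbers both FP32 and INT8 throughput grow fivefold. Under the scale assumption, about a quarter of that gain would have to come from something other than scale, such as clock speed or architectural improvements.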

*Disclaimer: This article is original by the author. The content reflects the author’s personal views, and the reprint by Luku Verification is only to convey a different perspective, not representing Luku Verification’s endorsement or support of this view. If there are any objections, please feel free to contact Luku Verification.
