According to reports, Loongson's 9A1000 GPU has taped out and targets the entry-level market. By shrinking the stream-processor area by 20% and raising the operating frequency by 25%, the design is said to achieve significant power savings, with power consumption reduced by up to 70% under light loads. It reportedly supports the OpenGL 4.0 and OpenCL ES 3.2 APIs, with FP32 throughput of 1 TFLOPS, INT8 integer throughput of 32 TOPS, and AI computing power of up to 40 TOPS, about four times the performance of the previous integrated GPU, the LG200. The 9A1000 is mainly about availability: it aims to meet demand for domestically produced computing power in fields such as industrial control and edge computing.

According to Loongson's presentation slides, the future 9A2000 will integrate four 9A1000 cores, with projected single-precision (FP32) performance of 5 TFLOPS and INT8 AI performance of 160 TOPS. In raw performance, the 9A1000 cannot be compared directly with NVIDIA's GPUs: the gap is obvious, and the chip is still very much in its infancy.
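
As a rough sanity check on the slide figures, the sketch below compares the projected 9A2000 numbers with what four 9A1000 cores would give under purely linear scaling. The linear-scaling baseline and all names in the code are assumptions made for illustration; the inputs are the vendor-quoted figures above, not measurements.

```python
# Figures as quoted in the article (vendor slide numbers, not benchmarks).
GPU_9A1000 = {"fp32_tflops": 1.0, "ai_int8_tops": 40.0}
GPU_9A2000_PROJECTED = {"fp32_tflops": 5.0, "ai_int8_tops": 160.0}
CORES_IN_9A2000 = 4  # "four 9A1000 cores", per the slides


def linear_scaling(per_core, cores):
    """Hypothetical baseline: throughput grows exactly with core count."""
    return {metric: value * cores for metric, value in per_core.items()}


def ratio_to_linear(projected, per_core, cores):
    """Ratio of each projected figure to the linear-scaling baseline."""
    baseline = linear_scaling(per_core, cores)
    return {metric: projected[metric] / baseline[metric] for metric in projected}


scaled = linear_scaling(GPU_9A1000, CORES_IN_9A2000)
ratios = ratio_to_linear(GPU_9A2000_PROJECTED, GPU_9A1000, CORES_IN_9A2000)

for metric in GPU_9A2000_PROJECTED:
    print(
        f"{metric}: linear baseline {scaled[metric]:g}, "
        f"projected {GPU_9A2000_PROJECTED[metric]:g}, "
        f"ratio {ratios[metric]:.2f}"
    )
```

Under this assumption the INT8 AI figure is exactly four times the 9A1000's, while the FP32 figure is 1.25 times the linear baseline, so the slides imply some per-core FP32 uplift beyond simply adding cores.
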
The significance of the Loongson 9A1000 therefore lies in solving the availability problem: ship the hardware first, then gradually build the software ecosystem, improve API compatibility, and optimize the software toolchain, lowering the learning curve for developers.
For now, Loongson's top priority is CPU performance and the software ecosystem, with the GPU coming second. In fact, once the CPU achieves a breakthrough, it can give the company's own GPU a ladder to grow on and room for technical iteration.