Gao Wen, a representative of the National People’s Congress, an academician of the Chinese Academy of Engineering, and the director of the Pengcheng National Laboratory, mentioned in an interview that the construction of the computing power network includes three core elements: large-scale core computing power, super optical networks, and computing power scheduling systems. The ideal scenario is that when we need computing power, we just plug it in without worrying about where the computing power comes from or comparing prices ourselves; the system will automatically match the most cost-effective computing center. This model not only improves the utilization of existing computing resources but also avoids waste caused by redundant construction across different regions.
Original Text:
Reporter: What impact has the emergence of DeepSeek had on you?
Gao Wen: I think it will advance applications across the entire field by many years. Previously, building a large model like ChatGPT or LLaMA required a leading company, such as Google or Facebook, with sufficient resources, computing power, talent, and data. Abroad, artificial intelligence applications are still mainly in the hands of a few large companies, such as Microsoft and Google. The emergence of DeepSeek has changed that; it has democratized this kind of application, meaning it is no longer only a few leading companies or top players that can take part, but every startup, and that is its greatest contribution.
The emergence of DeepSeek has triggered a series of chain reactions, lowering the application threshold and stimulating more demand, leading to a surge in societal demand for computing power. If we compare artificial intelligence to “smart cars,” then “computing power” is the “gasoline” that drives them. Only with sufficient computing power supply can artificial intelligence develop fully.
Reporter: From the perspective of increasing demand for computing power, how will your laboratory’s plans change compared to before?
Gao Wen: Initially, we did not anticipate such a rapid increase in demand, but now we need to adjust quickly. There is a lot of work that needs to be expedited.
The Pengcheng Laboratory, led by Gao Wen, is a new type of research institution in the field of network communication approved by the central government. One of its main tasks is to lead the research and construction of the “China Computing Power Network” around major national strategies such as “East Data West Computing” and “Digital China”.
Reporter: What problems does the construction of the China Computing Power Network aim to solve?
Gao Wen: We hope to enable users to use computing power as easily as using electricity; when needed, they can purchase it, and they can buy from the cheapest source.
The computing power referred to here is intelligent computing power designed for AI training, which is different from ordinary computing power: it requires thousands of dedicated chips working together, along with investment in infrastructure and energy supply.
The “East Data West Computing” project sends data that needs to be computed from the east to data centers in the west for processing, settlement, and storage. The original intention behind building the China Computing Power Network is to integrate computing resources scattered across the country, including supercomputing centers, data centers, and cloud computing platforms, into a unified computing resource pool, creating a digital-economy infrastructure with convenient resource access, unified task scheduling, and a sustainable operating model and mechanism, thereby moving domestically developed computing resources into an era in which they are “shared by all.”
Reporter: Will the application across various industries affect computing power?
Gao Wen: The demand should be greater than before.
Reporter: With such high demand, can you keep up with it?
Gao Wen: Society is investing quite a lot in computing power, but of course it is not evenly balanced. Some computing power investments are used very efficiently, while others are used less so, and the quality varies. We hope that through the computing power network we can raise the utilization rate of everything that has been invested. This is actually very similar to the original idea of the power grid. At first, factories generated power for their own use, but then it turned out that residents and other institutions also needed electricity, so power plants were built everywhere. Some places did not have enough power, while others had a surplus. What do you do with the surplus? You build a grid to move it to where it is needed. The computing power network hopes to follow a similar path.
We now need to build a national highway for this computing power, first constructing it with new technologies, and also introducing some new management models: not the conventional telecom management model, but approaches based on data correlation. We are currently in discussions with the National Data Bureau about adopting a new model.
The construction of the computing power network includes three core elements: large-scale core computing power, super optical networks, and computing power scheduling systems. Its ideal scenario is that when we need computing power, we just plug it in without worrying about where the computing power comes from or comparing prices ourselves; the system will automatically match the most cost-effective computing center. This model not only improves the utilization of existing computing resources but also avoids waste caused by redundant construction across different regions.
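The “plug in and the system matches the most cost-effective center” behaviour described above can be illustrated with a minimal scheduling sketch. Everything in it, the center names, prices, and capacities, is hypothetical; it only shows the shape of a cost-based matcher, not the actual scheduling system of the China Computing Power Network.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ComputeCenter:
    name: str
    price_per_pflops_hour: float  # hypothetical pricing unit
    free_pflops: float            # currently idle capacity

def match_cheapest(centers: List[ComputeCenter],
                   required_pflops: float) -> Optional[ComputeCenter]:
    """Pick the cheapest center with enough idle capacity for the job."""
    candidates = [c for c in centers if c.free_pflops >= required_pflops]
    if not candidates:
        return None  # nowhere to place the job right now
    return min(candidates, key=lambda c: c.price_per_pflops_hour)

# Hypothetical inventory of three centers with different idle capacity and price.
centers = [
    ComputeCenter("west-1", price_per_pflops_hour=0.8, free_pflops=120.0),
    ComputeCenter("west-2", price_per_pflops_hour=0.6, free_pflops=40.0),
    ComputeCenter("east-1", price_per_pflops_hour=1.2, free_pflops=300.0),
]

chosen = match_cheapest(centers, required_pflops=80.0)
print(chosen.name if chosen else "no capacity")  # west-2 is cheapest but too small -> west-1
```

A real scheduler would also weigh network distance, data location, and energy cost, which is where the super optical network and the scheduling system mentioned above come in.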
Reporter: If we fall behind in any aspect, we may lag in AI competition; is this the current situation?
Gao Wen: Yes, because this is a systemic issue.
Reporter: Are you anxious about this?
Gao Wen: I shouldn’t be. Technically we are quite confident and are moving forward step by step. Of course, we would also like to have the best resources: the best technology, the best software, and the fastest, most advanced machines in the world. But now that we are constrained, we can only rely on systems integration to build the best equipment we can.
Since the computing tasks in the computing power network may involve critical areas such as scientific research, national defense, and finance, preventing data leaks and cyberattacks has become an important task in the construction and operation of the computing power network. The Pengcheng Laboratory has a dedicated team focused on research and application of cybersecurity technology. Recently, the Pengcheng Laboratory participated in the cybersecurity assurance for the 9th Asian Winter Games.
Reporter: What needs to be protected?
Gao Wen: Much of the management is done online. Once this system is attacked and paralyzed, everything will be paralyzed.
Reporter: Is the risk of being attacked imagined or does it exist in the real world?
Gao Wen: It exists in the real world; some are malicious, aimed at embarrassing you. There are also many hackers who want to show off their skills, saying they can break in and paralyze you. Both types exist.
Reporter: Some commercial companies can already provide security online; why do you take this on yourselves?
Gao Wen: When encountering very difficult problems, a very strong team is needed, and some commercial companies may not be able to solve them.
Reporter: What are the difficulties?
Gao Wen: Conventional companies use general-purpose methods, such as running a scanning tool over the data from start to finish to look for anomalies. But a skilled adversary usually won’t let you find anything that way, so more specialized techniques are needed. Our team has many tools for analyzing system responses and can often find the clues that finally uncover the problem.
In 2022, the first phase of the “China Computing Power Network” project, the “Intelligent Computing Network,” was officially launched. It connects and manages more than 20 computing centers of different types across different regions, gradually raising the aggregate computing power to 5 EFLOPS, equivalent to 5 quintillion (5×10^18) operations per second. One of its computing power hub nodes is the “Pengcheng Cloud Brain II” AI computing platform of the Pengcheng Laboratory.
Gao Wen: From this view, these are eight rows, which actually correspond to four machines, with every two rows being one machine.
Reporter: Is its power consumption large?
Gao Wen: Not small; the electricity bill may run to one or two million yuan a month.
“Pengcheng Cloud Brain II” is a super intelligent computer jointly developed by the Pengcheng Laboratory and partner enterprises, with a peak computing power of one quintillion (10^18) operations per second; it entered operation in 2020. It is ten times more powerful than “Pengcheng Cloud Brain I,” which performs 100 quadrillion (10^17) operations per second, and the upgrade took only one year.
Reporter: It took only a year to increase by ten times? What happened?
Gao Wen: When we were developing “Pengcheng Cloud Brain I,” we were still working on discriminative artificial intelligence, which typically requires less computing power; 100P was sufficient. We anticipated that the computing and storage requirements for language models would be higher than for images, because language data is easier to obtain in volume and language processing needs more compute, so the new machine had to be about ten times larger than what image processing required.
To date, “Pengcheng Cloud Brain II” has topped the IO500 global high-performance data throughput ranking nine consecutive times and has ranked first in the AIPerf500 international AI computing performance ranking for four consecutive editions. On top of “Pengcheng Cloud Brain II,” the Pengcheng Laboratory has built an AI training platform capable of handling ultra-large-scale AI models with over a hundred billion parameters. “Pengcheng Brain Sea” is an ultra-large-scale natural language model trained and run on “Pengcheng Cloud Brain II” by the Pengcheng Laboratory.
Reporter: Why was it DeepSeek, rather than Brain Sea, that broke through?
Gao Wen: It’s not that our Brain Sea is not capable; in fact, this is the clever part of DeepSeek. Brain Sea and ChatGPT use exactly the same underlying technology, which includes what is called the attention mechanism. In the past, when a computer processed a long article, by the time it reached the end it had forgotten the earlier parts. The Transformer, on which GPT is based, introduced the attention mechanism: pay attention only to the relevant parts, ignore the rest, and keep hold of the big picture.
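As a rough illustration of the attention idea described above, weighting only the relevant positions rather than remembering everything equally, here is a minimal single-head scaled dot-product attention in NumPy. It is a generic textbook sketch with made-up dimensions, not the Brain Sea or GPT implementation.

```python
import numpy as np

def attention(Q, K, V):
    """Minimal single-head scaled dot-product attention: each query position
    mixes the values V according to how relevant each key position is to it."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # relevance of every position to every other
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability for the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax: "how much attention to pay"
    return weights @ V                             # weighted mix of the relevant parts

# Toy example: 4 token positions, 8-dimensional representations (made-up sizes).
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```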
The original GPT was one big package with every capability bundled in, which made it cumbersome to use. DeepSeek did something different: it trains experts on domain-specific representations, so the training cost is not as high. It has 256 experts in total, and when it is used you do not need to call on all 256; at most 8 are enough, which greatly reduces the cost and training time. I believe DeepSeek’s contribution is not a theoretical innovation; it is more of an engineering one.
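The expert behaviour attributed to DeepSeek above, many experts trained but only a handful used per input, is the mixture-of-experts pattern. The sketch below keeps the quoted figures (256 experts, at most 8 active), but everything else, the gating network, the expert layers, the dimensions, is a simplified assumption rather than DeepSeek’s actual architecture.

```python
import numpy as np

NUM_EXPERTS = 256   # total experts, per the figure quoted above
TOP_K = 8           # experts actually activated for each token

rng = np.random.default_rng(0)
d_model = 16                                                # made-up hidden size
gate_w = rng.normal(size=(d_model, NUM_EXPERTS))            # stand-in gating (router) network
experts = rng.normal(size=(NUM_EXPERTS, d_model, d_model))  # each "expert" is one linear map here

def moe_forward(x):
    """Route a single token through only its top-k experts; the other 248 stay idle."""
    logits = x @ gate_w                   # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]     # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()                          # softmax over the chosen experts only
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```

Because only a small fraction of the parameters is exercised per token, the compute cost scales with the active experts rather than with all 256, which is the engineering saving Gao Wen points to.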
Currently, two versions of the “Pengcheng Brain Sea” large model have been open-sourced to the public. By open-sourcing the entire model training process, Gao Wen and his team hope to cultivate “Pengcheng Brain Sea” as a seed application on the China Computing Power Network.
Reporter: Are you open-sourcing the underlying source code or the parameters?
Gao Wen: Actually, both layers are included. We are opening up all the parameters, which we call slices: during training, I release a slice at regular intervals, so researchers can study how the model was trained and what changed along the way. We are also open-sourcing the source code, which means users can take it and retrain the model with their own data.
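The “slices” described here, parameter snapshots released at fixed intervals during training, could in principle be produced by a loop like the one below. It is a generic illustration of periodic checkpoint release with an invented interval, file format, and toy model, not the Brain Sea training code.

```python
import json
import pathlib

SLICE_INTERVAL = 1000                 # invented: release a snapshot every 1000 steps
out_dir = pathlib.Path("open_slices")
out_dir.mkdir(exist_ok=True)

def maybe_release_slice(step, params):
    """At a fixed interval, write out the current parameters as an open 'slice'
    so outside researchers can study how training evolved between snapshots."""
    if step % SLICE_INTERVAL == 0:
        (out_dir / f"slice_step_{step:07d}.json").write_text(json.dumps(params))

# Toy training loop with a fake two-parameter "model".
params = {"w": 0.0, "b": 0.0}
for step in range(1, 3001):
    params["w"] += 0.001              # stand-in for a real gradient update
    maybe_release_slice(step, params)

print(sorted(p.name for p in out_dir.glob("slice_step_*.json")))
# ['slice_step_0001000.json', 'slice_step_0002000.json', 'slice_step_0003000.json']
```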
Reporter: So you are putting in all your people, energy, and funding for others to benefit; can it be understood that way?
Gao Wen: Open-sourcing is not only about others benefiting, though. After I open-source, others who use my open-sourced material must also open-source their contributions. Once they do, that improves my original system, and we all iterate on everything that has been open-sourced. Many hands make light work: when everyone adds firewood, the flame burns higher.
The computing platform that nurtured “Pengcheng Brain Sea,” namely “Pengcheng Cloud Brain II,” also adheres to the principle of openness. Fifty percent of its computing power is used by the Pengcheng Laboratory itself, 40% is provided openly to domestic partners, research institutions, and universities, and 10% is released through the Shenzhen Municipal Bureau of Industry and Information Technology for developers from the wider community to apply for. In the wave of artificial intelligence, the spirit of open source and openness is becoming an industry consensus.
Reporter: Open-source means sharing, right? Does sharing contradict competition?
Gao Wen: From the open-source perspective, it doesn’t matter whose technology it is, because everyone is iterating on it. The reason OpenAI is struggling now is that it is closed-source; being closed-source means others have to come to it and sign agreements, and they can use it only if it agrees; if it does not agree, they cannot.
Gao Wen: These are the two technical routes for the whole ecosystem. The early ecosystem was entirely closed-source because of copyright, which was meant to protect software rights. Later, Linux (the operating system) was the first to open-source its software, with the philosophy that this is an asset of humanity: I will release it, but I require everyone who uses this software to promise that if they create something new, they must open-source it as well.
My personal understanding is that open-source may better align with the direction of human social development, similar to knowledge. In the past, many artisans and craftsmen passed down their skills from generation to generation, and if one generation failed to pass it down well, it could be lost. Why can we learn knowledge now? Because it is open-sourced; knowledge is open-sourced.
Software is the same; if software becomes knowledge, then it should be open-sourced, and only through open-sourcing can it iterate.
While “Pengcheng Cloud Brain II” runs smoothly, Gao Wen is already looking at new problems. He is leading a team, together with partner enterprises, to develop the next-generation super intelligent computer, “Pengcheng Cloud Brain III,” with a target intelligent computing power of 16 quintillion (1.6×10^19) operations per second, 16 times that of “Pengcheng Cloud Brain II.”
Reporter: What can this Cloud Brain III do in the future?
Gao Wen: It can train and use multi-modal large models. Currently, the intelligence of large language models is primarily linguistic, while other intelligences are relatively weaker. A multi-modal model means that in addition to language, I want to mix sound, vision, and language together for training, so that the overall level of intelligence gradually approaches that of humans. We also hope that through the construction of this machine, we can contribute to the domestic ecosystem.
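Mixing sound, vision, and language “together for training,” as described above, usually means projecting each modality into a shared representation before a common backbone. The sketch below shows only that general shape; the encoders, dimensions, and token counts are invented and do not describe any Pengcheng model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # shared embedding width (made up)

# Stand-in projections; in a real system these would be full audio/vision/text encoders.
W_audio  = rng.normal(size=(64, d))
W_vision = rng.normal(size=(128, d))
W_text   = rng.normal(size=(96, d))

def fuse(audio, image, text):
    """Project each modality into the same space, then concatenate the token
    sequences so a single shared backbone can attend across all of them."""
    tokens = np.concatenate(
        [audio @ W_audio, image @ W_vision, text @ W_text], axis=0)
    return tokens  # in practice this would be fed to a shared Transformer

audio = rng.normal(size=(10, 64))    # 10 audio frames
image = rng.normal(size=(49, 128))   # 7x7 image patches
text  = rng.normal(size=(20, 96))    # 20 text tokens
print(fuse(audio, image, text).shape)  # (79, 32)
```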
The full machine is not yet complete, but the prototype has already been built. The complete system will consist of 157 cabinets; we have built 3 so far, and these models are already being trained on the 3-cabinet prototype. The mission the state has given us is to scale a peak that no one has ever reached, so we must be the first to get there.
Source: CCTV.com