Understanding the Core Differences Between NPU and GPU: A Deep Dive


This article may be reprinted freely, but please credit the source: WeChat Official Account theDennisCode, author Dennis Liu. The content of this article represents personal views and is not related to any organization or individual, including but not limited to the author’s current and former employers.

In the martial arts world, there is a legendary figure known as the Leather Jacket Swordsman, who has long dominated the martial arts rankings with his unmatched Huang-style swordsmanship. The Leather Jacket Swordsman wields a divine weapon known as the GPU. Wherever he goes, heroes from all walks of life strive to emulate him: the martial arts community either imitates the GPU or forges another divine weapon, the NPU, yet no one has been able to reach his level. The martial arts world can only look on in awe.

Recently, chip stocks have been very popular, and the concept of chips is also trending. Like the legend of Nvidia and Huang, chips feel somewhat familiar yet a bit mysterious to everyone. Among the most mysterious and confusing concepts are the GPU and the NPU.

Some say that the NPU is an innovative chip that disrupts the GPU. Others say that the NPU works miracles but has poor generality and is a specialized performer.

So, what exactly is the difference between NPU and GPU?

There are many claims online, the most common being that the NPU is a dedicated chip designed specifically for artificial intelligence and neural networks, while the GPU is a more general-purpose computing tool.

However, these explanations are simplistic and often leave readers confused. Is there a more in-depth explanation?

GPU

Let’s start with the GPU. GPU stands for Graphics Processing Unit; it was originally used for processing graphics, mostly for game rendering.
The task of game rendering, as discussed in the article “Smart Graphics | The Ins and Outs of Intelligence and Graphics,” is to display defined shapes on the screen.

Displaying images can also be done by the CPU, which is what we commonly call the processor. The CPU’s method is roughly illustrated in the image below:

[Figure: How the CPU draws images]

The CPU reads the shape data from storage and checks each pixel on the screen one by one to determine what to draw and what color it should be. Because the CPU examines each pixel individually, this is very slow.

This is where the GPU comes in. The original function of the GPU was to draw images on the screen faster. To achieve this, the GPU started from CPU-style computing units, reduced the functionality of each unit, and increased the number of units, so that different units compute different pixels simultaneously, greatly improving speed. The working method of the GPU is roughly illustrated in the image below:

[Figure: How the GPU draws images]

In addition to replicating a simplified version of the processor for parallel processing, the GPU also has dedicated units for processing object shapes and for determining which object is at which pixel. These two mechanisms greatly increase the speed at which images are drawn on the screen, which has contributed significantly to the boom of the gaming industry and to the well-being of players.

The Era of Artificial Intelligence

Just as the GPU was quietly accelerating graphics, artificial intelligence suddenly surged. Neural network technology emerged, allowing machines to reach satisfactory levels of intelligence in many areas. From early image recognition to later image generation, text generation, large models, and chatbots, artificial intelligence has entered ordinary households. At its core, this is primarily due to the development of neural network algorithms.
Various neural networks, large and small, in different forms, have become the driving force behind artificial intelligence. Neural network algorithms are inspired by the connections between neurons in the human brain. If illustrated, the most basic computation looks like this:

[Figure: Simple illustration of neural network algorithms, with the cloud-like shapes representing neurons]

We feed the data to be processed, such as images and text, into the algorithm in units of pixels and characters.

The neural network consists of layers of neurons. The first layer receives the input and passes its output to the next layer. Neurons in adjacent layers are not merely connected: each connection has its own weight coefficient. A neuron in the next layer therefore receives the outputs of the neurons in the previous layer, multiplies each by the weight of its connection, sums the results, performs some additional processing, and sends the result on to the following layer. This continues layer by layer until the network outputs its prediction.

This process is still slow on the CPU, which must compute the neurons one by one. This is when people thought of the GPU. Since the GPU has a processing unit prepared for each pixel, its units can instead be matched to neurons one to one.
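
To make the layer computation described above concrete, here is a minimal Python sketch. It is only an illustration, not code from any real framework: the function names and the numbers are made up, and the "additional processing" step is assumed to be a ReLU (negative sums become zero), which the article does not specify.

```python
def neuron_output(prev_outputs, weights):
    """One neuron: weight each incoming value, sum, then post-process."""
    weighted_sum = sum(x * w for x, w in zip(prev_outputs, weights))
    # "Additional processing" is assumed here to be ReLU (clamp negatives to 0).
    return max(0.0, weighted_sum)


def layer_forward(prev_outputs, weight_rows):
    """Compute every neuron of the next layer from the previous layer's outputs.

    weight_rows[j] holds the weights on the connections into neuron j.
    Each neuron reads only prev_outputs and its own weights, so the
    iterations are independent and could each run on a separate unit.
    """
    return [neuron_output(prev_outputs, weights) for weights in weight_rows]


# Made-up numbers: 3 inputs feeding a layer of 2 neurons.
prev_outputs = [1.0, 2.0, 3.0]
weight_rows = [
    [0.5, -1.0, 1.0],   # connections into neuron 0
    [-1.0, 0.5, -0.5],  # connections into neuron 1
]
next_outputs = layer_forward(prev_outputs, weight_rows)
print(next_outputs)  # [1.5, 0.0]
```

The list comprehension in `layer_forward` runs the neurons one after another, which is the CPU picture. Because no neuron depends on another neuron in the same layer, the same work can be handed out with one neuron per processing unit.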
Thus, people began to use GPUs, originally designed for image processing, to process neural networks, as illustrated below:

[Figure: How the GPU computes neural networks]

We only need to pass the output of the previous layer and the weight coefficients between the two layers to the processing units. Each unit then computes the result for the next-layer neuron assigned to it, and together the units produce the results for the whole next layer.

It’s truly amazing!

Thus, with the GPU’s assistance, the development of artificial intelligence has reached new heights.

NPU

By now, some may have noticed the “idle” processing unit in the image above. Because the GPU was originally designed for graphics processing, some waste is inevitable when it computes neural networks. In addition, because of how the neurons are connected, the output of a single previous-layer neuron may be sent redundantly to the units handling different next-layer neurons, which can also lead to significant waste.

For these and other reasons, people began to look for chip designs specialized for neural networks. This led to the concept of the NPU, or Neural Processing Unit. It is a broad concept that covers various chips designed specifically for processing neural networks. However, these chips generally share a common core: acceleration of matrix operations. In simple terms, the work of the NPU is illustrated in the image below:

[Figure: NPU’s method of computing neural networks]

We connect many processing units that are simpler than the GPU’s and designed specifically for the calculations neural networks require, and we let them share data. We then only need to provide the output of the previous layer and the weights of each neuron to this so-called “matrix processing unit,” and the units work together to quickly compute the output of the next layer of neurons.
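
The difference between sending the same data to every unit and letting connected units share it can be sketched with a toy model in Python. This is a deliberately simplified illustration under stated assumptions, not a description of any real chip: the function names are hypothetical, the only thing counted is how many times input values are fetched, and real GPUs reduce redundant fetches with caches and shared memory.

```python
def per_neuron_units(prev_outputs, weight_rows):
    """Toy GPU-style model: every unit fetches its own copy of the inputs."""
    input_fetches = 0
    outputs = []
    for weights in weight_rows:              # one unit per next-layer neuron
        local_inputs = list(prev_outputs)    # this unit's private copy
        input_fetches += len(local_inputs)
        outputs.append(sum(x * w for x, w in zip(local_inputs, weights)))
    return outputs, input_fetches


def matrix_unit(prev_outputs, weight_rows):
    """Toy NPU-style model: each input is fetched once and shared by all units."""
    input_fetches = 0
    accumulators = [0.0] * len(weight_rows)
    for i, x in enumerate(prev_outputs):     # each input enters the array once
        input_fetches += 1
        for j, weights in enumerate(weight_rows):
            accumulators[j] += x * weights[i]    # multiply-accumulate in unit j
    return accumulators, input_fetches


# Made-up numbers: 3 inputs feeding a layer of 2 neurons.
prev_outputs = [1.0, 2.0, 3.0]
weight_rows = [[0.5, -1.0, 1.0], [-1.0, 0.5, -0.5]]

gpu_outputs, gpu_fetches = per_neuron_units(prev_outputs, weight_rows)
npu_outputs, npu_fetches = matrix_unit(prev_outputs, weight_rows)
print(gpu_outputs, gpu_fetches)  # [1.5, -1.5] 6
print(npu_outputs, npu_fetches)  # [1.5, -1.5] 3
```

Both functions produce the same weighted sums. In this toy model the shared version fetches each input once instead of once per neuron, and the gap grows with the width of the layer.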
That, in essence, is the NPU.

In you there is me; in me there is you

While writing this article, I came across a video by the Bilibili uploader Engineer Sun. It is very well made, and I encourage everyone to watch it; the link is at the end of the article [1]. Sun explains the workings of the CPU, GPU, and NPU from another perspective and closes by mentioning the integration of modern GPGPUs with neural network processors.

Indeed, modern GPUs and NPUs now contain parts of each other.

For example, Nvidia’s GPUs include dedicated matrix units called Tensor Cores, which play the role of a neural network processor inside the GPU and bring its high efficiency. Because the GPU also retains its original processing units, it keeps considerable generality. For instance, if we invent a new algorithm that requires additional processing of each neuron’s output, the GPU can handle it as illustrated below:

[Figure: How a GPU with NPU units computes neural networks]

NPUs, not to be outdone, likewise incorporate many general-purpose modules to achieve the effect shown in the image below:

[Figure: How an NPU with general-purpose units computes neural networks]

What are the benefits of this? For the GPU, adding neural processing modules naturally improves the efficiency of processing neural networks.

For the NPU, adding general processing units allows it to handle more general scenarios.

For example, there is a long-standing industry consensus that the generality of GPUs makes them suitable both for neural network inference and for the more complex computations of neural network training, whereas NPUs, being highly specialized, are less suitable for training.

However, at this point, some may wonder why a certain domestic NPU neural network chip is also said to be very good for training.
What is the reason for this?

This is precisely why NPUs need to add general processing units: to cope with more complex and general scenarios.

GPU-NPU, the final soul-searching question

At this point, we have reached the place where most articles explaining GPUs and NPUs stop. However, astute readers must still have plenty of questions. For instance, since modern NPUs and GPUs have become increasingly similar and complementary, why is the practitioner of Huang-style swordsmanship still able to lead the way?

The answer is quite simple. Professionals like to explain it with a term called “ecosystem.” I would call it inertia and accumulation. It is similar to the ecosystem compatibility issues among the Android, Windows, and Apple operating systems. Because GPUs, especially Nvidia GPUs, appeared early and accompanied the development of neural networks, much of the software, tools, acceleration methods, and operators used to develop and run neural networks was designed around Nvidia’s GPUs. This “accumulation” fosters “inertia”: those who are accustomed to Nvidia GPUs are more inclined to keep using them. If they switch to other chips, they not only face additional learning costs but also have to redevelop those tools, software, acceleration methods, and operators to achieve a user experience close to that of Nvidia GPUs.

To put it metaphorically:

[Figure: Illustration of the complete process of using an Nvidia (NV) GPU]

Nvidia has divided the processors inside its GPU into different groups, and it has its own hardware language, supporting facilities, and a hardware programming model called CUDA, along with a corresponding software programming language.
All of this, combined with the previously mentioned “ecosystem,” allows an existing neural network algorithm to run better on Nvidia’s GPUs.

However, the same algorithm, if moved to other chips, must overcome the problems of adapting tools and hardware languages, as well as the efficiency of mapping neural networks onto the hardware.

Thus, accumulation and inertia are the core divide between modern GPUs and NPUs. This is why some domestic GPUs can rise rapidly: they can adapt to and remain compatible with software ecosystems based on CUDA.

It is also the root of why some domestic NPUs can break through the barriers: they have invested significant manpower and resources, along with customer support, in an attempt to replicate the accumulation and inertia found around Nvidia GPUs.

This, I believe, is what I must remember as a practitioner: what we should truly value is accumulation, refinement, iteration, and taking the time to engage seriously with our products, our technology, and our market.

Image | Dennis Liu
Layout & Text | Dennis Liu

[1] All processors! What is the difference between CPU, GPU, and NPU? https://www.bilibili.com/video/BV1wbcdeWENh
