AI, represented by large models, is driving a new round of technological revolution and industrial transformation, presenting enormous opportunities as well as disruptive challenges across industries. In content production, AIGC is a key force behind the transformation of new media, and the deployment of industry-specific large models will give media companies new paths for exploring this technological shift.
In this round of the AI revolution, AI accelerator boards sit at the core of computing power. Without hardware acceleration, even the largest model is useless; even simple tuning of model parameters requires AI accelerator support, or one must rent public cloud resources, which are themselves backed by professional AI accelerator hardware.
At the same time, AI computing infrastructure in China remains in short supply. Data centers and cloud computing centers today are no longer built around CPUs; they are expanding around GPUs and their parallel processing capabilities, which makes a solid understanding of GPUs all the more important.
This series covers the rise of AI accelerator boards and GPUs, NVIDIA's leading position in AI hardware, and the state of domestic AI accelerator hardware, including which vendors might emerge as domestic alternatives to NVIDIA.
In previous articles, we introduced the development of NVIDIA's GPUs, the logic of CUDA cores and CUDA programming, and NVLink and NVSwitch. NVIDIA's absolute leadership in AI hardware rests not only on CUDA and NVLink but also on its InfiniBand high-speed networking (the name itself a nod to "infinite bandwidth"), a key factor in its dominance. Today, we continue with the DPU, which entered NVIDIA's AI clusters through that same networking lineage.
-
01. The Difference Between the DPU and the CPU, and the DPU's Role
In AI computing systems, data center compute workloads are growing rapidly. The CPU must not only deliver substantial computing power but also manage the data center's virtualization, networking, storage, and security, which means it can never devote its full capacity to actual computation.
For instance, if you buy a 100-core CPU, you may effectively get only 90 cores. Where do the other 10 go? They are consumed by data center software for security, storage, management, and so on.
The overhead of those 10 cores is pure waste: it is like paying 100 dollars but getting only 90 dollars' worth of computation. Hence the need for a specialized processor to take over this heavy lifting, the DPU. The DPU exists to handle everything a data center does beyond computation itself.
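To make the arithmetic concrete, here is a minimal Python sketch using the article's own illustrative 100-core/10-core split; the numbers are examples, not measurements from any real server.

```python
# Illustrative back-of-the-envelope calculation using the article's
# example numbers (assumptions, not vendor data).

TOTAL_CORES = 100   # cores you paid for
INFRA_CORES = 10    # cores eaten by security, storage, management, ...

compute_cores = TOTAL_CORES - INFRA_CORES
overhead = INFRA_CORES / TOTAL_CORES

print(f"Cores left for real workloads: {compute_cores}")    # 90
print(f"Infrastructure overhead: {overhead:.0%}")            # 10%

# At fleet scale the waste compounds: 1,000 such servers lose the
# equivalent of 100 whole servers to housekeeping.
servers = 1_000
print(f"Equivalent servers lost: {servers * overhead:.0f}")  # 100
```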
In the era before the DPU and the GPU, the CPU had to handle everything: running the operating system, security, specific data processing, and the parallel workloads of graphics and AI. With the introduction of the GPU, graphics and other parallel tasks could be offloaded, but in a large-model system the CPU still shoulders many infrastructure responsibilities. This is where the DPU plays its role.
DPU stands for Data Processing Unit. NVIDIA CEO Jensen Huang put it this way in a speech: "The DPU will become one of the three pillars of future computing, and the standard configuration of the future data center will be 'CPU + DPU + GPU.' The CPU is for general computing, the GPU is for accelerated computing, and the DPU handles data processing."
-
02. What Tasks Can the DPU Perform?
In a traditional computer architecture, the network merely transports data, while computation is centered on the CPU or GPU. But when large, complex models such as ChatGPT and BERT spread their workloads across many GPUs for parallel computation, they generate large bursts of gradient traffic that can easily congest the network.
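For a rough sense of scale, here is a small Python sketch estimating per-step gradient traffic in data-parallel training; the model size, precision, and worker count are assumptions chosen for illustration, not measurements of ChatGPT or BERT.

```python
# Rough, illustrative estimate of gradient traffic per training step
# in data-parallel training. All figures below are assumptions.

params = 340_000_000      # a BERT-large-scale model, ~340M parameters
bytes_per_grad = 2        # fp16 gradients
workers = 64              # data-parallel GPUs

payload = params * bytes_per_grad   # gradients one worker must exchange
total = payload * workers           # produced across the cluster each step

print(f"Per worker per step: {payload / 1e6:.0f} MB")     # 680 MB
print(f"Across {workers} workers: {total / 1e9:.1f} GB")  # 43.5 GB

# Because every worker finishes its backward pass at roughly the same
# moment, this traffic arrives in synchronized bursts, exactly the
# pattern that congests a conventional, transport-only network.
```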
This is a limitation of the traditional architecture, which centers everything on general-purpose computation. In the era of AI-accelerated computing, simply adding bandwidth or shaving latency is not enough to solve this network problem.
Enter the DPU. Simply put, the DPU lets the network do more than transport data: the network itself can now take on part of the data processing.
Under this new architecture, the CPU and GPU can focus on the computation they are specialized for, while the underlying infrastructure workloads are handed to the DPU, relieving bottlenecks and packet loss in network transmission. This approach can reduce network latency more than tenfold.
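To see why offloading helps the host even before counting any network gains, here is a minimal conceptual model in Python. It is our own simplification for illustration, not NVIDIA's methodology, and it reuses the article's 90/10 core split from earlier.

```python
# Minimal conceptual model of DPU offload (an illustrative
# simplification, not a benchmark or NVIDIA's methodology).

def host_cpu_time(compute_s: float, infra_s: float, offload: bool) -> float:
    """Host CPU seconds consumed per unit of work.

    compute_s: time spent on the application itself
    infra_s:   time spent on networking/storage/security housekeeping
    offload:   True if a DPU absorbs the housekeeping
    """
    return compute_s if offload else compute_s + infra_s

compute, infra = 0.90, 0.10   # the article's 90/10 split, as fractions

without_dpu = host_cpu_time(compute, infra, offload=False)  # 1.00
with_dpu = host_cpu_time(compute, infra, offload=True)      # 0.90

print(f"Host CPU per unit of work: {without_dpu:.2f} -> {with_dpu:.2f}")
print(f"Extra application throughput: {without_dpu / with_dpu - 1:.1%}")
# ~11% more application throughput from the same CPU, before counting
# any gains from hardware-accelerated networking on the DPU itself.
```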
The work the DPU takes on can be summarized in four keywords: virtualization, networking, storage, and security. The DPU is a creature of the data center: it is aimed at large-scale computing scenarios rather than personal desktops, laptops, or phones (at least for now). It serves cloud computing, with the goal of making data center computing infrastructure more efficient, wasting less energy, and in turn lowering costs.
Virtualization, networking, storage, and security are all critical to a data center, and all of them consume significant computing resources.
-
03. The DPU in NVIDIA’s Product Line
The DPU is a new type of programmable multi-core processor, a system on a chip (SoC) that combines industry-standard, high-performance compute with high-performance network interfaces, allowing it to parse and process data quickly and feed it efficiently to CPUs and GPUs. The key difference from the CPU is that the CPU excels at general-purpose computing (it can handle all kinds of tasks, so it is relatively "general"), while the DPU excels at infrastructure tasks (it focuses on specific jobs, so it is relatively "specialized"): network protocol processing, switching and routing, encryption and decryption, and data compression, in short the "dirty work." The DPU is thus the CPU's capable assistant, and together with the CPU and GPU it forms a "triangle" that fundamentally reshapes how data centers operate.
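The "dirty work" listed above is mostly per-byte processing. The short Python sketch below performs the same kinds of operations (hashing, compression, checksums) in software, purely to show what the offloaded work looks like; a real DPU runs these in dedicated hardware engines, not as host code.

```python
# The kinds of per-byte infrastructure work a DPU offloads, sketched
# in plain Python for illustration only. A real DPU does this in
# dedicated hardware engines, not in software on the host.

import hashlib
import zlib

packet = b"example payload " * 256        # stand-in for network data

digest = hashlib.sha256(packet).hexdigest()   # integrity / security
compressed = zlib.compress(packet)            # data compression
checksum = zlib.crc32(packet)                 # cheap per-packet checksum

print(f"payload {len(packet)} B -> compressed {len(compressed)} B")
print(f"sha256 {digest[:16]}..., crc32 {checksum:#010x}")

# Every byte crossing the network may need several such passes. Done on
# the host CPU, they steal cycles from applications; done on the DPU,
# the host never sees the work at all.
```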
After acquiring Mellanox, NVIDIA launched the BlueField-2 DPU and the BlueField-2X. In April 2021, NVIDIA released its next-generation data processor, the BlueField-3 DPU. It was the first DPU designed for AI and accelerated computing, optimized for multi-tenant and cloud-native environments, and it provides data-center-grade software-defined, hardware-accelerated networking, storage, security, and management services. A single BlueField-3 DPU is said to deliver data center services equivalent to roughly 300 x86 cores, freeing a substantial amount of CPU capacity for the critical business applications themselves.