Understanding NPU: The Future of AI Processing

What does “NPU” stand for? What can it do? Here’s what you need to know about this emerging technology.
In the past year, discussion of Neural Processing Units (NPUs) has picked up noticeably. NPUs have appeared in smartphones for several years, but Intel, AMD, and more recently Microsoft have now brought NPU-equipped, AI-capable consumer laptops and PCs to market.
NPUs are closely related to the concept of AI PCs, with an increasing number of chips produced by major hardware manufacturers like AMD, Apple, Intel, and Qualcomm incorporating NPUs. Since Microsoft launched its Copilot+ AI PC products earlier this year, NPUs have started to appear more frequently in laptops.

What Role Does NPU Play?

The role of the NPU is to act as a hardware accelerator for artificial intelligence. Hardware acceleration involves using dedicated silicon chips to manage specific tasks, much like a chef delegates different tasks to sous chefs to prepare meals on time. NPUs do not replace your CPU or GPU; rather, they are designed to complement the strengths of CPUs and GPUs by handling workloads such as edge AI, allowing CPUs and GPUs to reserve processing time for tasks they excel at.
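To make that division of labor concrete, here is a minimal sketch (in Python, using ONNX Runtime) of how an application might route an inference workload to an NPU when one is available and fall back to the CPU otherwise. The provider name shown is Qualcomm's; other vendors expose their own, and the model file is a placeholder.

```python
# Sketch: dispatching an AI workload to an NPU when one is available,
# falling back to the CPU otherwise. Uses ONNX Runtime's execution
# providers; the provider name and model file are illustrative --
# actual names depend on your vendor's runtime.
import onnxruntime as ort

available = ort.get_available_providers()

# Prefer an NPU-backed provider if the vendor runtime exposes one;
# "QNNExecutionProvider" is Qualcomm's, other vendors use other names.
preferred = [p for p in ("QNNExecutionProvider",) if p in available]
providers = preferred + ["CPUExecutionProvider"]  # CPU as fallback

session = ort.InferenceSession("model.onnx", providers=providers)
print("Running on:", session.get_providers()[0])
```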
GPUs are hardware accelerators designed for rendering graphics, but they retain enough underlying flexibility to be well suited to AI and certain kinds of scientific computation. For a long time, if you had AI workloads to process, you would reach for one or more high-performance GPUs, typically Nvidia's, to do the actual number crunching. Some companies are building dedicated hardware accelerators for AI, such as Google's TPU, because the graphics-specific circuitry that puts the "G" in "GPU" is wasted on hardware meant purely for AI processing.

Workload Determines Everything

Hardware acceleration works best on repetitive tasks with little conditional branching, especially when large volumes of data are involved. For example, rendering 3D graphics requires a computer to manage a continuous stream of countless particles and polygons. It is a bandwidth-intensive job, but the underlying math is mostly trigonometry. Computer graphics, physics and astronomical simulations, and the large language models (LLMs) behind modern AI chatbots are all ideal workloads for hardware acceleration.
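As a toy illustration of that kind of workload, the sketch below applies one rotation, built from sines and cosines, to a million 3D points at once: no data-dependent branching, just the same arithmetic repeated over a large array.

```python
# Sketch: the kind of repetitive, branch-free arithmetic that hardware
# accelerators thrive on. Transforming a million 3D points is the same
# trigonometric math applied uniformly to every row.
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((1_000_000, 3)).astype(np.float32)

theta = np.float32(0.25)  # rotate every point by the same angle
rotation = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
], dtype=np.float32)

rotated = points @ rotation.T  # one uniform operation over all rows
```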
There are two types of AI workloads: training and inference. Training is conducted almost entirely on GPUs. Nvidia dominates both markets thanks to nearly twenty years of investment in CUDA and its leadership in discrete GPUs, with AMD a distant second. Large-scale training happens at data-center scale, and so does the inference workload that runs when you interact with cloud-based services like ChatGPT.
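The difference between the two workloads is easy to see in code. In the minimal PyTorch sketch below, training loops over forward, backward, and weight-update steps, while inference is a single forward pass: the much lighter job an NPU can handle at the edge.

```python
# Sketch: the two AI workloads in miniature. Training iteratively
# adjusts weights with gradients (heavy, GPU-bound); inference is a
# single forward pass (light enough for an NPU on-device).
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(64, 4), torch.randn(64, 1)

# Training: forward + backward + update, repeated many times.
for _ in range(100):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Inference: forward pass only -- the workload an NPU accelerates.
with torch.no_grad():
    prediction = model(torch.randn(1, 4))
```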
NPUs (and the AI PCs built around them) operate on a much smaller scale. They complement the integrated GPU in your favorite CPU vendor's processors, adding flexibility for future AI workloads and delivering better responsiveness than waiting on the cloud.

How Does NPU Work?

Generally speaking, NPUs rely on a highly parallel design to quickly execute repetitive tasks. In contrast, CPUs are generalists. This difference is reflected in the logic and physical architecture of NPUs. A CPU has one or more cores that can access a small amount of shared memory cache, while an NPU has multiple sub-units, each with its own mini-cache. NPUs are suited for high-throughput and highly parallel workloads like neural networks and machine learning.
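A rough way to picture that architecture in software is to split one large matrix multiply into tiles small enough to fit in a sub-unit's local cache. The tile size below is a made-up stand-in, and the loops run sequentially where real hardware would process the tiles concurrently.

```python
# Sketch: an NPU-style design splits one big matrix multiply into
# tiles, each small enough to live in a sub-unit's local cache.
# Pure-NumPy stand-in for what the hardware does in parallel.
import numpy as np

TILE = 64  # hypothetical size of one sub-unit's local buffer

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m), dtype=a.dtype)
    # Each (i, j) tile could be handled by a different sub-unit;
    # here they run one after another, on silicon they run at once.
    for i in range(0, n, TILE):
        for j in range(0, m, TILE):
            for p in range(0, k, TILE):
                out[i:i+TILE, j:j+TILE] += (
                    a[i:i+TILE, p:p+TILE] @ b[p:p+TILE, j:j+TILE]
                )
    return out

a = np.random.rand(256, 256).astype(np.float32)
b = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(tiled_matmul(a, b), a @ b, atol=1e-3)
```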
NPUs, neural networks, and neuromorphic systems (like Intel’s Loihi platform) share a common design goal: to simulate certain aspects of brain information processing.
Each device manufacturer that brings an NPU to market has a microarchitecture specific to its own products. Most also release software development tools for use with their NPUs. For example, AMD provides the Ryzen AI software stack, while Intel continues to develop OpenVINO, its open-source deep-learning toolkit.
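As an illustration of what those SDKs look like in practice, here is a minimal OpenVINO sketch that compiles a model for the NPU when the runtime reports one and falls back to the CPU otherwise. The model file name and the assumption of a single static-shape input are placeholders.

```python
# Sketch: targeting an NPU through Intel's OpenVINO toolkit.
# Device availability and "model.xml" are assumptions -- adapt
# them to your hardware and model.
import numpy as np
import openvino as ov

core = ov.Core()
print("Available devices:", core.available_devices)  # e.g. ['CPU', 'NPU']

model = core.read_model("model.xml")  # hypothetical model file
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled = core.compile_model(model, device)

# One inference; assumes a single input with a static shape.
dummy = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
result = compiled(dummy)
```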

NPU and Edge Intelligence

Most NPUs ship in consumer-facing devices such as laptops and PCs. For example, Qualcomm's Hexagon DSP adds NPU acceleration to its Snapdragon processors for smartphones, tablets, wearables, advanced driver-assistance systems, and the Internet of Things. Apple uses its Neural Engine NPU in the A-series and M-series chips that power iPhones, iPads, and Macs. Additionally, some PCs and laptops carry the Copilot+ designation, meaning they can run Microsoft's Copilot AI on the onboard NPU. Some server-side and cloud-based systems use NPUs as well: Google's Tensor Processing Units are NPU accelerators built specifically for high-performance machine learning in data centers.
One reason for the rise of NPUs is the growing importance of edge intelligence. Between sensor networks, mobile devices (like smartphones and laptops), and the Internet of Things, demand for data processing keeps growing, while cloud-based services are hampered by network latency. Processing data locally means it never has to leave the device for the cloud at all, which can be an advantage for both speed and security.
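A back-of-the-envelope sketch of that trade-off: the numbers below are illustrative sleeps, not measurements, but they show how a cloud round trip adds network time that on-device inference never pays.

```python
# Sketch: why latency pushes inference to the edge. The delays are
# illustrative stand-ins, not benchmarks.
import time

def local_inference_ms() -> float:
    start = time.perf_counter()
    time.sleep(0.005)  # stand-in: ~5 ms of on-device NPU compute
    return (time.perf_counter() - start) * 1000

def cloud_inference_ms() -> float:
    start = time.perf_counter()
    time.sleep(0.050)  # stand-in: ~50 ms network round trip
    time.sleep(0.005)  # the same compute, now in a data center
    return (time.perf_counter() - start) * 1000

print(f"local: {local_inference_ms():.1f} ms")
print(f"cloud: {cloud_inference_ms():.1f} ms")
```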
The question of whether you need an NPU is almost beside the point. Silicon Valley giants like Intel, AMD, and Apple have already invested in the technology, so whether or not you have a specific use for one, the chip you choose the next time you build or buy a PC is likely to include an NPU. Analysts expect that by the end of 2026, 100% of enterprise PC purchases in the U.S. will have one or more NPUs embedded in their chips. In other words, you won't need to go looking for a system with an NPU; NPUs will come looking for you.

Original link: https://www.extremetech.com/computing/what-is-an-npu

Source | Semiconductor Industry Observation, translated from extremetech