Understanding ASIC and FPGA: Key Differences and Applications

In the previous article, we introduced CPUs and GPUs. Today, I will continue to introduce two other main players in the field of computing chips – ASIC and FPGA.
ASIC (Application Specific Integrated Circuit)
As mentioned earlier, GPUs have strong parallel computing capabilities, but they also have drawbacks, such as high power consumption, large size, and high cost.
Entering the 21st century, demand for computing power has shown two significant trends: first, the usage scenarios for computing power are becoming more segmented; second, users' requirements for computing performance keep rising. General-purpose computing chips can no longer meet all of these needs.
As a result, more and more companies are strengthening their research and investment in dedicated computing chips. ASIC (Application Specific Integrated Circuit) is a type of chip specifically designed for certain tasks.
The official definition of ASIC is: integrated circuits specifically designed and manufactured to meet the needs of specific users or specific electronic systems.
ASICs emerged in the 1970s and 1980s. In the early days they were used in computers; later they were mainly used for embedded control. In recent years, as mentioned earlier, they have been on the rise in AI inference, high-speed search, and vision and image processing.
When talking about ASIC, we must mention Google’s famous TPU.
TPU is short for Tensor Processing Unit. A tensor is a mathematical object that holds multiple numbers (a multi-dimensional array).
Currently, almost all machine learning systems use tensors as the basic data structure. Therefore, we can simply understand tensor processing units as “AI processing units”.
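To make the idea of a tensor concrete, here is a minimal Python sketch (using NumPy purely for illustration, not anything TPU-specific) showing that rank-0 through rank-3 tensors are simply arrays of increasing dimension:

```python
import numpy as np

# A rank-0 tensor: a single number (scalar)
scalar = np.array(3.14)

# A rank-1 tensor: a vector of numbers
vector = np.array([1.0, 2.0, 3.0])

# A rank-2 tensor: a matrix (e.g., a grayscale image)
matrix = np.array([[1, 2], [3, 4]])

# A rank-3 tensor: e.g., an RGB image of shape (height, width, channels)
image = np.zeros((224, 224, 3))

print(scalar.ndim, vector.ndim, matrix.ndim, image.ndim)  # 0 1 2 3
```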
In 2015, to better handle its deep learning workloads and boost its AI computing power, Google launched a chip built specifically for neural network inference, TPU v1.

Compared to traditional CPUs and GPUs, TPU v1 can achieve a performance improvement of 15 to 30 times in neural network computing, with energy efficiency improvements reaching 30 to 80 times, bringing great shock to the industry.

In 2017 and 2018, Google continued to launch stronger TPU v2 and TPU v3 for AI training and inference. In 2021, they launched TPU v4, which uses 7nm technology, with the number of transistors reaching 22 billion, achieving a 10-fold performance improvement over the previous generation, 1.7 times stronger than NVIDIA’s A100.

In addition to Google, many large companies have also been working on ASICs in recent years.
Intel acquired the Israeli AI chip company Habana Labs at the end of 2019 and released the Gaudi 2 ASIC chip in 2022. IBM Research also released its AI ASIC chip, the AIU, at the end of 2022.
Samsung also worked on ASICs a few years ago, making chips specifically for mining machines. Indeed, many people recognize ASICs from Bitcoin mining. Compared to GPU and CPU mining, ASIC mining machines are more efficient and consume less energy.
(Image: ASIC mining machines)
In addition to TPU and mining machines, two other well-known types of ASIC chips are DPU and NPU.
DPU stands for Data Processing Unit and is mainly used in data centers. I introduced it previously; you can check it here: What exactly is the DPU that has become popular all over the internet?
NPU, short for Neural Processing Unit, simulates human neurons and synapses at the circuit level and processes data with deep learning instruction sets.
NPU is specifically used for neural network inference, enabling efficient operations like convolution and pooling. It is often integrated into some mobile chips.
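As a rough illustration of what operations like convolution and pooling look like, here is a simplified, plain-NumPy sketch; it is only a conceptual model, not any vendor's NPU instruction set:

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (valid padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]
    blocks = trimmed.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])  # simple edge detector
features = max_pool(conv2d(image, kernel))
print(features.shape)  # (3, 3)
```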
Speaking of mobile chips, it is worth mentioning that the main chip in our mobile phones, commonly called an SoC (System on Chip), is also a type of ASIC.
(Image: mobile SoC chips)
So what advantages does an ASIC gain from being a specialized, custom chip? Is it merely exclusive to one company, carrying a special logo and name?
No.
Customization means tailoring the chip to a specific need. The chip's computing power and efficiency are matched precisely to the algorithms of its target task: the number of cores, the ratio of logic units to control units, the cache size, and indeed the entire chip architecture are customized exactly as required.
As a result, a custom ASIC can be pushed to the limit in size and power consumption. Such chips outperform general-purpose chips (CPUs, GPUs) in reliability, confidentiality, computing power, and energy efficiency.
You will find that the ASIC companies we mentioned earlier are all large companies like Google, Intel, IBM, and Samsung.

This is because designing a custom chip demands very strong R&D capability from a company and is extremely costly.

To create an ASIC chip, one must first go through a complex design flow of code design, synthesis, and back-end work, followed by several months of fabrication, packaging, and testing before the chip is in hand and a system can be built.
Everyone has heard of "tape-out": manufacturing the chip through the full sequence of process steps, like running it down an assembly line. In simple terms, it is trial production.
The R&D process of ASIC requires tape-out. A 14nm process can cost around $3 million for tape-out. A 5nm process can be as high as $47.25 million.
If the tape-out fails, all that money is wasted, along with a great deal of time and effort. Small companies generally cannot afford this.
So, does this mean that small companies cannot customize chips?
Of course not. Next, it’s time for another magical tool to shine – FPGA.
FPGA (Field Programmable Gate Array)
FPGA, short for Field Programmable Gate Array, is a reconfigurable chip.
FPGA has been very popular in the industry in recent years, even being called the “universal chip”.
In simple terms, an FPGA is a chip that can be reconfigured: after manufacturing, it can be reprogrammed over and over again to implement whatever digital logic function is desired.
FPGAs owe this DIY capability to their unique architecture.
An FPGA is composed of Configurable Logic Blocks (CLB), Input/Output Blocks (IOB), Programmable Interconnect Resources (PIR), and static random-access memory (SRAM).
CLB is the most important part of FPGA, serving as the basic unit for implementing logic functions and carrying the main circuit functions.
They are usually arranged in a regular array (Logic Cell Array, LCA) spread throughout the chip.
IOBs interface the on-chip logic with the external pins and are usually arranged around the periphery of the chip.
PIR provides rich wiring resources, including grid connections, programmable switch matrices, and programmable connection points. They serve to connect and form specific functional circuits.
Static memory SRAM is used to store programming data for internal IOB, CLB, and PIR, controlling them to complete system logic functions.
CLB itself is mainly composed of Look-Up Tables (LUT), multiplexers, and flip-flops. They are used to carry individual logic gates in the circuit and can be used to implement complex logic functions.
In simple terms, we can think of a LUT as a small RAM that stores precomputed results. When a user describes a logic circuit, the software computes all possible outcomes and writes them into this RAM. Performing a logic operation then amounts to feeding the inputs in as an address: the LUT looks up the content at that address and returns the result.
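Here is a minimal Python sketch of that idea, assuming a simple 2-input lookup table "programmed" as an AND gate; the helper names are hypothetical and purely for illustration:

```python
def build_lut(logic_fn, num_inputs):
    """Precompute every possible output of a boolean function into a table."""
    return [logic_fn(*((addr >> i) & 1 for i in range(num_inputs)))
            for addr in range(2 ** num_inputs)]

def lut_eval(lut, *inputs):
    """Evaluate by treating the input bits as an address into the table."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return lut[addr]

# "Program" a 2-input LUT to behave like an AND gate
and_lut = build_lut(lambda a, b: a & b, 2)   # stored table: [0, 0, 0, 1]
print(lut_eval(and_lut, 1, 1))  # 1
print(lut_eval(and_lut, 1, 0))  # 0
```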
This “hardware-based” computation method obviously has a faster computation speed.
When using FPGA, users can complete circuit design using hardware description languages (Verilog or VHDL) and then “program” (burn) the FPGA, loading the design onto the FPGA to achieve the corresponding function.
Upon powering on, the FPGA reads data from the EPROM (Erasable Programmable Read-Only Memory) into SRAM. After configuration, the FPGA enters operational status. After power loss, the FPGA returns to a blank state, and the internal logic disappears. This cycle achieves “on-site” customization.
FPGA is very powerful. Theoretically, if the scale of gates provided by FPGA is large enough, it can implement any ASIC logic function through programming.
(Image: FPGA development kit; the chip in the middle is the FPGA)
Now let’s take a look at the development history of FPGA.
FPGA is a product developed based on programmable devices such as PAL (Programmable Array Logic) and GAL (Generic Array Logic), belonging to a type of semi-custom circuit.
It was born in 1985, invented by Xilinx. Later, companies such as Altera, Lattice, and Microsemi also entered the FPGA field, eventually forming a four-giant structure.
In May 2015, Intel acquired Altera for a staggering $16.7 billion, later integrating it into the PSG (Programmable Solutions Group) department.
In 2020, Intel’s competitor AMD also made a move, acquiring Xilinx for $35 billion.
Thus, we have Xilinx (under AMD), Intel, Lattice, and Microsemi as the four giants.
In 2021, their market shares were 51%, 29%, 7%, and 6% respectively, accounting for 93% of the global total.
Recently, in October 2023, Intel announced plans to split the PSG department for independent business operations.
As for domestic FPGA manufacturers, there are Fudan Microelectronics, Unisoc, Anlu Technology, Dongtu Technology, Gaoyun Semiconductor, Jingwei Qili, Jingwei Yage, Zhiduo Crystal, Aoge Core, and so on. It seems there are quite a few, but in reality, the technology gap is significant.
Differences between ASIC and FPGA
Next, we will focus on the differences between ASIC and FPGA, as well as their differences from CPU and GPU.
Both ASIC and FPGA are, at their core, chips. An ASIC is a fully custom chip whose function is fixed and cannot be modified, while an FPGA is a semi-custom chip whose function is flexible and highly reconfigurable.
We can illustrate the difference between the two with an example.
ASIC is like using molds to make toys. Molding must be done in advance, which is quite labor-intensive. Moreover, once the mold is opened, it cannot be modified. If you want to make a new toy, you must open a new mold.
FPGA, on the other hand, is like using LEGO blocks to build toys. You can start building easily and, with a little time, have it ready. If you are not satisfied or want to build a new toy, you can take it apart and rebuild it.
Many design tools for ASIC and FPGA are the same. In terms of the design process, FPGA is less complex than ASIC, eliminating some manufacturing processes and additional design verification steps, resulting in a process that is about 50%-70% of that of ASIC. The most cumbersome tape-out process is not required for FPGA.
This means that developing an ASIC may take several months or even over a year, while FPGA only takes a few weeks or months.
As mentioned, FPGA does not require tape-out, so does that mean FPGA’s cost is always lower than ASIC?
Not necessarily.
An FPGA can be prefabricated and then programmed in the lab or in the field, with no non-recurring engineering (NRE) cost. However, as the "general-purpose" toy, its per-unit cost is roughly ten times that of the ASIC (the "molded" toy).
If production volume is low, then FPGA will be cheaper. If production volume is high, the one-time engineering costs of ASIC will be amortized, making ASIC cheaper.
It’s like the cost of opening a mold. Opening a mold is expensive, but if the sales volume is large, it becomes worthwhile.
As shown in the figure below, a production volume of around 400,000 units marks the dividing line between the costs of ASIC and FPGA. Below 400,000 units, FPGA is cheaper; above it, ASIC is cheaper.
(Image: FPGA vs. ASIC total cost as a function of production volume)
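To make the break-even arithmetic concrete, here is a small Python sketch; the NRE and unit prices below are illustrative assumptions (chosen to land near the 400,000-unit crossover mentioned above), not real quotes:

```python
# Hypothetical cost model: FPGA has no NRE but a higher unit price;
# ASIC pays a large one-time NRE (design + tape-out) but is cheap per unit.
FPGA_UNIT_COST = 100.0      # assumed price per FPGA (illustrative)
ASIC_UNIT_COST = 10.0       # assumed price per ASIC unit (illustrative)
ASIC_NRE = 36_000_000.0     # assumed one-time engineering cost (illustrative)

def total_cost_fpga(volume):
    return FPGA_UNIT_COST * volume

def total_cost_asic(volume):
    return ASIC_NRE + ASIC_UNIT_COST * volume

# Break-even volume: where the two total costs are equal
break_even = ASIC_NRE / (FPGA_UNIT_COST - ASIC_UNIT_COST)
print(f"Break-even at {break_even:,.0f} units")   # 400,000 with these numbers

for v in (100_000, 1_000_000):
    cheaper = "FPGA" if total_cost_fpga(v) < total_cost_asic(v) else "ASIC"
    print(f"{v:>9,} units -> {cheaper} is cheaper")
```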
From the perspective of performance and power consumption, as a dedicated custom chip, ASIC is stronger than FPGA.
An FPGA is a general-purpose programmable chip and therefore carries more redundancy. No matter how you design it, there will always be some unused circuitry.
As mentioned earlier, ASIC is tailored with no waste, using hard wiring. Thus, it has stronger performance and lower power consumption.
FPGA and ASIC do not simply compete or replace each other; their positioning is different.
FPGA is now mostly used for product prototype development, design iteration, and some specific applications with low production volumes. It is suitable for products that require short development cycles. FPGA is also frequently used for ASIC verification.
ASIC is used for designing large-scale, high-complexity chips or mature products with comparatively high production volumes.
FPGA is particularly suitable for beginners to learn and participate in competitions. Many universities’ electronics programs now use FPGA for teaching.
From a commercialization perspective, the main application fields for FPGA are communication, defense, aerospace, data centers, medical, automotive, and consumer electronics.
FPGA has been used early in the communication field. Many base station processing chips (baseband processing, beamforming, antenna transceivers, etc.) use FPGA. Core network coding and protocol acceleration also utilize it. Data centers previously used it in components like DPU.
Later, as many technologies matured and stabilized, communication equipment manufacturers began replacing FPGA with ASIC to reduce costs.
It is worth mentioning that the recently popular Open RAN actually uses general-purpose processors (Intel CPUs) for computation. The energy efficiency of this approach is far worse than that of FPGA- and ASIC-based solutions, which is one of the main reasons why equipment manufacturers, including Huawei, are reluctant to follow Open RAN.
In the automotive and industrial fields, FPGA is mainly favored for its latency advantages, so it is used in ADAS (Advanced Driver Assistance Systems) and servo motor drives.
FPGA is used in consumer electronics because product iterations are too fast. The development cycle of ASIC is too long, and by the time it is completed, it is already outdated.
FPGA, ASIC, GPU, which is the most suitable AI chip?
Finally, let’s return to the topic of AI chips.
In the previous article, I left a thread hanging: AI computation is divided into training and inference. GPUs hold an absolutely leading position in training, but not in inference, and I didn't explain why.
Now, I will explain.
First, remember that purely from a theoretical and architectural perspective, the performance and energy efficiency of ASIC and FPGA are definitely superior to those of CPUs and GPUs.
CPUs and GPUs follow the von Neumann architecture: instructions must be fetched from memory, decoded, and then executed, and accesses to shared memory require arbitration and caching.
FPGA and ASIC, on the other hand, do not follow the von Neumann architecture (they follow Harvard architecture). Taking FPGA as an example, it is essentially a non-instruction, memory-sharing-free architecture.
FPGA’s logic unit functions are determined during programming, meaning it uses hardware to implement software algorithms. For state-saving needs, the registers and on-chip memory (BRAM) in FPGA belong to their respective control logic, eliminating the need for arbitration and caching.
In terms of the proportion of ALU operation units, GPUs have a higher proportion than CPUs, while FPGAs, with almost no control modules, have an even higher proportion of ALU operation units than GPUs.
Therefore, overall, the computation speed of FPGA is faster than that of GPUs.
Now let’s look at power consumption.
GPUs are notoriously power-hungry: a single chip can reach 250W, or even 450W (RTX 4090). FPGAs, by contrast, generally consume only 30-50W.
This is largely a matter of memory access. GPU memory interfaces (GDDR5, HBM, HBM2) have extremely high bandwidth, about 4-5 times that of the traditional DDR interfaces used with FPGAs. However, at the chip level, reading from DRAM consumes over 100 times the energy of reading from SRAM.
Additionally, the operating frequency of FPGA (below 500MHz) is lower than that of CPUs and GPUs (1-3GHz), which also contributes to lower power consumption.

FPGA’s low operating frequency is mainly limited by wiring resources. Some lines need to be routed far, and if the clock frequency is too high, it cannot keep up.
Finally, let’s look at latency.
The latency of GPUs is higher than that of FPGAs.
GPUs typically divide incoming samples into fixed-size "batches" and, to maximize parallelism, must gather several batches before processing begins.
FPGA’s architecture is batch-less. After processing a data packet, it can output immediately, providing a latency advantage.
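A back-of-the-envelope Python sketch of why batching raises latency; all the numbers are illustrative assumptions, not measurements:

```python
# Illustrative latency model: a GPU waits for a full batch before computing,
# while a streaming device (FPGA-style) processes each item as it arrives.
ARRIVAL_INTERVAL_MS = 1.0    # assumed gap between arriving requests
BATCH_SIZE = 32              # assumed GPU batch size
GPU_BATCH_COMPUTE_MS = 8.0   # assumed time to process one full batch
STREAM_COMPUTE_MS = 0.5      # assumed per-item processing time on the FPGA

# Worst case for the first request in a batch: it waits for the rest to arrive.
gpu_first_item_latency = (BATCH_SIZE - 1) * ARRIVAL_INTERVAL_MS + GPU_BATCH_COMPUTE_MS
stream_latency = STREAM_COMPUTE_MS

print(f"GPU (batched) worst-case latency : {gpu_first_item_latency:.1f} ms")
print(f"FPGA (streaming) latency         : {stream_latency:.1f} ms")
```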
So, the question arises: Why, when GPUs are inferior to FPGA and ASIC in many aspects, have they become the current hot topic in AI computation?
It’s simple: In the extreme pursuit of computing power performance and scale, the entire industry does not care about cost and power consumption.
Through NVIDIA's long-term efforts, GPU core counts and operating frequencies have kept rising, and chip area has grown as well, a brute-force approach to computing power. Power consumption is dealt with through better process nodes and cooling measures such as liquid cooling; as long as the chip doesn't catch fire, it's acceptable.
Besides hardware, in the previous article, I also mentioned that NVIDIA has been very strategic in software and ecosystem development.
Their development of CUDA has become a core competitive advantage for GPUs. Based on CUDA, beginners can quickly get started with GPU development. They have cultivated a large user base over many years.
In contrast, FPGA and ASIC development remains too complex and unsuitable for widespread adoption.
In terms of interfaces, while GPUs have relatively simple interfaces (mainly PCIe), which are less flexible than FPGA (FPGA’s programmability allows for easy integration with any standard and non-standard interfaces), they are sufficient for servers, allowing for plug-and-play.
In addition to FPGA, ASIC has also struggled to compete with GPUs in AI due to its high costs, long development cycles, and significant development risks. AI algorithms are changing rapidly, and the development cycle of ASIC is a critical issue.
Considering all these reasons, GPUs have the current favorable situation.
In AI training, GPUs provide powerful computing capabilities, significantly enhancing efficiency.
In AI inference, the input is generally a single object (image), so the requirements are lower, and there is no need for much parallelism, leading many companies to adopt cheaper and more energy-efficient FPGAs or ASICs for computation.
Other computing scenarios are similar. For those who prioritize absolute computing performance, GPUs are the first choice. For those with less stringent performance requirements, FPGA or ASIC can be considered to save costs.
Final Words
The knowledge about CPUs, GPUs, FPGAs, and ASICs ends here.
They are typical representatives of computing chips. Currently, all computing scenarios are essentially handled by them.
As times change, computing chips are showing new trends. One example is heterogeneous computing, which combines different types of computing chips so that each plays to its strengths. In addition, IBM has been leading the development of brain-inspired (neuromorphic) chips, which mimic the way the human brain's synapses process information; they have achieved breakthroughs and are gaining attention. I will cover these topics in detail in the future.
I hope this series of articles on chips by Xiaozhao has been helpful to everyone. If you like it, please follow, share, and like.
Thank you!
—— The End ——
By the way, a little advertisement:
The Fresh Date Classroom Knowledge Planet is celebrating its first anniversary! In the new year, please continue to support!
