Understanding the Essence of FPGA Chips

Before discussing FPGA, let’s pose a question.

What is the essence of a chip?

In my opinion, the essence of a chip is the circuit!

Simply put, digital chips, no matter how complex, are fundamentally combinations of AND, OR, and NOT.

This is one of the simplest chips you can buy on certain platforms: a 74LS-series part that is very cheap, around 20 cents.

Its function is a two-input NAND gate. This is about as simple as a chip gets, and its circuit and layout are as follows.

In the image above, there are a total of 4 two-input NAND gates.

In contrast, large chips like CPUs or GPUs have tens of millions or even billions of gates.

However, if we delve into the lower levels of these large chips, we will find that they are also composed of AND, NAND, OR, NOR, and other logic gates.
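As a small illustration of this claim, the sketch below builds NOT, AND, OR, XOR, and a 1-bit half adder using nothing but a two-input NAND function. The function names are made up for this example, and it models the logic in Python rather than describing real hardware.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def not_(a: int) -> int:
    # NAND with both inputs tied together inverts the input.
    return nand(a, a)


def and_(a: int, b: int) -> int:
    # AND is NAND followed by an inverter.
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a | b == NOT(NOT a AND NOT b).
    return nand(not_(a), not_(b))


def xor_(a: int, b: int) -> int:
    # Classic four-NAND XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple:
    """Return (sum, carry) for two 1-bit inputs."""
    return xor_(a, b), and_(a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b))
```

Larger blocks such as adders and multipliers are built by repeating the same idea at a much bigger scale.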

This is the circuit; CPUs and GPUs are also forms of circuit organization.

No matter how complex a chip is, it is described by chip design engineers using Hardware Description Language (HDL).

They may look like software engineers writing code, but in reality they are constructing circuits.

EDA tools convert the language into circuits, ultimately resulting in a layout (GDS). This layout is then submitted to the manufacturer for production. The foundry turns the GDS into silicon. The packaging house completes the step from silicon (die) to packaged chip.

This process is very similar to designing PCB circuits.

In both cases, a circuit design is turned into a physical layout and handed off for manufacturing.

The so-called “bottleneck” we often hear about currently seems to lie mainly in the manufacturing stage, that is, from layout to silicon. There are other issues, but that’s a topic for another time.

Today, let’s mainly discuss one question: what is the biggest problem in designing a chip?

There are two points that everyone agrees on.

First, the R&D iteration cycle is long:

It is common for a large chip to take one or two years to develop. Because of functional defects or market changes, it may never sell and may have to be redone from scratch. This is terrifying: a small chip company can easily run out of money if a single chip fails.

Second, high chip investment:

Chip R&D costs include foundry costs, IP costs, labor costs, and so on. A mask set for 28nm costs around 10 million, while one for 12nm approaches 25 million. Labor and IP costs come on top of that. The total investment for developing a 28nm chip can be tens of millions, while 7nm and 5nm can reach hundreds of millions. These non-recurring engineering (NRE) costs must be amortized across every chip produced.

If a project or demand only has a few thousand or tens of thousands of units, whether it is worth developing a chip is a significant question.
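To see why volume matters so much, here is a rough sketch of the amortization arithmetic. The figures are hypothetical round numbers chosen only for illustration (no currency implied) and are not data from any real project.

```python
def cost_per_chip(nre: float, unit_cost: float, volume: int) -> float:
    """Effective cost of one chip once NRE is spread over the whole volume."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    return nre / volume + unit_cost


# Hypothetical inputs: 30 million of NRE, 5 per chip to manufacture.
HYPOTHETICAL_NRE = 30_000_000
HYPOTHETICAL_UNIT_COST = 5

for volume in (5_000, 50_000, 5_000_000):
    per_chip = cost_per_chip(HYPOTHETICAL_NRE, HYPOTHETICAL_UNIT_COST, volume)
    print(f"{volume:>9} units -> {per_chip:.2f} per chip")
```

With these made-up numbers, a few thousand units leave each chip carrying thousands in NRE, while millions of units make the NRE share almost disappear.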

If a custom tape-out does not make sense, is there an alternative that meets the same circuit requirements with a shorter iteration cycle and lower cost?

This is where FPGA comes in!

1: FPGA: The Circuit of Circuits

FPGA stands for Field Programmable Gate Array.

FPGA is different from application-specific chips.

It can be programmed.

As mentioned at the beginning of the article, the essence of a chip is the circuit.

So what is the essence of FPGA?

An FPGA is itself a type of chip.

But it can realize other digital circuits: CPUs, GPUs, NPUs, and so on can all be implemented inside an FPGA, efficiency aside.

Thus, the essence of FPGA is that it is a circuit that can realize other circuits through programming: the circuit of circuits.

How is this achieved?

Or rather, what kind of circuit can realize basic operations like AND, OR, and NOT?

Let’s take the circuit F=A&B&C&D as an example:

Take a 16×1 RAM in which every bit can be programmed to 0 or 1.

This RAM has a 4-bit address, DCBA, and these 4 bits select which stored bit appears at the RAM output.

By configuring different values in the RAM, we can realize the relationship between output F and inputs A, B, C, D.

In the image above, we configured the 16-bit RAM to 0000000000000001 (listed from address 0000 to address 1111), making this circuit equivalent to F=A&B&C&D.

Only when A=B=C=D=1 will F=1; otherwise, F=0.

This perfectly realizes F=A&B&C&D.

Let’s emphasize this important point again.

When the 16-bit RAM is configured to 0000000000000001, it is equivalent to F=A&B&C&D.

That string, “0000000000000001”, is the program of the FPGA.

This is the fundamental principle of FPGA.
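The lookup can be sketched in a few lines of Python. This is a minimal model, assuming the configuration string lists the stored bits from address 0000 to address 1111 (which is what makes a single trailing 1 behave as a 4-input AND); the function name lut4 is made up for the example.

```python
def lut4(config: str, a: int, b: int, c: int, d: int) -> int:
    """Read one bit from a 16x1 RAM addressed by the inputs DCBA.

    config is 16 characters of '0'/'1'; the leftmost character is the
    bit stored at address 0 (an assumption of this sketch).
    """
    if len(config) != 16 or set(config) - {"0", "1"}:
        raise ValueError("config must be exactly 16 characters of 0/1")
    address = (d << 3) | (c << 2) | (b << 1) | a  # D is the top address bit
    return int(config[address])


AND4 = "0000000000000001"

print(lut4(AND4, 1, 1, 1, 1))  # all inputs high -> address 15 -> stored 1
print(lut4(AND4, 1, 1, 0, 1))  # any input low  -> some other address -> 0
```

Note that nothing in lut4 is specific to AND: the behaviour comes entirely from the 16 stored bits.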

By analogy, how would we implement the circuit F=A|B|C|D?

Here is how to program it:

When the 16-bit RAM is configured to 0111111111111111, it is equivalent to F=A|B|C|D.

You can try it yourself: by configuring the 16 bits of the RAM, you can realize any logic function of the inputs A, B, C, D.

FPGA exploits this mechanism, which gives it the ability to implement any digital circuit.
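Going the other way, from a desired function to the 16 configuration bits, is just a matter of writing out the truth table. The sketch below does that, again assuming the bits are listed from address 0000 to 1111 with A as the lowest address bit; make_lut4_config is a hypothetical helper name.

```python
def make_lut4_config(func) -> str:
    """Build the 16-bit configuration string for any function of (a, b, c, d)."""
    bits = []
    for address in range(16):
        a = address & 1
        b = (address >> 1) & 1
        c = (address >> 2) & 1
        d = (address >> 3) & 1
        bits.append(str(func(a, b, c, d) & 1))
    return "".join(bits)


print(make_lut4_config(lambda a, b, c, d: a & b & c & d))  # 0000000000000001
print(make_lut4_config(lambda a, b, c, d: a | b | c | d))  # 0111111111111111
print(make_lut4_config(lambda a, b, c, d: a ^ b ^ c ^ d))  # 0110100110010110
```

Since there are 16 bits, there are 2^16 = 65536 possible configurations, one for every possible logic function of four inputs.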

The structure in the image above has a special name in FPGA: LUT (lookup table).

The LUT is the most basic unit of all FPGAs.

Since a LUT can only implement combinational logic, a flip-flop (FF) register is added to store data.

As shown in the figure below: LUT + register constitutes the basic structure of modern FPGA.

The basic structure of FPGA relies on such simple circuits to achieve incredibly complex logic.

The LUT and the FF combine to form a basic logic unit (Logic Block).
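A logic block of this kind can be modelled as a lookup plus one bit of stored state. The sketch below is a simplified software model, assuming a 4-input LUT whose output is always captured by the flip-flop on a clock edge; the class and method names are invented for illustration.

```python
class LogicBlock:
    """Toy model of one logic block: a 4-input LUT feeding a flip-flop."""

    def __init__(self, config: str):
        if len(config) != 16 or set(config) - {"0", "1"}:
            raise ValueError("config must be exactly 16 characters of 0/1")
        self.config = config  # bit at address 0 comes first (assumption)
        self.q = 0            # value currently held by the flip-flop

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        """Combinational part: look up the bit at address DCBA."""
        return int(self.config[(d << 3) | (c << 2) | (b << 1) | a])

    def clock(self, a: int, b: int, c: int, d: int) -> int:
        """One clock edge: the flip-flop captures the current LUT output."""
        self.q = self.lut(a, b, c, d)
        return self.q


# LUT programmed as F = NOT A. Feeding Q back into A makes the block toggle.
NOT_A = "1010101010101010"
block = LogicBlock(NOT_A)
history = [block.clock(a=block.q, b=0, c=0, d=0) for _ in range(4)]
print(history)  # [1, 0, 1, 0]
```

The feedback from Q to an input is what turns pure combinational logic into a circuit with memory, here a divide-by-two toggle.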

A LUT that can perform calculations with inputs ABCD is called a 4-input LUT, and there are also 5-input, 6-input, and other variants.

The variations are endless, but the essence remains the same.

This structure has not changed much since the inception of FPGA.

This circuit can also be viewed as the smallest FPGA.

Currently, a functional chip can have anywhere from tens of thousands to hundreds of millions of gates.

Relying on a single such circuit to achieve that is out of the question.

Thus, countless LUTs and FFs are needed.

An FPGA implements an array of Logic Blocks (each Logic Block consists of LUT + FF), connected to one another through wiring resources.

Combining interconnections and logic units forms an FPGA chip, as shown in the figure below.

A typical FPGA development flow is as follows: it goes from HDL (a hardware description language such as Verilog) to a configuration file, the bitstream.

In comparison, the development flow for an application-specific chip goes from HDL all the way to silicon. That process takes much longer.

What does this bitstream include?

As mentioned earlier, when the 16-bit RAM (LUT) is configured to 0000000000000001, it is equivalent to F=A&B&C&D.

The bitstream generated by the FPGA tools ultimately contains the configuration data for the LUTs and for the wiring resources.
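To make "LUT configuration plus wiring configuration" concrete, here is a toy model. Real bitstream formats are vendor-specific and far more involved; the list-of-dicts layout, the signal names, and the run_fabric helper below are all invented for this sketch. It wires three 4-input LUTs together to form an 8-input AND.

```python
# Each entry configures one logic block: what the LUT computes ("lut") and
# where its four inputs A, B, C, D are wired from ("inputs").
TOY_BITSTREAM = [
    {"lut": "0000000000000001", "inputs": ["pin0", "pin1", "pin2", "pin3"]},
    {"lut": "0000000000000001", "inputs": ["pin4", "pin5", "pin6", "pin7"]},
    # F = A & B; inputs C and D are tied to constant 0.
    {"lut": "0001000100010001", "inputs": ["lut0", "lut1", "zero", "zero"]},
]


def run_fabric(bitstream, pins):
    """Evaluate the toy fabric; blocks must be listed in dependency order."""
    signals = {f"pin{i}": value for i, value in enumerate(pins)}
    signals["zero"] = 0
    for index, block in enumerate(bitstream):
        a, b, c, d = (signals[name] for name in block["inputs"])
        address = (d << 3) | (c << 2) | (b << 1) | a
        signals[f"lut{index}"] = int(block["lut"][address])
    return signals[f"lut{len(bitstream) - 1}"]  # output of the last block


print(run_fabric(TOY_BITSTREAM, [1] * 8))                   # 1
print(run_fabric(TOY_BITSTREAM, [1, 1, 1, 0, 1, 1, 1, 1]))  # 0
```

Changing only the data in TOY_BITSTREAM, without touching run_fabric, yields a different circuit, which is the sense in which an FPGA is programmed.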

At this point, the design and programming of FPGA are complete.

Simple and clear!

2: EDA Tools: The Gap from Knowledge to Action

It seems that designing an FPGA chip is not complicated.

The circuits are not complex, but there are not many high-performance FPGAs available on the market.

From knowledge to action, there is a huge gap.

Suppose a manufacturer completes the design of an FPGA chip according to the principle described earlier in this article.

When it is time for customers to use it, a major problem arises:

EDA tools.

Anyone who supplies FPGA chips to customers must also supply an EDA tool.

Without EDA tools, would it be reasonable to expect customers to generate the FPGA bitstream files by hand?

The chip has been made, but are the EDA tools just as easy?

No, they are really difficult!

This is a huge pitfall.

The following image shows an open-source FPGA design flow (OpenFPGA). Even an open-source project involves a minimum set of EDA tools.

The EDA tools required for customers include:

1: Synthesis tools

2: Place-and-route tools

3: Bitstream generation tools

4: Timing analysis tools

5: Simulation tools

6: Embedded logic analyzers and other debugging tools

7: Power analysis tools

These are still considered the minimum set, but it is already much more complicated than the GCC compilation tools provided to users for CPU chips.

I have installed EDA tools from certain companies, and they are all several gigabytes in size, larger than a Windows installation disk.

If the difficulty of CPU’s GCC tools is 1, the difficulty of FPGA’s EDA tools is between 10 and 100.

Borrowing a line from “Let the Bullets Fly”:

When a project succeeds, the chip gets only seventy percent of the credit?

No: seventy percent goes to EDA, and the chip gets only thirty.

And even that thirty percent depends on the face EDA chooses to show.

After all the hard work of producing the chip, one still has to rely on the EDA tools.

So let’s see what “face” EDA shows.

Take the place-and-route tool in the image above as an example: it looks quite complex.

If it is not done well, the utilization of the whole FPGA will be extremely low and the routing quality will be poor. Isn’t that important?

In addition to the traditional EDA flow based on HDLs such as Verilog, there is also HLS (High-Level Synthesis).

HLS lets software engineers take part in FPGA design: it hides the circuit-level development so they can write software directly, which is a testament to the capability of EDA tools.

In essence, however, it converts high-level languages (C, C++) into a hardware description language (HDL), which synthesis tools then turn into circuits.

The advantage is that it is more aligned with software engineers’ habits, but the downside is that an additional layer of conversion brings efficiency loss.

3: FPGA Architecture: Fusion and Beyond

The article started by stating that FPGA consists of Logic Blocks, primarily (LUT+FF), along with wiring resources.

In addition to these, FPGA also has many hard IPs, also known as macro units, for example PLL, SERDES, RAM, and other conventional IPs.

As chips have evolved, FPGA has also integrated many new elements.

Among these new elements, a notable feature is the integration of hard-core CPU systems within FPGA, even capable of running OS (operating systems).

Thus we get CPU + FPGA: software programming on the CPU, hardware circuit programming on the FPGA.

It is two swords wielded together, and the pair is stronger than either alone.

Similarly, SERDES is a necessary module for high-end FPGAs. Without SERDES, an FPGA is isolated, unable to achieve high-speed interconnection with other chips.

Today, connections to external devices support high-speed protocols such as PCIe, SATA, and 10G/100G Ethernet.

FPGA is also used for digital signal processing in computationally intensive applications such as radar.

Therefore, FPGA also integrates many DSP units to perform operations like multiplication.
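The kind of work these units take on is the multiply-accumulate at the heart of digital filters. As a rough software illustration (plain Python, not a description of any vendor's DSP block), a direct-form FIR filter looks like this; the function name and the sample data are made up.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum over k of taps[k] * x[n - k], x[n < 0] = 0."""
    outputs = []
    for n in range(len(samples)):
        acc = 0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * samples[n - k]  # one multiply-accumulate step
        outputs.append(acc)
    return outputs


print(fir_filter([1, 2, 3, 4], [1, 1]))  # [1, 3, 5, 7]
```

In software the inner loop runs one multiplication at a time; in an FPGA each tap can be given its own hardware multiplier so that all of them work in the same clock cycle.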

To seize a share in the era of artificial intelligence, some FPGAs have integrated hard cores for neural network acceleration.

To summarize, besides LUT, the hard-core IPs integrated within FPGA include:

1: RAM: for implementing storage resources;

2: PLL: providing high-speed clock signals and resources;

3: DSP: multiplication operations, filters, digital signal processing modules;

4: SERDES: enabling high-speed interfaces like PCIe, SATA, FC, 100G Ethernet;

5: CPU systems: providing software programming capabilities;

6: NPU cores: providing AI processing acceleration;

Depending on market needs, more hard-core IPs may be integrated into the traditional FPGA.

Integrating more functional IPs is the future trend of FPGA architecture.

4: Advantages of FPGA: Solving Problems is the Key

What are the differences and advantages of FPGA compared to CPU?

When discussing the pros and cons of an architecture, the focus should be on what problems it solves, not on whether CPU or FPGA is superior in the abstract.

Only when placed in specific application scenarios can we determine which architecture is more suitable for solving these problems.

Programming an FPGA means describing a circuit; the result is still, in essence, logic gates (AND, OR, NOT) and their equivalents.

Programming a CPU means writing software that runs as a sequence of instructions.

FPGA logic operates in parallel, while a CPU operates serially: a single CPU core executes instructions one after another to achieve its function (instruction-level parallelism exists, but the principle is unchanged).

FPGA has a higher degree of parallelism; compared with the CPU’s way of computing, it offers higher data throughput and better latency control.

However, a CPU runs at a very high clock frequency and can run operating systems, making it a highly flexible general-purpose computing unit, a role in which FPGA cannot match it.

FPGA is better suited to application-specific tasks: used alongside a CPU as a specialized computational unit, it is ideal for high-volume data computation.

Since FPGA is the “universal chip,” it seems it can replace all chip functions.

So can we stop developing application-specific chips and use only FPGAs?

Compared to application-specific integrated circuits (ASICs), FPGA has three disadvantages.

1: Large area, high cost:

Compared to application-specific integrated circuits, FPGA uses LUTs to implement basic logic gates, so its area is roughly ten times that of a dedicated circuit, and its cost is correspondingly much higher.

2: Large area, high power consumption:

Similarly, it has no advantage in power consumption; for low-power products such as battery-powered handheld devices, it is generally not feasible.

3: Low operating frequency, low computational efficiency.

Due to the longer interconnections between LUTs inside FPGA, the frequency of FPGA is much lower than that of ASICs under the same process. The delay between gates in dedicated circuits is much smaller.
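The frequency argument can be made concrete with a back-of-the-envelope model: the clock period has to cover the slowest register-to-register path, which is the sum of the logic delays and interconnect delays along it. All delay values below are invented purely to show the arithmetic and are not measurements of any device.

```python
def fmax_mhz(logic_delays_ps, routing_delays_ps):
    """Maximum clock frequency in MHz for one critical path.

    Delays are in picoseconds; the clock period is taken as their plain sum
    (setup time and clock skew are ignored in this simplified model).
    """
    period_ps = sum(logic_delays_ps) + sum(routing_delays_ps)
    if period_ps <= 0:
        raise ValueError("total path delay must be positive")
    return 1_000_000 / period_ps


# Same three logic stages, two hypothetical interconnect scenarios.
LOGIC = [500, 500, 500]
LONG_ROUTES = [1000, 1000, 500]   # programmable interconnect (made-up values)
SHORT_ROUTES = [200, 200, 100]    # direct metal wires (made-up values)

print(fmax_mhz(LOGIC, LONG_ROUTES))   # 250.0
print(fmax_mhz(LOGIC, SHORT_ROUTES))  # 500.0
```

With identical logic, the path with longer interconnect ends up at half the clock frequency in this example, which is the effect the paragraph above describes.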

However, FPGA also has clear advantages.

The biggest advantage of FPGA is its high flexibility.

Using FPGA does not require re-fabrication, saving NRE costs.

In fields such as radar, 5G, networking, storage, and high-performance computing, FPGA has widespread applications.

Especially when the demand is unclear, the quantity is small, and it is not worth making chips, or when the demand keeps changing, FPGA is a good choice.

After Intel acquired Altera, the focus of FPGA shifted to the data center market, with heavy investment in using FPGA for data center acceleration. The latest Intel IPU incorporates FPGA technology to offload data center infrastructure, an approach similar to a DPU (see the previous article: Talking about DPU – From Networking to Data).

Currently, many DPUs are implemented using Intel and Xilinx FPGAs.

This reflects how data center requirements for DPUs are still changing.

The global FPGA market was approximately $6 billion in 2018, with Xilinx and Altera being the most important providers in this market, while the others are relatively small companies, leaving little market share for anyone else. (In 2015, Intel announced the acquisition of FPGA manufacturer Altera for $16.7 billion.)

I have used both of these FPGA brands in the past, and each has its strengths.

Many domestic FPGA companies have emerged, some with decent shipments. Compared to the foreign giants, they are at the stage of simply having a working product and can meet some domestic demand. However, there are significant gaps in capacity, performance, and especially EDA tools, and they still need to prove themselves in the market.

Here are two atypical examples involving domestic FPGAs, which are quite interesting.

In 2014, after Russia annexed Crimea, the U.S. imposed sanctions and restrictions on Russia, causing a chip shortage, especially for high-end FPGA chips. A certain FPGA company seized this opportunity to resolve the urgent needs of international friends, achieving substantial profits from FPGA exports.

Additionally, a certain smartphone manufacturer had a special transcoding format for phone screens, so if repairs were needed, only original factory screens could be used. Other screen formats wouldn’t match, and original factory screens were very expensive. A certain company customized a batch of ultra-small FPGAs to achieve decoding of phone screens, allowing many Chinese screen manufacturers (LCD, LED, etc.) to directly replace original factory screens, which also sold very well in the smartphone repair market.

In these niche areas, they found their positioning and achieved significant breakthroughs, both in profit and quantity.

If volumes reach tens of thousands, hundreds of thousands, or even more, developing an application-specific chip is more suitable.

If volumes are smaller and demand is unstable, FPGA is more appropriate.
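This rule of thumb can be expressed as a break-even calculation: the ASIC pays a large one-off NRE but is cheaper per unit, while the FPGA has no tape-out NRE but costs more per unit. The sketch below uses hypothetical prices purely to show the shape of the trade-off; the function name is made up.

```python
import math


def break_even_volume(asic_nre: float, asic_unit: float, fpga_unit: float) -> int:
    """Smallest volume at which total ASIC cost is no higher than total FPGA cost."""
    if fpga_unit <= asic_unit:
        raise ValueError("model assumes the FPGA costs more per unit than the ASIC")
    # asic_nre + asic_unit * v <= fpga_unit * v  =>  v >= asic_nre / (fpga_unit - asic_unit)
    return math.ceil(asic_nre / (fpga_unit - asic_unit))


# Hypothetical figures: 20 million NRE, 5 per ASIC, 105 per FPGA.
print(break_even_volume(20_000_000, 5, 105))  # 200000
```

Below the break-even volume the FPGA is the cheaper option overall; above it, the ASIC wins, provided the requirements stay stable long enough to recover the NRE.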

In chip design, there is a stage called FPGA prototype verification, which involves implementing the chip code in FPGA for prototype realization, accelerating the iteration speed of chip design.

The beginning of this article states: the characteristic of FPGA is its ability to describe chip circuits.

Thus, before a digital chip is taped out, loading the chip logic into an FPGA for equivalence testing is a very important step.

From this perspective, FPGA and chips (application-specific integrated circuits) are not opposed to each other.

FPGA, the universal chip, is functionally universal, theoretically capable of realizing all functions.

However, in terms of PPA (Performance, Power, Area), it is significantly limited on all three dimensions.

FPGA is also evolving, expanding into more fields to meet those changing market demands.

Other market demands have stabilized, and there FPGA is replaced by application-specific chips.

The universal chip, the changing application.

Finding market positioning and solving user problems is the key to gaining a foothold.

FPGA is like this, and chips are too.

END

IIC Shanghai 2023
