FPGA Chip Design and Key Technologies

This article comes from "FPGA Special: Universal Chip Ignites New Power, Domestic Replacement Future is Promising (2023)". FPGA, short for Field Programmable Gate Array, is an integrated circuit with programmable characteristics that is pre-designed on a silicon wafer. During use, users can reconfigure the chip's internal resources through software to realize different functions. In layman's terms, FPGA chips are like building blocks among integrated circuits: users can assemble them into circuit structures with different functions and characteristics according to their needs and ideas, meeting the requirements of different application scenarios. Because of these characteristics, FPGA chips are also known as "universal" chips.

FPGA chips consist of three parts: programmable logic units (Logic Cell, LC), input-output units (Input Output Block, IO), and switch connection arrays (Switch Box, SB):

(1) Logic units: realize different circuit functions through binary data stored in a lookup table (Look-Up Table, LUT). The LUT is essentially a type of static random access memory (Static Random Access Memory, SRAM), and its size is determined by the number of input signals. Common lookup-table circuits include four-input lookup tables (4-input LUT, LUT4), five-input lookup tables (LUT5), and six-input lookup tables (LUT6). The more inputs a lookup table has, the more complex the logic circuits it can realize, and thus the greater the logic capacity. However, the area of the lookup table grows exponentially with the number of inputs: for each additional input, the area of the SRAM storage circuit used by the lookup table approximately doubles (see the code sketch after this list). Different logic-unit structures can use lookup tables of different sizes or combinations of different types of lookup tables. In addition, logic units contain other components such as multiplexers, carry chains, and flip-flops. To improve architectural efficiency, several logic units can be further grouped into logic blocks (Logic Block), which provide fast local routing resources internally, forming a hierarchical chip architecture.

(2) Input-output units: the interface between the chip and external circuits, used to meet the drive and matching requirements of input/output signals under different conditions.

(3) Switch connection arrays: control the direction of signal connections through internal MOS transistors.
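
As referenced above, the following sketch (illustrative only, not vendor code) models an n-input LUT as 2^n SRAM configuration bits: the bitstream "programs" the truth table, evaluation is a simple indexed read, and the table size doubles with each extra input.

```python
# Minimal sketch: an n-input LUT modeled as 2**n configuration bits (SRAM).
# The configuration "programs" the truth table; evaluation is an indexed read.

def make_lut(truth_table_bits):
    """truth_table_bits: list of 0/1 of length 2**n, one bit per input combination."""
    n = len(truth_table_bits).bit_length() - 1
    assert len(truth_table_bits) == 2 ** n, "a LUT needs exactly 2**n configuration bits"

    def lut(*inputs):
        # Pack the n input signals into an address into the SRAM table.
        addr = 0
        for bit in inputs:
            addr = (addr << 1) | (bit & 1)
        return truth_table_bits[addr]

    return lut

# Configure a 4-input LUT (LUT4) as a 4-way AND gate: only address 0b1111 stores a 1.
and4 = make_lut([0] * 15 + [1])
print(and4(1, 1, 1, 1))   # 1
print(and4(1, 0, 1, 1))   # 0

# Area grows exponentially with inputs: LUT4 -> 16 bits, LUT5 -> 32 bits, LUT6 -> 64 bits.
for n in (4, 5, 6):
    print(f"LUT{n}: {2 ** n} SRAM bits")
```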

FPGA was first launched by Xilinx in 1985 with the world's first FPGA chip, the XC2064. Over several decades of development, its hardware architecture has gone through roughly four stages: from the PROM stage (simple digital logic), to the PAL/GAL stage ("AND"/"OR" arrays), to the CPLD/FPGA stage (ultra-large-scale circuits), and now to a stage in which FPGA technology converges with ASIC technology and develops toward system-level SoC FPGA/eFPGA. At the hardware level, the overall trend is toward larger scale, higher flexibility, and better performance.

FPGA chips belong to the category of logic chips. By function, logic chips can be divided into four main categories: general-purpose processor chips (including central processing units CPU, graphics processing units GPU, digital signal processors DSP, etc.), memory chips (Memory), application-specific integrated circuit chips (ASIC), and field-programmable gate array chips (FPGA).

FPGA has two main characteristics: flexibility and parallelism. (1) Flexibility: FPGA chips offer higher flexibility and a richer set of choices. By reprogramming the FPGA, users can change the chip's internal connection structure at any time to realize arbitrary logic functions. Especially in industries where technical standards are not yet mature or are evolving quickly, FPGA can effectively help enterprises reduce investment risk and sunk costs, making it a choice that balances functionality and economics.

(2) Parallelism: When CPUs and GPUs execute tasks, the execution units must process data through an ordered sequence of steps such as instruction fetch, decode, execute, memory access, and write-back. Because memory is shared among multiple parties, some tasks require access arbitration, which introduces delays. In an FPGA, by contrast, the connection structure between each logic unit and its neighbors is already fixed at programming (burning) time, and registers and on-chip memory belong to their own control logic, so there is no need to communicate through instruction decoding or shared memory. Every piece of hardware logic can work in parallel, significantly improving data-processing efficiency. Especially for highly repetitive, large-volume data-processing tasks, FPGA has a clear advantage over CPU; the sketch below illustrates the contrast.
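
To make the contrast concrete, here is a purely illustrative sketch (not vendor code) in which a CPU-style loop walks each item through three stages in turn, while an FPGA-style pipeline gives every stage its own register so that all stages work in parallel on every "clock".

```python
# Illustrative sketch: CPU-style sequential processing vs. FPGA-style pipelining.

def stage1(x): return x + 1          # stand-ins for three fixed logic stages
def stage2(x): return x * 2
def stage3(x): return x - 3

def sequential(data):
    # CPU-style: each item passes through all three stages before the next starts.
    return [stage3(stage2(stage1(x))) for x in data]

def pipelined(data):
    # FPGA-style dataflow: registers are updated back-to-front so each value
    # advances one stage per clock; throughput approaches one result per clock
    # once the three-stage pipeline has filled.
    r0 = r1 = r2 = None              # pipeline registers between the stages
    out, stream = [], iter(data)
    for _ in range(len(data) + 3):   # enough clocks to fill and drain the pipeline
        if r2 is not None:
            out.append(stage3(r2))
        r2 = stage2(r1) if r1 is not None else None
        r1 = stage1(r0) if r0 is not None else None
        r0 = next(stream, None)
    return out

data = list(range(8))
assert sequential(data) == pipelined(data)
```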

Compared with other logic chips, FPGA strikes a good balance among flexibility, performance, power consumption, and cost:

(1) Compared with GPU, FPGA has advantages in power consumption and flexibility. On one hand, because a GPU uses a large number of processing units and frequently accesses external SDRAM, its peak computing performance is higher, but so is its power consumption. The average power consumption of an FPGA (10 W) is far lower than that of a GPU (200 W), which eases heat-dissipation problems. On the other hand, once a GPU's design is complete, its hardware resources cannot be modified, whereas an FPGA can be reprogrammed for specific applications, making it more flexible. Machine learning processes single data with multiple instructions in parallel, and the customizability of FPGA better meets the computing needs of low-precision, distributed, unconventional deep neural networks.

(2) Compared with ASIC chips, FPGA offers shorter cycles and better cost-effectiveness in the early stages of a project. An ASIC must be designed from standard cells, and when the chip's functional or performance requirements change, or when the process node advances, the ASIC must be re-fabricated, leading to high sunk costs and long development cycles. In contrast, FPGA supports programming, debugging, reprogramming, and repeated operation, allowing chip functions to be reconfigured. As a result, FPGA has long been used as a semi-custom circuit in the custom-ASIC field and is considered one of the faster paths for prototype construction and design development.

The memory structure of an FPGA is roughly divided into three levels (taking the Intel Agilex-M FPGA as an example): highly localized on-chip memory, in-package memory provided in the form of HBM2e stacks, and external memory architectures and interfaces such as DDR5 and LPDDR5.

On-chip memory (MLAB and M20K blocks): the most localized memory;

In-package memory (HBM): memory that bridges a critical gap in the memory hierarchy; its capacity is far greater than that of on-chip memory (by more than two orders of magnitude), while its bandwidth is far greater than that of off-chip memory (by more than two orders of magnitude);

Off-chip memory (DDR5, LPDDR5, etc.): for applications that exceed the capacity of HBM2e or require the flexibility of independent memory, mainstream memory architectures such as DDR5 and LPDDR5 are used.

HBM2e integrated in the same package as the FPGA can achieve higher bandwidth, lower power consumption, and lower latency in a small form factor.

(1) Memory capacity: each HBM2e stack can contain 4 or 8 layers, each layer providing 2 GB of memory, so a single Intel Agilex-M series FPGA (which carries two stacks) can contain 16 GB or 32 GB of high-bandwidth memory;

(2) Bandwidth: HBM2e can achieve memory bandwidth of up to 410 GB/s per stack, up to 18 times that of DDR5 components and up to 7 times that of GDDR6 components. Two HBM2e stacks can provide peak memory bandwidth of up to 820 GB/s (a quick check of these figures follows this list).

(3) Power consumption and latency: since HBM2e is integrated in the package, external I/O pins are not needed, which saves PCB space and eliminates the power consumption and interconnect latency they would otherwise introduce.
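
As a back-of-the-envelope check of the capacity and bandwidth figures quoted above (assuming the two HBM2e stacks per device described in the text):

```python
# Back-of-the-envelope check of the HBM2e figures quoted above
# (assumes two HBM2e stacks per Agilex-M device, as stated in the text).

GB_PER_LAYER = 2
STACKS = 2

for layers in (4, 8):
    capacity_gb = layers * GB_PER_LAYER * STACKS
    print(f"{layers}-high stacks: {capacity_gb} GB per device")   # 16 GB or 32 GB

BW_PER_STACK_GBS = 410                                            # GB/s per HBM2e stack
print(f"Peak bandwidth: {BW_PER_STACK_GBS * STACKS} GB/s")        # 820 GB/s
```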

The network on chip (NoC, Network on Chip) refers to the integration of a large number of computing resources on a single chip together with the on-chip communication network that connects these resources; it is used to share data among the programmable logic (PL), the processor system (PS), and other hard blocks.

The related concept, the system on chip (SoC), is a single chip that contains a complete set of diverse, interconnected units designed to solve a certain range of tasks. Traditionally, an SoC includes several compute cores, memory controllers, I/O subsystems, and the connection and switching methods between them (buses, crossbars, NoC components).

A NoC includes both a computation subsystem and a communication subsystem. The computation subsystem, composed of processing elements (PE), performs "computation" tasks in the broad sense; a PE can be an existing CPU, an SoC, various dedicated-function IP cores, a memory array, reconfigurable hardware, and so on. The communication subsystem, composed of switches, connects the PEs to achieve high-speed communication between computing resources. The network formed by the communication nodes and their interconnecting links is the on-chip communication network; a toy model of this structure follows.
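
The following is a toy, hypothetical model of that structure, not any vendor's NoC: PEs sit at the nodes of a 2D mesh, switches forward packets, and simple dimension-ordered (XY) routing determines how many switch hops a transfer takes.

```python
# Toy 2D-mesh NoC model: PEs at grid nodes, switches forwarding packets with
# dimension-ordered (XY) routing. Purely conceptual, not any vendor's design.

def xy_route(src, dst):
    """Return the list of switch coordinates a packet visits from src to dst."""
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    x, y = sx, sy
    while x != dx:                    # travel along the row first ...
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                    # ... then along the column
        y += 1 if dy > y else -1
        path.append((x, y))
    return path

# A PE at (0, 0) sending to a PE at (3, 2) crosses 5 switch-to-switch links.
route = xy_route((0, 0), (3, 2))
print(route)                          # [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2)]
print(f"hops: {len(route) - 1}")      # 5
```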

Analogous to a city's highway network, the NoC architecture simplifies interconnect paths and improves the FPGA's transmission rate. Achronix's Speedster7t FPGA devices, built on TSMC's 7nm FinFET process, include a 2D NoC architecture that provides ultra-high bandwidth (~27 Tbps) for data transfers between the FPGA's external high-speed interfaces and its internal programmable logic. The NoC uses a series of high-speed row and column networks (horizontal and vertical) to distribute data throughout the FPGA, with each row or column carrying two 256-bit, unidirectional, industry-standard AXI channels, each capable of operating at a transfer rate of 512 Gbps (256 bit × 2 GHz); a quick check of the per-channel figure follows.
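
The per-channel number quoted above follows directly from the channel width and clock rate; the sketch below checks it (the aggregate ~27 Tbps figure depends on the number of rows and columns and is taken from the text rather than derived here).

```python
# Quick check of the Speedster7t per-channel NoC figure quoted above.

channel_width_bits = 256
clock_ghz = 2                                      # AXI channels clocked at 2 GHz

per_channel_gbps = channel_width_bits * clock_ghz
print(f"Per channel: {per_channel_gbps} Gbps")     # 512 Gbps, one direction

# Each row or column carries two such unidirectional channels (one per direction),
# so roughly 1 Tbps of traffic can be in flight per row/column when both are busy.
print(f"Per row/column: {2 * per_channel_gbps} Gbps")
```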

The NoC provides several important advantages for FPGA design, including: (1) improving design performance; (2) reducing idle logic resources and lowering the risk of placement congestion in resource-heavy designs; (3) reducing power consumption; (4) simplifying logic design, since the NoC can replace traditional logic for high-speed interfaces and bus management; (5) enabling truly modular design.

Intel (Altera) uses the NoC architecture to achieve high-bandwidth data transfer between memory and the programmable logic fabric. Each in-package HBM2e stack communicates with its NoC through the UIB, while off-chip memory (DDR4, DDR5, etc.) communicates with the NoC through the IO96 subsystem. The NoC transfers data from source to destination through a network composed of switches (routers), interconnecting links (wires), initiators (I), and targets (T). Each NoC provides a horizontal network that connects logic in the programmable fabric, via AXI4 initiators, to the NoC's integrated memory targets. In addition, each NoC provides a vertical network with optimized routing that distributes memory data read over the horizontal network paths deep into the FPGA's programmable logic fabric (programmable logic and/or M20K blocks).

AMD (Xilinx) deploys the NoC architecture between the AI Engines and the programmable logic, significantly reducing power consumption. One of the most prominent advantages of AMD's Versal products is the ability to combine AI Engine arrays with the programmable logic (PL) of the Adaptable Engines, giving great flexibility to implement functions in the most suitable resource: AI Engines, Adaptable Engines, or Scalar Engines. Compared with traditional programmable-logic DSP implementations of ML functions, this approach can increase compute density per unit of chip area by up to 8 times while reducing power consumption by 40% under nominal conditions.