The power of an FPGA (Field Programmable Gate Array) lies in its many programmable resources, which together allow a wide range of digital circuits, from simple logic to complex systems, to be implemented flexibly. Understanding what each resource does and how the resources work together is fundamental to efficient FPGA design.
01
—
Core Internal Resources of FPGA
The internal resources of an FPGA are organized around four core needs: “logic operations, data storage, signal processing, and external interaction.” Each type of resource has a clear role, and together they support the implementation of complex functions.
1. Core Logic Resources
-
LUT (Look-Up Table): The core unit for implementing combinational logic in an FPGA. A LUT is essentially a small RAM that stores a truth table: the inputs act as the address, and the bit stored at that address is the output, which follows the inputs after only a short propagation delay. For example, the LUT for a 2-input AND gate pre-stores the mapping “00→0, 01→0, 10→0, 11→1,” so the input signals look up the result directly. A k-input LUT (commonly 4 or 6 inputs) can implement any Boolean function of its k inputs; wider logic such as data selectors, many-input gates, and arithmetic is built by combining several LUTs (with dedicated carry chains for arithmetic). It serves as the “calculator” for logic functions.
-
FF (Flip-Flop): The foundation of sequential logic. An FF is a 1-bit storage element that updates only on a clock edge and holds its value until the next edge. It is commonly used for data buffering, registers, and counters. For instance, registering a LUT's output on the clock edge captures the result only after it has settled, so glitches caused by combinational logic delays do not propagate downstream. In this sense the FF acts as the “voltage regulator” for timing synchronization.
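To make the LUT/FF split concrete, here is a minimal behavioral sketch in Python (a software model, not synthesizable HDL; the class and method names are hypothetical). The LUT is modeled as a stored truth table indexed by its inputs, and the FF as a register that changes only on a rising clock edge.

```python
# Behavioral sketch (plain Python, not HDL) of a k-input LUT feeding a D flip-flop.


class LUT:
    """k-input look-up table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, k, truth_table):
        if len(truth_table) != 2 ** k:
            raise ValueError("truth table must have exactly 2**k entries")
        self.k = k
        self.table = list(truth_table)  # loaded once, like the FPGA configuration

    def evaluate(self, *inputs):
        """Combinational read: the inputs form the address (input 0 = LSB)."""
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        address = 0
        for i, bit in enumerate(inputs):
            address |= (bit & 1) << i
        return self.table[address]


class DFlipFlop:
    """Rising-edge D flip-flop: Q changes only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def tick(self, clk, d):
        if clk == 1 and self._prev_clk == 0:  # rising edge: capture D
            self.q = d & 1
        self._prev_clk = clk
        return self.q


# The 2-input AND gate from the text: addresses 00, 01, 10, 11 -> 0, 0, 0, 1.
and_gate = LUT(2, [0, 0, 0, 1])

# Any 6-input function fits one 6-input LUT, e.g. the XOR (parity) of all six inputs.
parity6 = LUT(6, [bin(a).count("1") % 2 for a in range(64)])

ff = DFlipFlop()
result = and_gate.evaluate(1, 1)  # combinational output: 1
ff.tick(0, result)                # clock low: Q stays 0
ff.tick(1, result)                # rising edge: Q becomes 1
```

The LUT output is available as soon as its inputs are, while the FF output only moves on the edge; that difference is exactly what lets an FF hide a LUT's settling time.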
2. Storage Resources
-
BRAM (Block RAM): A dedicated, high-capacity, high-speed memory block. A single block typically holds 18Kb or 36Kb and supports true dual-port operation (two independent ports that can read and write in parallel), running at the system clock rate (up to several hundred MHz). It is suited to bulk data such as image pixels and sensor samples, and to implementing FIFOs (First-In, First-Out queues) or ROMs (fixed tables such as sine-wave samples), serving as the “main warehouse” for large amounts of data.
-
Distributed RAM: Small memories built from LUTs. Since every LUT is itself a tiny RAM, several LUTs can be combined into a memory, typically tens to hundreds of bytes in size (on Xilinx devices, only the LUTs in SLICEM slices can be used this way). Writes are clocked, but reads are asynchronous: the data appears without waiting for a clock edge. This suits small temporary buffers such as shallow FIFOs or small coefficient tables, acting as a “mini warehouse.”
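The practical difference between the two memory types is when read data becomes available. The hypothetical Python model below assumes a read-first BRAM, where both ports see the old contents during a write on the same edge (real devices make the write mode configurable and treat simultaneous access to one address from both ports as a collision), and a distributed RAM with clocked writes and combinational reads.

```python
# Behavioral sketch contrasting BRAM (synchronous read) with distributed RAM
# (asynchronous read). Class and method names are hypothetical, for illustration.


class BlockRAM:
    """True dual-port RAM: each port's read/write takes effect on the clock edge."""

    def __init__(self, depth, width, capacity_bits=36 * 1024):
        if depth * width > capacity_bits:
            raise ValueError("shape does not fit in one block")
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.dout = {"A": 0, "B": 0}  # registered outputs of ports A and B
        self._requests = {}

    def request(self, port, addr, write=False, din=0):
        """Present address/data on a port; nothing happens until clock()."""
        self._requests[port] = (addr, write, din)

    def clock(self):
        # Read-first: every port sees the contents from before this edge.
        for port, (addr, _, _) in self._requests.items():
            self.dout[port] = self.mem[addr]
        for addr, write, din in self._requests.values():
            if write:
                self.mem[addr] = din & self.mask
        self._requests.clear()


class DistributedRAM:
    """LUT-based RAM: writes are clocked, reads are combinational."""

    def __init__(self, depth, width):
        self.mem = [0] * depth
        self.mask = (1 << width) - 1

    def read(self, addr):
        return self.mem[addr]  # available immediately, no clock edge needed

    def clock_write(self, addr, din):
        self.mem[addr] = din & self.mask


bram = BlockRAM(depth=4096, width=8)  # 32,768 bits: fits in a 36Kb block
bram.request("A", addr=5, write=True, din=0x42)
bram.request("B", addr=5)             # same edge: port B sees the old value (0)
bram.clock()
bram.request("B", addr=5)
bram.clock()                          # now bram.dout["B"] == 0x42

lut_ram = DistributedRAM(depth=64, width=8)  # 64 bytes: "mini warehouse" scale
lut_ram.clock_write(3, 0x7F)
value = lut_ram.read(3)               # 0x7F with no extra clock cycle
```

The extra cycle of BRAM read latency is the main thing a design has to plan around when it moves data from distributed RAM into BRAM.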
3. Computational Acceleration and Auxiliary Resources
-
DSP (Digital Signal Processing) slice: A hardened multiply-accumulate (MAC) block that performs multiplication and accumulation much faster, and with far fewer logic resources, than the same operations built from LUTs. It is widely used in signal processing such as filters and FFTs.
-
Clock Management Resources (PLL/MMCM): These generate stable clocks by multiplying, dividing, and phase-shifting a reference clock, so that clocked resources such as FFs and BRAMs operate with well-defined, synchronized timing, avoiding data errors caused by unstable clocks.
-
I/O Resources: These include I/O banks and SerDes transceivers. I/O banks handle voltage-level standards and drive the signals that connect to external chips (such as CPUs and sensors); SerDes transceivers implement multi-gigabit serial links (10 Gbps or more per lane on many devices) for high-speed interfaces such as PCIe and SATA.
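The DSP and clock-management entries both reduce to simple arithmetic. The sketch below assumes the 25×18-bit multiplier and 48-bit accumulator of Xilinx 7-series DSP48E1 slices and the MMCM relation f_out = f_in × M / (D × O); the VCO range used for the sanity check is an example value, so check the target device's datasheet before relying on it.

```python
# Sketches for the DSP and clocking entries. Widths follow the DSP48E1 (25x18 multiply,
# 48-bit accumulate); other families use other widths. Function names are hypothetical.


def wrap_signed(value, bits):
    """Two's-complement wraparound to a fixed bit width, as a hardware register would."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class MacUnit:
    A_BITS, B_BITS, ACC_BITS = 25, 18, 48

    def __init__(self):
        self.acc = 0

    def step(self, a, b):
        """One multiply-accumulate per clock: acc <= acc + a * b."""
        product = wrap_signed(a, self.A_BITS) * wrap_signed(b, self.B_BITS)
        self.acc = wrap_signed(self.acc + product, self.ACC_BITS)
        return self.acc


def fir_output(coeffs, samples):
    """One FIR output computed as a single MAC unit would, one tap per cycle."""
    mac = MacUnit()
    for c, x in zip(coeffs, samples):
        mac.step(c, x)
    return mac.acc


def mmcm_output_mhz(f_in_mhz, mult_m, div_d, div_o, vco_range_mhz=(600.0, 1200.0)):
    """f_out = f_in * M / D / O. The default VCO range is an assumed example value;
    real limits depend on device family and speed grade."""
    f_vco = f_in_mhz * mult_m / div_d
    low, high = vco_range_mhz
    if not low <= f_vco <= high:
        raise ValueError(f"VCO frequency {f_vco} MHz is outside {vco_range_mhz}")
    return f_vco / div_o


print(fir_output([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
print(mmcm_output_mhz(100.0, 10, 1, 4))  # 100 MHz * 10 = 1000 MHz VCO, / 4 -> 250.0
```

Keeping the accumulator much wider than the product is what lets a long FIR sum many taps without overflowing.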
02
—
Collaboration Principle of LUT, FF, and BRAM
LUTs, FFs, and BRAM form the “iron triangle” of FPGA functionality. They complement each other: combinational logic (LUT), synchronous sequential storage (FF), and bulk data buffering (BRAM) together form a complete data-processing path that answers the core questions of “how to compute data, how to store it, and how to keep it stable.”
1. Core Positioning of the Three: Complementary Functions
-
LUT: Responsible for “computation.” Used as logic, it evaluates combinational functions and stores nothing itself; its output changes whenever its inputs change.
-
FF: Responsible for “stability.” It latches data on the clock edge, masking glitches and delay variation from the LUT logic to keep timing stable, and temporarily holds 1 bit of intermediate data.
-
BRAM: Responsible for “storage.” It holds bulk data with high-speed parallel reads and writes, avoiding the waste of storing that data in large numbers of FFs (a single 36Kb BRAM holds as many bits as 36,864 FFs).
2. Typical Collaborative Scenarios
Scenario 1: Real-time Signal Filtering (Data Processing Pipeline)
Taking audio filtering as an example, the design must process a continuous stream of samples, and the three work together as a “buffer – computation – synchronization” pipeline:
-
BRAM buffers data: Audio samples from the ADC (e.g., at a 48kHz sampling rate) are first written into BRAM, so no samples are lost if the computation temporarily falls behind, and the input rate is decoupled from the computation rate.
-
LUT executes the filter computation: Data is read from BRAM at a fixed rate, and LUT logic implements the core of the filtering algorithm (e.g., the additions of a moving average).
-
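FF synchronizes the pipeline: By placing FFs at the input and output of the LUT logic, the computation is split into three stages, “read data → compute → write result,” with FFs holding the intermediate results between stages. Each clock period then only has to cover one stage rather than the whole path, so the clock can run faster: for example, if the unpipelined path takes about 10 ns (about 100MHz), three stages of roughly 3.3 ns each could approach 300MHz, somewhat less in practice because of register overhead. The price is a few clock cycles of added latency.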
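The buffer, compute, and synchronize flow of Scenario 1 can be sketched as a cycle-level Python model. It assumes a moving average over a hypothetical 4-sample window, keeps the window in a circular buffer standing in for the BRAM, and updates a running sum (add the new sample, subtract the one leaving the window) so each stage does a small, fixed amount of work per clock.

```python
# Cycle-level sketch of Scenario 1: moving average with a BRAM-style sample buffer and
# three register stages. The window length and all names are hypothetical.


class MovingAveragePipeline:
    def __init__(self, window):
        self.window = window
        self.buffer = [0] * window  # stands in for the BRAM holding the last N samples
        self.wr_ptr = 0
        self.running_sum = 0
        # Pipeline registers (the FFs between stages); None marks an empty slot.
        self.stage1 = None  # after "read":    (new sample, sample leaving the window)
        self.stage2 = None  # after "compute": updated running sum
        self.out = None     # after "write":   averaged result

    def clock(self, sample):
        """Advance one clock edge. Stages update back to front so each stage
        consumes what its predecessor held *before* the edge, like real FFs."""
        # Stage 3: divide (a right shift in hardware when the window is a power of two).
        self.out = None if self.stage2 is None else self.stage2 // self.window
        # Stage 2: add the new sample, subtract the one leaving the window.
        if self.stage1 is None:
            self.stage2 = None
        else:
            new, oldest = self.stage1
            self.running_sum += new - oldest
            self.stage2 = self.running_sum
        # Stage 1: read the oldest sample from the circular buffer, then overwrite it.
        if sample is None:
            self.stage1 = None
        else:
            oldest = self.buffer[self.wr_ptr]
            self.buffer[self.wr_ptr] = sample
            self.wr_ptr = (self.wr_ptr + 1) % self.window
            self.stage1 = (sample, oldest)
        return self.out


filt = MovingAveragePipeline(window=4)
outputs = [filt.clock(s) for s in [4, 8, 12, 16, 20, None, None]]
# outputs == [None, None, 1, 3, 6, 10, 14]
# 1, 3, 6 are start-up values (the buffer starts at zero);
# 10 = (4+8+12+16)/4 and 14 = (8+12+16+20)/4.
```

This shows the latency trade-off described above: the pipeline accepts a new sample every clock, but each result appears two clocks after the last sample it depends on.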
Scenario 2: Implementation of UART Communication Protocol (State Machine + Data Storage)
The UART protocol needs a state machine to control transmission and reception, plus storage for complete data packets, and the work divides cleanly among the three:
-
LUT generates the state logic: Based on the current state (e.g., “waiting for start bit” or “receiving data bits”) and the input signal (UART_RX), LUT logic computes the next state (e.g., switching to “receiving data bits” after detecting the start bit).
-
FF stores the state and incoming bits: FFs latch the next state produced by the LUTs, so the state machine advances once per clock. FFs arranged as a shift register also hold each received bit (e.g., the 3rd bit) until the full byte has been assembled, so no bit is lost while the computation continues.
-
BRAM stores complete data packets: Once a full frame has been received (8 data bits plus 1 parity bit, with the parity checked), the data byte is written into a BRAM-based buffer. Collecting whole packets of many bytes there avoids spending 8 FFs on every stored byte, and the CPU can later read the data back from BRAM for processing.
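Scenario 2's division of labor (LUTs compute the next state, FFs hold the state and the partly received byte, BRAM collects finished bytes) is sketched below. The frame format is an assumption for illustration: one pre-aligned sample per bit, 8 data bits sent LSB first, even parity, and one stop bit. A real receiver would typically oversample the line and sample near the middle of each bit.

```python
# Behavioral sketch of Scenario 2: a UART receiver state machine. All names and the
# one-sample-per-bit timing are hypothetical simplifications.
from enum import Enum, auto


class RxState(Enum):
    IDLE = auto()
    DATA = auto()
    PARITY = auto()
    STOP = auto()


class UartRx:
    def __init__(self):
        self.state = RxState.IDLE  # state register (FFs)
        self.shift = 0             # shift register assembling the byte (FFs)
        self.bit_count = 0
        self.parity_ok = False
        self.packet_buffer = []    # stands in for the BRAM-based packet buffer
        self.errors = 0

    def next_state(self, rx):
        """Combinational next-state logic: the part the LUTs implement."""
        if self.state is RxState.IDLE:
            return RxState.DATA if rx == 0 else RxState.IDLE  # 0 = start bit
        if self.state is RxState.DATA:
            return RxState.PARITY if self.bit_count == 7 else RxState.DATA
        if self.state is RxState.PARITY:
            return RxState.STOP
        return RxState.IDLE

    def clock(self, rx):
        """One bit period: compute the next state, update datapath FFs, latch the state."""
        nxt = self.next_state(rx)
        if self.state is RxState.DATA:
            self.shift |= (rx & 1) << self.bit_count  # LSB first
            self.bit_count += 1
        elif self.state is RxState.PARITY:
            self.parity_ok = (bin(self.shift).count("1") + rx) % 2 == 0
        elif self.state is RxState.STOP:
            if rx == 1 and self.parity_ok:
                self.packet_buffer.append(self.shift)  # completed byte goes to "BRAM"
            else:
                self.errors += 1
            self.shift, self.bit_count = 0, 0
        self.state = nxt


def uart_frame(byte):
    """Start bit, 8 data bits LSB first, even-parity bit, stop bit."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [sum(data) % 2] + [1]


rx = UartRx()
for bit in [1, 1] + uart_frame(0x55) + uart_frame(0xA3) + [1]:
    rx.clock(bit)
# rx.packet_buffer == [0x55, 0xA3] and rx.errors == 0
```

Note that the 8 shift-register FFs are reused for every byte; only the buffer of finished bytes grows, which is the part worth putting in BRAM.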
Scenario 3: Image Grayscale Conversion (Parallel Data Processing)
Converting an RGB image to grayscale requires processing large numbers of pixels in parallel, and the three raise throughput through parallelism:
-
BRAM dual-port parallel read and write: Port A of the BRAM reads the original RGB pixels (e.g., R=0x12, G=0x34, B=0x56) while port B simultaneously writes the converted grayscale values (here 0.299×18 + 0.587×52 + 0.114×86 ≈ 45.7, i.e. about 0x2E), so reading, computing, and writing proceed in parallel.
-
LUT array parallel computation: Multiple LUT-based units work in parallel, each computing “grayscale = 0.299R + 0.587G + 0.114B” for a different pixel, using pre-stored integer versions of the coefficients so that the computation needs no fractional arithmetic.
-
FF synchronizes parallel data: Each LUT unit is paired with FFs that register its input RGB data and output grayscale value, so all pixels advance on the same clock and the image does not become misaligned.
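For Scenario 3, one common way to avoid fractional arithmetic is to scale the coefficients by 256 and finish with a shift; the values 77, 150, and 29 below are an assumed approximation, not taken from the text. In hardware the constant multiplications could map to DSP slices or to shift-and-add LUT logic; this Python model only shows the arithmetic and how parallel lanes reduce the cycle count.

```python
# Sketch of Scenario 3's arithmetic. 77/256, 150/256, 29/256 approximate 0.299, 0.587,
# 0.114 (the three sum to 256, so white stays white). Function names are hypothetical.


def gray_fixed_point(r, g, b):
    """Grayscale with integer math only: scale by 256, add 128 to round, shift right 8."""
    return (77 * r + 150 * g + 29 * b + 128) >> 8


def convert_frame(pixels, lanes=4):
    """Convert (r, g, b) pixels, `lanes` pixels per simulated clock cycle.

    Reading `pixels` plays the role of BRAM port A and writing `gray` plays port B;
    in hardware each lane is a separate copy of the arithmetic running in parallel.
    """
    gray = [0] * len(pixels)
    cycles = 0
    for base in range(0, len(pixels), lanes):
        for i in range(base, min(base + lanes, len(pixels))):
            gray[i] = gray_fixed_point(*pixels[i])
        cycles += 1
    return gray, cycles


print(gray_fixed_point(0x12, 0x34, 0x56))                    # 46 (0x2E); exact value ~45.7
print(convert_frame([(0x12, 0x34, 0x56)] * 10, lanes=4)[1])  # 10 pixels / 4 lanes -> 3
```

Doubling the number of lanes roughly halves the cycle count, at the cost of duplicating the arithmetic and its FFs.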
3. Advantages of Collaboration
-
Resource Optimization: BRAM replaces large numbers of FFs for data storage (a single 36Kb BRAM holds 36,864 bits, i.e. 4,608 bytes, which would otherwise take 36,864 FFs), saving scarce FF resources; LUTs stay dedicated to logic operations, so their paths are not slowed by BRAM read/write delays.
-
Performance Improvement: FF synchronization breaks up the accumulated delay of long LUT combinational paths, and high-speed BRAM access avoids data bottlenecks, so together the three support designs running at several hundred MHz.
-
High Flexibility: From simple counters to complex embedded CPUs, the collaboration of the three can cover the vast majority of digital circuit needs, reflecting the core advantage of FPGA’s “programmability.”
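To put numbers on the resource-optimization point, the helper below estimates how many 36Kb blocks a memory needs and compares that with storing the same bits in FFs. The block shapes are assumed from Xilinx 7-series RAMB36 true-dual-port widths; other families differ, and tools may also use 18Kb halves, so treat the result as an estimate rather than a synthesis report.

```python
# Rough BRAM-vs-FF estimate. BRAM36_SHAPES is an assumption (see the lead-in above);
# function names are hypothetical.
import math

BRAM36_SHAPES = [(32768, 1), (16384, 2), (8192, 4), (4096, 9), (2048, 18), (1024, 36)]


def bram36_blocks(depth, width):
    """Fewest 36Kb blocks for a depth x width memory, tiling blocks in width and depth."""
    return min(math.ceil(width / w) * math.ceil(depth / d) for d, w in BRAM36_SHAPES)


def ff_equivalent(depth, width):
    """Flip-flops needed to hold the same data directly: one FF per bit."""
    return depth * width


# One 1920-pixel video line of 24-bit RGB, a typical image-processing line buffer:
print(bram36_blocks(1920, 24))  # 2
print(ff_equivalent(1920, 24))  # 46080
```

The shapes matter: an 8-bit-wide memory uses the 4096 × 9 shape, so one ninth of each block goes unused, which a plain bit count would miss.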
4. Considerations for Collaborative Design
-
Resource Allocation: Put large data sets (roughly 1KB or more) in BRAM, small register-like data in FFs, and pure logic in LUTs; distributed RAM fits the range in between.
-
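Timing Constraints: Keep the three in the same clock domain where possible, and make sure the BRAM read/write paths are covered by their own timing constraints; if the two BRAM ports are clocked differently, handle the crossing explicitly, to avoid clock-domain-crossing errors or timing violations.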
-
Avoid Waste: Do not use a BRAM to store just a few dozen bytes (most of the block is left unused, fragmenting the resource), and do not use FFs for large data sets (which quickly exhausts them).
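The allocation rules above can be condensed into a small helper that also shows how much of a 36Kb block a given buffer would occupy. The 1KB threshold comes from the text; the 64-bit cutoff for “register-like” data is a hypothetical value chosen for illustration.

```python
# The allocation rules as a helper. REGISTER_MAX_BITS is a hypothetical cutoff;
# function names are hypothetical too.

BRAM36_BITS = 36 * 1024  # capacity of one 36Kb block
REGISTER_MAX_BITS = 64   # hypothetical cutoff for keeping data in FFs
BRAM_MIN_BYTES = 1024    # "large-capacity data (>= 1KB)"


def suggest_storage(size_bytes):
    bits = size_bytes * 8
    if bits <= REGISTER_MAX_BITS:
        return "FF"
    if size_bytes >= BRAM_MIN_BYTES:
        return "BRAM"
    return "distributed RAM"  # tens to hundreds of bytes: the LUT-based "mini warehouse"


def bram36_utilization(size_bytes):
    """Fraction of one 36Kb block actually used by size_bytes of data."""
    return size_bytes * 8 / BRAM36_BITS


for size in (4, 64, 4096):
    print(size, suggest_storage(size), f"{bram36_utilization(size):.1%}")
# 4 FF 0.1%
# 64 distributed RAM 1.4%
# 4096 BRAM 88.9%
```

The 64-byte case illustrates the fragmentation warning: placed in a BRAM it would leave more than 98% of the block idle.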