In FPGA (Field Programmable Gate Array) interviews, questions typically revolve around fundamental principles, design processes, hardware description languages, timing analysis, and practical applications.
1. Basic Concepts and Structure
What is the difference between FPGA and CPLD?
- Structure: An FPGA is built around CLBs (Configurable Logic Blocks) containing many LUTs (Look-Up Tables) and registers, with abundant routing resources; a CPLD is built from product-term (AND-OR array) macrocells in the style of a PAL/PLA, with concentrated logic resources but a smaller scale.
- Capacity: FPGA is suitable for large-scale designs (tens of thousands to millions of gates), while CPLD is suitable for medium to small scale (thousands to tens of thousands of gates).
- Configuration retention: A CPLD typically has built-in non-volatile storage, so its configuration is retained after power loss; a traditional SRAM-based FPGA (including the Xilinx 7 series) must load its configuration from external flash/ROM at every power-up, although some FPGAs (such as Intel MAX 10 and Lattice MachXO2) integrate on-chip flash.
- Applications: FPGAs are used for complex sequential logic and high-throughput data processing (such as communication, image processing), while CPLDs are used for simple control logic (such as power management, interface conversion).
What are the basic components of an FPGA? Core components:
- CLB (Configurable Logic Block) consists of LUTs (for combinational logic), flip-flops (for sequential logic), and multiplexers, serving as the basic unit for logical operations.
- Routing resources include global routing, regional routing, and local routing, connecting each CLB and IOB, determining signal transmission delay.
- IOB (Input/Output Block) connects external pins of the chip to internal logic, supporting various voltage standards (such as LVCMOS, LVDS).
- Dedicated hard blocks such as PLLs (Phase-Locked Loops, for clock division/multiplication), DSP slices (high-speed multipliers/adders), BRAM (Block RAM, for storage), and high-speed interface controllers such as PCIe and Ethernet.
What is the working principle of LUT? How many logic functions can a 4-input LUT implement?
- LUT is essentially a “memory-based look-up table”: it uses input variables as addresses, pre-storing all possible output results, and directly outputs the corresponding value when the input address is provided, achieving combinational logic.
- A 4-input LUT has 2⁴ = 16 addresses and stores 16 bits of data, so it can implement 2¹⁶ = 65,536 different 4-input logic functions (an n-input LUT can implement 2^(2ⁿ) functions).
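The look-up behavior described above can be sketched as a small behavioral model. This is an illustration only, not a vendor primitive: the module name `lut4`, the parameter `INIT`, and the port names are assumptions chosen for the example.

```verilog
// Behavioral model of a 4-input LUT: the inputs form an address
// into a 16-bit truth table, and the addressed bit is the output.
module lut4 #(
    parameter [15:0] INIT = 16'h8000   // truth table; 16'h8000 is a 4-input AND
) (
    input  [3:0] addr,                 // the four logic inputs, used as an address
    output       out
);
    wire [15:0] truth_table = INIT;    // the 16 pre-stored output bits
    assign out = truth_table[addr];    // look up the stored result
endmodule
```

Changing only `INIT` changes the function: for example, `16'h6996` turns the same structure into a 4-input XOR, which is why 16 stored bits give 2¹⁶ possible functions.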
2. Hardware Description Languages (Verilog/VHDL)
What is the difference between blocking assignment (=) and non-blocking assignment (<=) in Verilog? When to use them?
- Blocking assignment (=): Statements execute in order, and each assignment completes before the next statement runs. Use it for combinational logic in always @(*) blocks. (A continuous assign statement also uses =, but it is not a procedural assignment.)
- Non-blocking assignment (<=): All right-hand sides are evaluated first and the left-hand sides are updated together at the end of the time step, so the statements behave as if they executed in parallel. Use it for sequential logic in always @(posedge clk) blocks to avoid race conditions.
- Error example: Using blocking assignment in sequential logic may lead to discrepancies between simulation and actual circuit behavior.
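A minimal sketch of the error case, assuming the designer wants a two-stage shift register. The module and signal names are hypothetical.

```verilog
// Intended design: d -> q1 -> q2, two flip-flops in series.
module shift_nonblocking (
    input      clk,
    input      d,
    output reg q1,
    output reg q2
);
    // Both right-hand sides are sampled before either register updates,
    // so q2 receives the OLD q1: a true two-stage shift register.
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end
endmodule

// Same code with blocking assignments.
module shift_blocking (
    input      clk,
    input      d,
    output reg q1,
    output reg q2
);
    // q1 is updated immediately, so q2 receives the NEW q1:
    // both outputs follow d after a single clock edge.
    always @(posedge clk) begin
        q1 = d;
        q2 = q1;
    end
endmodule
```

In the blocking version the two-stage delay silently collapses to one stage. If the two assignments were placed in separate always blocks, the simulation result would additionally depend on the order in which the simulator runs the blocks, which is where simulation and hardware can disagree.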
How to implement an asynchronous reset, synchronous release register in Verilog?
- Asynchronous reset: The reset takes effect immediately when asserted, independent of the clock.
- Synchronous release: When the reset is removed, the release is aligned to a clock edge, so registers do not leave reset close to a clock edge and become metastable.
- Core idea: Use two-stage registers to synchronize the reset release signal.
module sync_rst (
    input      clk,
    input      rst_async_n,  // Asynchronous reset (active low)
    output reg data_out
);
    reg rst_meta_n;  // First stage (may go metastable when reset is released)
    reg rst_sync_n;  // Second stage: asserts asynchronously, releases synchronously

    // Two-stage reset synchronizer
    always @(posedge clk or negedge rst_async_n) begin
        if (!rst_async_n) begin
            rst_meta_n <= 1'b0;
            rst_sync_n <= 1'b0;
        end else begin
            rst_meta_n <= 1'b1;
            rst_sync_n <= rst_meta_n;
        end
    end

    // Example register that uses the synchronized reset
    always @(posedge clk or negedge rst_sync_n) begin
        if (!rst_sync_n) data_out <= 1'b0;
        else             data_out <= 1'b1;  // Leaves reset aligned with the clock
    end
endmodule
What are the two types of state machines (Moore/Mealy) and their differences? What should be noted during design?
- Moore type: Output depends only on the current state;
- Mealy type: Output depends on the current state and the current inputs;
- Design points:
  - Clearly define the state encoding (binary, Gray code, or one-hot; one-hot suits FPGAs because it reduces decoding logic);
  - Avoid state machine deadlock (cover all states and define a default branch);
  - Prefer registered outputs (register the combinational output, or drive outputs directly from state register bits to reduce delay).
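The design points can be sketched with a small Moore machine. The states, ports, and the module name `moore_fsm` are invented for illustration, and `rst_n` is assumed to be an already synchronized active-low reset.

```verilog
module moore_fsm (
    input  clk,
    input  rst_n,    // assumed: synchronized, active-low reset
    input  start,
    input  finish,
    output busy      // Moore output: a function of the current state only
);
    // One-hot state encoding
    localparam [2:0] IDLE = 3'b001,
                     RUN  = 3'b010,
                     DONE = 3'b100;

    reg [2:0] state, next_state;

    // State register (sequential logic, non-blocking assignment)
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) state <= IDLE;
        else        state <= next_state;
    end

    // Next-state logic (combinational, blocking assignment).
    // The default branch sends any illegal state back to IDLE.
    always @(*) begin
        case (state)
            IDLE:    next_state = start  ? RUN  : IDLE;
            RUN:     next_state = finish ? DONE : RUN;
            DONE:    next_state = IDLE;
            default: next_state = IDLE;
        endcase
    end

    // With one-hot encoding this compare can reduce to a single state bit.
    assign busy = (state == RUN);
endmodule
```

Synthesis tools may re-encode a recognized state machine and optimize away unreachable-state recovery, so whether the default branch survives into hardware depends on the tool's FSM settings.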
3. Timing Analysis and Constraints
What are the definitions of setup time and hold time? What problems arise if they are violated?
- Setup time: The minimum time (Tsu) that data must be stable before the clock edge arrives;
- Hold time: The minimum time (Th) that data must be stable after the clock edge arrives;
- Consequences of violation: The register may become metastable or capture the wrong value, leading to functional errors in the circuit.
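These two requirements are commonly written as inequalities for a register-to-register path. The symbols other than Tsu and Th are assumptions introduced here: T_cq is the launch register's clock-to-output delay, T_logic is the combinational delay between the registers, and T_skew is the capture clock arrival time minus the launch clock arrival time.

```latex
% Setup check: data launched at one edge must arrive T_su before the next edge
T_{clk} \ge T_{cq} + T_{logic,\max} + T_{su} - T_{skew}

% Hold check: data launched at an edge must not change the capture input
% until T_h after that same edge
T_{cq} + T_{logic,\min} \ge T_{h} + T_{skew}

% Worked example with assumed values:
% T_{cq} = 0.5 ns, T_{logic,max} = 6.0 ns, T_{su} = 0.3 ns, T_{skew} = 0
T_{clk} \ge 0.5 + 6.0 + 0.3 = 6.8\ \text{ns}
\quad\Rightarrow\quad f_{\max} = \frac{1}{6.8\ \text{ns}} \approx 147\ \text{MHz}
```

Note that the clock period appears only in the setup inequality, which is why lowering the clock frequency can fix a setup violation but never a hold violation.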
How to resolve timing violations (Setup/Hold Violation)?
- Setup violation (data arrives too late):
  - Reduce the clock frequency;
  - Optimize combinational logic (split complex logic, insert pipeline stages);
  - Adjust placement and routing (shorten data paths, or increase the capture clock path delay);
  - Add multi-cycle path constraints for paths that are genuinely allowed more than one clock cycle.
- Hold violation (data changes too early):
  - Increase the data path delay (insert buffers);
  - Shorten the capture clock path delay;
  - Do not leave clock domain crossing paths unsynchronized.
What is the clock domain crossing (CDC) issue? What are common solutions?
- Definition: When a signal is passed from one clock domain (clk1) to another (clk2, with a different frequency or phase), it can change inside the receiving register's setup/hold window, which may cause metastability.
- Solutions:
  - Single-bit signals: Use a two-stage synchronizer;
  - Multi-bit signals: Use an asynchronous FIFO (such as one built with Xilinx's FIFO Generator) or a handshake protocol;
  - Counters and pointers (such as asynchronous FIFO read/write pointers): Use Gray code encoding (only one bit changes between adjacent values, avoiding multiple bits changing simultaneously).
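A minimal sketch of two of these building blocks: a two-stage synchronizer for a single-bit level signal, and a binary-to-Gray converter of the kind used for FIFO pointers. Module and port names are hypothetical. `ASYNC_REG` is a Vivado synthesis attribute; other tools may ignore it or use a different mechanism.

```verilog
// Two-stage synchronizer for a single-bit, slowly changing signal.
module sync_2ff (
    input  clk_dst,   // destination clock domain
    input  sig_src,   // signal driven from the source clock domain
    output sig_dst    // safe to use in the destination domain
);
    // Vivado-specific hint: keep the two flip-flops close together.
    (* ASYNC_REG = "TRUE" *) reg meta, stable;

    always @(posedge clk_dst) begin
        meta   <= sig_src;  // may go metastable
        stable <= meta;     // has a full cycle to settle before use
    end

    assign sig_dst = stable;
endmodule

// Binary-to-Gray conversion: adjacent values differ in exactly one bit.
module bin2gray #(
    parameter WIDTH = 4
) (
    input  [WIDTH-1:0] bin,
    output [WIDTH-1:0] gray
);
    assign gray = bin ^ (bin >> 1);
endmodule
```

The synchronizer is only suitable for signals that stay stable for longer than a destination clock period; a short pulse from a faster domain needs a handshake or pulse stretcher instead.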
4. Design Process and Tools
What is the complete process of FPGA design? Typical process:
- RTL design: Write the code in Verilog/VHDL (functional implementation);
- Simulation and verification: Use ModelSim/VCS for functional simulation (verify logic correctness);
- Synthesis: Use a tool (such as Vivado Synthesis) to convert the RTL to a gate-level netlist mapped to FPGA resources;
- Implementation: Place and route the design, determining the physical location and connections of the logic units;
- Timing analysis: Use the timing analyzer to check that setup/hold times are met and to generate timing reports;
- Bitstream generation: Convert the design into a configuration file (.bit) that the FPGA can load;
- Download and verification: Download the bitstream to the FPGA via JTAG/USB and debug at board level (using an ILA logic analyzer to capture signals).
What are commonly used FPGA development tools?
- Xilinx: Vivado (mainstream), ISE (older version);
- Intel (Altera): Quartus Prime;
- Simulation tools: ModelSim, Questa, Xcelium;
- Debugging tools: ILA (Xilinx), SignalTap (Intel), both on-chip logic analyzers.
5. Advanced and Engineering Practices
What is the difference between BRAM and Distributed RAM in FPGA?
- BRAM: Dedicated block RAM resources, large capacity (e.g., one BRAM in Xilinx 7 series is 36Kb), fast speed, does not occupy logic resources, suitable for large capacity storage (such as FIFO, cache);
- Distributed RAM: Built from LUTs, small capacity (a 4-input LUT can act as a 16×1 RAM; a 6-input LUT in the Xilinx 7 series can act as a 64×1 RAM), suitable for small storage (such as shallow FIFOs, register files), and occupies logic resources.
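How a memory is written in RTL usually steers which resource the synthesis tool picks. The sketch below assumes typical inference behavior: an asynchronous read tends to map to distributed RAM, while block RAM needs a registered (synchronous) read. The module names and sizes are illustrative, and `ram_style` is a Vivado attribute; other tools use different names.

```verilog
// Small memory with asynchronous read: a candidate for distributed (LUT) RAM.
module dist_ram_16x8 (
    input        clk,
    input        we,
    input  [3:0] addr,
    input  [7:0] din,
    output [7:0] dout
);
    reg [7:0] mem [0:15];

    always @(posedge clk)
        if (we) mem[addr] <= din;

    assign dout = mem[addr];   // combinational read, no latency
endmodule

// Larger memory with registered read: a candidate for block RAM.
module block_ram_1kx8 (
    input            clk,
    input            we,
    input      [9:0] addr,
    input      [7:0] din,
    output reg [7:0] dout
);
    (* ram_style = "block" *) reg [7:0] mem [0:1023];  // Vivado-specific hint

    always @(posedge clk) begin
        if (we) mem[addr] <= din;
        dout <= mem[addr];     // synchronous read, one cycle of latency
    end
endmodule
```

The one-cycle read latency of the block RAM version is the usual trade-off for not spending LUTs on storage.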
How to optimize the area and power consumption of FPGA designs?
- Area optimization:
  - Reuse logic units (e.g., share one counter or arithmetic unit between functions instead of duplicating registers);
  - Use BRAM, DSP slices, and other hard blocks where they fit, to reduce CLB usage;
  - Choose the state machine encoding deliberately (one-hot suits FPGAs but uses more flip-flops, so balance speed against area).
- Power consumption optimization:
  - Stop the clock of idle modules (clock gating; on FPGAs this is usually done with clock enables or clock buffers that have an enable input, not with gates in the fabric);
  - Reduce the clock frequency (while still meeting performance requirements);
  - Choose I/O standards sensibly (a low-voltage standard such as 1.8 V LVCMOS consumes less power than 3.3 V);
  - Avoid unnecessary signal toggling (reduce glitches).
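For the idle-module case, a common FPGA-friendly alternative to gating the clock in logic is a clock enable, which maps onto the enable pin that FPGA flip-flops already have. The sketch below is illustrative; the module name `ce_counter` and the `en` signal are assumptions.

```verilog
module ce_counter #(
    parameter WIDTH = 8
) (
    input                  clk,
    input                  rst_n,  // assumed: synchronized, active-low reset
    input                  en,     // assumed: high only while the module has work to do
    output reg [WIDTH-1:0] count
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            count <= {WIDTH{1'b0}};
        else if (en)
            count <= count + 1'b1;  // registers hold their value (no toggling) when en is low
    end
endmodule
```

This removes the data toggling of the idle module but leaves the clock network running; stopping the clock itself is normally done with the vendor's clock buffer primitives, which are device-specific and not shown here.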
What is the biggest challenge encountered in projects? How was it resolved? (Answer based on personal experience, example)
- Challenge: In a communication project, occasional errors occurred in cross-clock domain data transmission, and timing analysis showed hold violations.
- Solution: Replaced the direct connection with an asynchronous FIFO, sized the FIFO depth at twice the worst-case backlog caused by the rate difference between the two clock domains, and used ILA to capture the read/write pointers and verify the synchronization logic, which eliminated the errors.
6. Selection and Application Scenarios
What is the difference between FPGA and ASIC? When to choose FPGA?
- Difference: An FPGA is reprogrammable and has a short development cycle (weeks) and no mask (NRE) cost, but a high price per chip, so it suits small batches; an ASIC is a custom chip with a long development cycle (months to years) and high up-front cost, but a low unit cost in large batches and better performance and power consumption.
- FPGA applicable scenarios: Prototype verification, small batch products, rapid algorithm iteration (such as 5G base stations, AI acceleration prototypes), devices requiring field upgrades.
What are commonly used FPGA chip models and their application fields?
- Xilinx: 7 series (Artix/Kintex/Virtex, mid-to-high-end industrial control, communication), Zynq (with ARM hard core, suitable for embedded + FPGA mixed design);
- Intel: Cyclone (low cost, consumer electronics), Arria (mid-to-high-end, communication), Stratix (high-end, AI acceleration, supercomputing);
- Chinese domestic vendors: Pango Microsystems (Logos series), Anlogic (EG4 series), suitable for industrial control, IoT.
The above questions cover the core points of FPGA interviews, and answers should combine principles and practical project experience, highlighting understanding of engineering issues such as “timing,” “resource optimization,” and “cross-clock domain.”