It has been over 10 years since I first encountered FPGAs during my university days. I still remember the excitement of completing experiments such as a digital stopwatch, a quiz buzzer, and a password lock on the EDA experimental platform. At that time, I had not yet been exposed to hardware description languages (HDLs), and designs were built from 74-series logic devices in the MAX+plus II schematic environment. Later, during my graduate studies and at work, I gradually used Quartus II, Foundation, ISE, and Libero, and learned Verilog HDL. Through this process, I came to appreciate the power of Verilog: a small piece of code can replace a complex schematic, and the code is far more portable and easier to work with than a schematic design.
Before learning a new technology, we often start with its programming language. For example, when learning microcontrollers, we usually begin with assembly or C language. Thus, many developers who start working with FPGAs often begin their studies with VHDL or Verilog. However, I personally believe that if one can first systematically study various 74 series logic circuits in conjunction with “Fundamentals of Digital Circuits,” it will greatly benefit the understanding of HDL languages, often leading to more effective learning.
Of course, learning any programming language is not an overnight task; the accumulation of experience and skills occurs gradually, and FPGA design is no exception. Below, I will share my personal experiences and tips regarding FPGA design.
Let’s first discuss the basic knowledge of FPGA:
1. Basic Principles of Hardware Design
An FPGA (Field-Programmable Gate Array) is a further development of programmable devices such as PAL, GAL, and CPLD. It emerged as a semi-custom circuit in the field of application-specific integrated circuits (ASICs), addressing the shortcomings of fully custom circuits while overcoming the limited gate count of earlier programmable devices.
Speed and Area Trade-off Principle:
If a design has a large timing margin and can run at a frequency much higher than required, modules can be reused (time-shared) to reduce the chip area consumed, trading the speed surplus for area savings.
Conversely, if a design has demanding timing requirements that conventional methods cannot meet, the data stream can be split using “ping-pong operations” and “serial-to-parallel conversion”, processed by several duplicated modules in parallel, and then merged by “parallel-to-serial conversion” at the chip output module. This trades area for speed.
Hardware Principles: Understand the essence of HDL.
System Principles: Grasp the overall picture.
Synchronization Design Principles: The basic principle of stable timing design.
2. Verilog as an HDL Language
The modeling of system behavior is hierarchical.
Important levels include system level, algorithm level, register transfer level, logic level, gate level, and circuit switch level.
3. In actual work, for loops are common in simulation test stimulus but are rarely used in RTL-level coding.
This is because the synthesizer unrolls a for loop: the loop body is replicated once for every iteration, and each copy occupies its own logic and register resources. The hardware is not reused across iterations, which can waste a lot of resources. Generally, a case statement is used instead.
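To make the unrolling concrete, here is a minimal sketch. The module and signal names are illustrative and not taken from any particular project. Because the loop bound is a constant, a synthesizer would replicate the loop body once per iteration, so the result is eight adder stages side by side rather than one block reused eight times.

```verilog
// Illustrative only: count the 1 bits in an 8-bit word.
// The for loop is unrolled at synthesis time; nothing iterates at run time.
module popcount8 (
    input  wire [7:0] din,
    output reg  [3:0] ones
);
    integer i;

    always @(*) begin
        ones = 4'd0;
        for (i = 0; i < 8; i = i + 1)
            ones = ones + din[i];   // one adder stage per iteration after unrolling
    end
endmodule
```

For a small, fixed bound like this the cost is modest; the waste described above shows up when the loop body is large or the bound is high.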
4. There is a significant difference between if…else… and case in nested descriptions.
if…else… implies priority: the first if has the highest priority, and the last else has the lowest. A case statement whose branch values are mutually exclusive is a parallel structure without priority. Building a priority structure consumes extra logic resources, so prefer case wherever it can be used.
Supplement: A series of separate if…; if…; if…; statements can also describe parallel logic without priority, provided the conditions are mutually exclusive or each if assigns a different signal. If several of them assign the same signal, the last one whose condition is true takes effect.
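The difference can be sketched as follows. This is an illustrative example with made-up signal names: the if…else chain describes a priority selection, while the case statement describes a plain multiplexer.

```verilog
// Illustrative only: a priority chain next to a parallel case selection.
module sel_styles (
    input  wire [3:0] req,   // request bits; several may be high at once
    input  wire [1:0] sel,   // encoded select; values are mutually exclusive
    input  wire [3:0] d,
    output reg        y_pri,
    output reg        y_mux
);
    // Priority structure: req[0] beats req[1], which beats req[2], and so on.
    always @(*) begin
        if      (req[0]) y_pri = d[0];
        else if (req[1]) y_pri = d[1];
        else if (req[2]) y_pri = d[2];
        else if (req[3]) y_pri = d[3];
        else             y_pri = 1'b0;
    end

    // Parallel structure: all branches are peers, typically a multiplexer.
    always @(*) begin
        case (sel)
            2'b00:   y_mux = d[0];
            2'b01:   y_mux = d[1];
            2'b10:   y_mux = d[2];
            default: y_mux = d[3];
        endcase
    end
endmodule
```

The two blocks are not functionally identical; they take different inputs on purpose, to show which coding style fits which kind of selection.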
5. FPGA generally has abundant flip-flop resources, while CPLD has richer combinational logic resources.
6. Composition of FPGA and CPLD
An FPGA generally consists of programmable I/O units, basic programmable logic units, embedded block RAM, abundant routing resources, embedded low-level functional units, and dedicated hard cores.
A CPLD has a relatively simple structure, mainly consisting of programmable I/O units, basic logic units, a routing pool, and other auxiliary functional modules.
Altera's TriMatrix memory, for example, provides three types of block RAM structures: M512 RAM (512 bits), M4K RAM (4 Kbits), and M-RAM (512 Kbits, that is, 64 Kbytes).
M512 RAM: Suitable for small buffers, FIFOs, DPRAMs, SPRAMs, ROMs, etc.;
M4K RAM: Suitable for general needs;
M-RAM: Suitable for large data buffers.
The LUTs in Xilinx and Lattice FPGAs can be flexibly configured as small RAM, ROM, FIFO, and other storage structures. This technology is called distributed RAM.
Supplement: In general designs, however, it is not recommended to build large memories from FPGA/CPLD on-chip resources, for cost reasons. For large storage, external memory should be used as much as possible.
8. Make good use of the internal PLL or DLL resources of the chip to complete operations such as clock division, multiplication, and phase shifting.
This not only simplifies the design but also effectively improves the accuracy and stability of the system.
9. The difference between asynchronous circuits and synchronous sequential circuits.
Asynchronous Circuits:
The core logic of the circuit is implemented using combinational circuits;
The biggest drawback of asynchronous sequential circuits is that they are prone to glitches;
They are not conducive to device portability;
They are not conducive to static timing analysis (STA) and verifying design timing performance.
Synchronous Sequential Circuits:
The core logic of the circuit is implemented using various flip-flops;
Major signals and output signals are driven by a clock edge triggering the flip-flops;
Synchronous sequential circuits can effectively avoid glitches;
They are conducive to device portability;
They are conducive to static timing analysis (STA) and verifying design timing performance.
10. In synchronous design, stable and reliable data sampling must adhere to the following two basic principles:
(1) Before the valid clock edge arrives, the data input must be stable for at least the setup time of the sampling register. This principle is referred to as meeting the setup time principle;
(2) After the valid clock edge arrives, the data input must remain stable for at least the hold time of the sampling register. This principle is referred to as meeting the hold time principle.
11. Considerations for synchronous sequential design
Data conversion across asynchronous clock domains.
Design methods for combinational logic circuits.
Clock design for synchronous sequential circuits.
The delay of synchronous sequential circuits. The most common way to produce a delay in a synchronous sequential circuit is to use clock division or multiplication, or a synchronous counter. For long or unusual delays, a counter driven by a high-speed clock is generally used. For short delays, a D flip-flop can be inserted to delay the signal by one clock cycle, which also synchronizes the signal to the clock for the first time. This is used in input signal sampling and to increase timing constraint margins.
Additionally, delays can be described at the behavioral level, such as “#5 a <= 4'b0101;”. This is common in simulation test stimulus, but the delay is ignored during circuit synthesis and produces no delay in the actual circuit.
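As a minimal sketch of the counter approach, the module below raises a one-cycle pulse roughly DELAY_CYCLES clock cycles after start is sampled. The module name, port names, and parameters are assumptions made for illustration, and a synchronous reset is assumed.

```verilog
// Illustrative only: a synthesizable delay built from a synchronous counter.
// CNT_WIDTH must be wide enough to hold DELAY_CYCLES - 1.
module delay_counter #(
    parameter DELAY_CYCLES = 100,
    parameter CNT_WIDTH    = 8
) (
    input  wire clk,
    input  wire rst,     // synchronous, active high
    input  wire start,
    output reg  done     // one-cycle pulse when the delay has elapsed
);
    reg [CNT_WIDTH-1:0] cnt;
    reg                 busy;

    always @(posedge clk) begin
        if (rst) begin
            cnt  <= {CNT_WIDTH{1'b0}};
            busy <= 1'b0;
            done <= 1'b0;
        end else begin
            done <= 1'b0;                        // default: pulse lasts one cycle
            if (start && !busy) begin
                busy <= 1'b1;                    // begin counting
                cnt  <= {CNT_WIDTH{1'b0}};
            end else if (busy) begin
                if (cnt == DELAY_CYCLES - 1) begin
                    busy <= 1'b0;
                    done <= 1'b1;                // delay reached
                end else begin
                    cnt <= cnt + 1'b1;
                end
            end
        end
    end
endmodule
```

Unlike a # delay, this counter survives synthesis, and the delay scales with the clock period rather than with gate delays.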
The reg type in Verilog does not necessarily synthesize into a register. The two most commonly used data types in Verilog code are wire and reg. A wire generally corresponds to combinational logic or a plain connection. A reg becomes a register only when it is assigned in a clocked always block; a reg assigned in a combinational always block synthesizes to combinational logic.
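A small illustrative example (names are made up) shows the point: both outputs are declared as reg, but only one of them would become a flip-flop.

```verilog
// Illustrative only: "reg" is a variable type, not a promise of a register.
module reg_demo (
    input  wire clk,
    input  wire a,
    input  wire b,
    output reg  y_comb,  // reg, but synthesizes to an AND gate
    output reg  y_ff     // reg that synthesizes to a D flip-flop
);
    // Combinational always block: no clock, so no storage element.
    always @(*)
        y_comb = a & b;

    // Clocked always block: the value is held between clock edges.
    always @(posedge clk)
        y_ff <= a & b;
endmodule
```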
12. Common Design Ideas and Techniques
(1) Ping-pong operations;
(2) Serial-parallel conversion;
(4) Data synchronization across asynchronous clock domains. This refers to the problem of reliably exchanging data between two clock domains that are not synchronized. There are mainly two situations with unsynchronized data clock domains:
① The two domains have the same clock frequency, but their phase difference is either not fixed or fixed but unmeasurable. This is referred to as the same-frequency, different-phase problem.
② Two clock frequencies are completely different, referred to as the different frequency problem.
Two methods are not recommended for crossing asynchronous clock domains: one is to adjust the sampling point by inserting buffers or other gates as delays; the other is to blindly switch clock edges to adjust data sampling.
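The text above names the methods to avoid but not an alternative. One widely used technique for a single-bit signal, added here as an illustration and not taken from the text, is a two-stage flip-flop synchronizer in the destination clock domain. The names below are assumptions.

```verilog
// Illustrative only: two-flop synchronizer for ONE single-bit signal.
// It reduces the chance that a metastable value reaches downstream logic.
module sync_2ff (
    input  wire clk_dst,    // clock of the receiving domain
    input  wire async_in,   // signal arriving from another clock domain
    output wire sync_out    // version that is safe to use in clk_dst logic
);
    reg [1:0] sync_ff;

    always @(posedge clk_dst)
        sync_ff <= {sync_ff[0], async_in};   // stage 0 samples, stage 1 resolves

    assign sync_out = sync_ff[1];
endmodule
```

This sketch is not suitable for multi-bit buses, where the bits could be captured on different cycles; those usually call for an asynchronous FIFO or a handshake.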
13. Basic Principles of Module Division
(1) Use registers for the outputs of each submodule in synchronous sequential design (the principle of dividing synchronous sequential modules with registers).
(2) Group related logic and reusable logic within the same module (echoing the system principle).
(3) Separate logic with different optimization objectives.
(4) Group logic that requires constraints into the same module.
(5) Independently divide storage logic into modules.
(6) Appropriate module size.
(7) The top-level module should ideally not perform logic design.
14. Considerations for Combinational Logic
(1) Avoid combinational logic feedback loops (which can lead to glitches, oscillations, timing violations, etc.).
Solution: A. Remember that any feedback loop must include registers; B. Check the synthesis and implementation reports for warning information, and modify accordingly if feedback loops (combinational loops) are found.
(2) Replace delay chains.
Solution: Use clock multiplication, division, or synchronous counters to complete.
(3) Replace asynchronous pulse generation units (glitch generators).
Solution: Use synchronous sequential design pulse circuits.
(4) Use latches cautiously.
A. Use complete if…else statements;
B. Check if the design contains combinational logic feedback loops;
C. Design output operations for each input condition and set default operations for case statements. Especially in state machine designs, it is best to have a default state transition, and each state should also have a default operation.
D. When using case statements, especially in state machine designs, it is advisable to add synthesis directives so that the tool treats the case statement as fully specified.
Tip: Carefully check the synthesizer’s synthesis report; currently, most synthesizers warn about synthesized latches, making it easier to identify inadvertently generated latches through the synthesis report.
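Points A and C can be sketched as follows. The state names and ports are hypothetical. Every output is given a default value at the top of the block and the case statement has a default branch, so no path leaves an output unassigned and no latch should be inferred.

```verilog
// Illustrative only: next-state logic written so that no latch is inferred.
module fsm_next (
    input  wire [1:0] state,
    input  wire       go,
    output reg  [1:0] next_state,
    output reg        busy
);
    localparam IDLE = 2'b00,
               RUN  = 2'b01,
               DONE = 2'b10;

    always @(*) begin
        // Defaults: every output gets a value on every path.
        next_state = state;
        busy       = 1'b0;

        case (state)
            IDLE: if (go) next_state = RUN;   // no else needed, default applies
            RUN: begin
                busy       = 1'b1;
                next_state = DONE;
            end
            DONE:    next_state = IDLE;
            default: next_state = IDLE;       // recover from the unused encoding
        endcase
    end
endmodule
```

Removing the two default assignments would leave busy unassigned in the IDLE and DONE branches, which is exactly the situation that makes a synthesizer infer a latch.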
15. Considerations for Clock Design
The recommended clock design method for synchronous sequential circuits: input the clock through global clock input pins, adjust and compute through FPGA’s internal dedicated PLL or DLL for division/multiplication, phase shifting, etc., and then drive the clock input of all registers and other modules in the chip through the FPGA’s internal global clock routing resources.
The five basic skills of FPGA designers are simulation, synthesis, timing analysis, debugging, and verification.
For FPGA designers, mastering these five basic skills goes hand in hand with learning the corresponding EDA tools, with the following correspondence:
1. Simulation: ModelSim, Quartus II (Simulator Tool)
2. Synthesis: Quartus II (Compiler Tool, RTL Viewer, Technology Map Viewer, Chip Planner)
3. Timing: Quartus II (TimeQuest Timing Analyzer, Technology Map Viewer, Chip Planner)
4. Debugging: Quartus II (SignalTap II Logic Analyzer, Virtual JTAG, Assignment Editor)
5. Verification: ModelSim, Quartus II (Test Bench Template Writer)
Mastering HDL language is not everything in FPGA design, but its influence runs through the entire FPGA design process, complementing the five basic skills of FPGA design.
For FPGA designers, mastering the “synthesizable subset of HDL language” can accomplish 50% of FPGA design work—design coding.
Practicing simulation, synthesis, and timing analysis, the three basic skills, helps in learning the “synthesizable subset of HDL language” in the following ways:
Through simulation, one can observe the logical behavior of HDL language in FPGA.
Through synthesis, one can observe the physical implementation of HDL language in FPGA.
Through timing analysis, one can analyze the physical implementation characteristics of HDL language in FPGA.
For FPGA designers, effectively using the “verification subset of HDL language” can accomplish the remaining 50% of FPGA design work—debugging and verification.
1. Build a verification environment; simulation can be used to verify the correctness of FPGA design.
2. Comprehensive simulation verification can reduce the workload of FPGA hardware debugging.
3. Combining hardware debugging with simulation verification can resolve problems that simulation alone did not cover and ensure that issues already fixed do not reappear, establishing a regression verification process that helps in maintaining FPGA design projects.
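As a minimal sketch of such a verification environment, the block below pairs a tiny design under test with a self-checking testbench. The adder and all names are placeholders chosen for illustration; a real project would substitute its own module and checks.

```verilog
`timescale 1ns/1ps
// Illustrative only: a tiny design under test.
module add8 (
    input  wire [7:0] a,
    input  wire [7:0] b,
    output wire [8:0] sum
);
    assign sum = a + b;
endmodule

// Self-checking testbench: applies random stimulus and compares the result
// against a reference expression, so it can be rerun as a regression test.
module add8_tb;
    reg  [7:0] a, b;
    wire [8:0] sum;
    integer    i, errors;

    add8 dut (.a(a), .b(b), .sum(sum));

    initial begin
        errors = 0;
        for (i = 0; i < 100; i = i + 1) begin   // for loops are fine in test stimulus
            a = $random;
            b = $random;
            #10;                                // simulation-only delay
            if (sum !== a + b) begin
                errors = errors + 1;
                $display("Mismatch: %0d + %0d gave %0d", a, b, sum);
            end
        end
        $display("Test finished with %0d error(s)", errors);
        $finish;
    end
endmodule
```

Because the testbench reports its own pass or fail result, it can be rerun unchanged after every design change.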
The five basic skills of FPGA designers are not isolated; they must be used in conjunction to complete a complete FPGA design process. Conversely, completing a complete design process is the most effective way to practice these five basic skills. Once a preliminary understanding of these five basic skills is achieved, one can delve deeper into each and then apply the knowledge in a complete design process. Through this iterative process, one can gradually improve their design level. By adopting a step-by-step, spiral approach, as long as one has received training to get started, they can self-learn and practice for self-improvement.
Books on FPGA design available on the market tend to introduce each aspect of FPGA design separately to ensure structural integrity. Although each aspect is covered in depth, readers find it difficult to put into practice due to the lack of support from other relevant aspects. Only by reading the entire book can one gain a holistic understanding of FPGA design. Such books are not suitable as engineering training manuals but can serve as advanced reference books for specific areas.
For new employees, they often have a preliminary understanding of the overall FPGA design process, and certain aspects of the five basic skills may be solid. However, due to deficiencies in one or several areas, they are limited in their ability to independently complete the entire design process.
The purpose of onboarding training is to help them master the overall design process, cultivate the ability to acquire information independently, and form a positive cycle of self-promotion and self-development through repeated training in several design processes. In this process, as their understanding of the breadth and depth of knowledge related to their work becomes clearer, their self-confidence will gradually increase, and their personal development direction will become clearer, enabling them to actively participate in engineering projects.
Finally, let me summarize a few points:
1) Read code, build models
Only by establishing logical models in one’s mind and understanding the basis of FPGA’s internal logic structure can one comprehend why writing Verilog is different from writing C, leading to an understanding of the differences in design methods between sequential execution languages and parallel execution languages. When seeing a simple program, one should think about what kind of functional circuit it represents.
2) Simplify design logic using mathematical thinking
Learning FPGA requires not only logical thinking but also good mathematical thinking to simplify designs. Therefore, those who struggle with advanced mathematics should pay attention to this subject. For example, consider multiplying two 32-bit numbers X[31:0] and Y[31:0]. Both Altera and Xilinx offer ready-to-use multiplier IP cores, which is the simplest method, but a 32-bit * 32-bit multiplier consumes a lot of resources. So, is there a way to save resources that is not too complex? We can make a small modification:
Split X[31:0] into two parts, X1[15:0] = X[31:16] and X2[15:0] = X[15:0], so that X = (X1 << 16) + X2. Similarly, split Y[31:0] into Y1[15:0] = Y[31:16] and Y2[15:0] = Y[15:0]. Then X * Y = ((X1 * Y1) << 32) + ((X1 * Y2 + X2 * Y1) << 16) + X2 * Y2. This converts one 32-bit * 32-bit multiplication into four 16-bit * 16-bit multiplications and three additions of the 32-bit partial products; the shifts by constant amounts are only wiring. The resource usage can decrease significantly after this transformation. Interested readers can synthesize and compare the two methods to see the difference.
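The decomposition can be sketched in Verilog as follows. The module and wire names are illustrative, and the inputs are assumed to be unsigned. Whether it actually saves resources depends on the device and the synthesis tool, so it is worth comparing against a direct x * y.

```verilog
// Illustrative only: 32 x 32 unsigned multiply built from four 16 x 16 products.
// X*Y = (X1*Y1 << 32) + ((X1*Y2 + X2*Y1) << 16) + X2*Y2
module mul32_split (
    input  wire [31:0] x,
    input  wire [31:0] y,
    output wire [63:0] p
);
    wire [15:0] x1 = x[31:16];   // high half of x
    wire [15:0] x2 = x[15:0];    // low half of x
    wire [15:0] y1 = y[31:16];   // high half of y
    wire [15:0] y2 = y[15:0];    // low half of y

    // Four 16 x 16 partial products, each 32 bits wide.
    wire [31:0] p_hh = x1 * y1;
    wire [31:0] p_hl = x1 * y2;
    wire [31:0] p_lh = x2 * y1;
    wire [31:0] p_ll = x2 * y2;

    // Three additions; operands are zero-extended to 64 bits before shifting
    // so that no bits are lost. The constant shifts are only wiring.
    assign p = ({32'd0, p_hh} << 32)
             + ({32'd0, p_hl} << 16)
             + ({32'd0, p_lh} << 16)
             +  {32'd0, p_ll};
endmodule
```

As written, the four products are computed in parallel. Sharing one 16 x 16 multiplier across several clock cycles would save more area at the cost of speed, in line with the speed and area trade-off principle.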
3) The relationship between clock and flip-flops
“The clock is the controller of sequential circuits” is a classic statement and can be considered a golden rule in FPGA design. FPGA design mainly focuses on sequential circuits because no matter how complex combinational logic circuits are, they do not offer much variety and are not too difficult to understand. However, sequential circuits are different; all their actions change rhythmically with each clock pulse. The clock can be seen as the controller of the entire circuit; if not controlled properly, the circuit’s functionality will become chaotic.
To illustrate, the clock is akin to the heart of the human body: each beat corresponds to one CLK pulse, supplying blood to the organs and keeping the body functioning. Each organ in turn depends on its cells, and the flip-flop can be likened to such a cell, the basic unit of the circuit. The clock is the “engine” that drives the state transitions of sequential logic circuits; without it, they cannot operate normally, because a flip-flop changes state only on the rising or falling edge of the clock. This highlights the core role of the clock in sequential circuits.
In conclusion, my experience boils down to practicing more, thinking more, and asking more questions. Practical experience yields true knowledge; seeing someone else’s solution 100 times is no substitute for practicing it oneself. The motivation for practice comes from both interest and pressure, with the latter being more important in my opinion. Having a demand can easily create pressure, meaning it is best to train in actual project development rather than learning for the sake of learning.
During practice, it is essential to think critically about the reasons behind problems, and after resolving issues, to ask several “why” questions. This is also a process of accumulating experience. If one has a habit of writing project logs, it is even better to document problems, causes, and solutions. Finally, do not hesitate to ask questions; if pondering a problem does not lead to a solution, seek help. After all, individual capabilities are limited, and asking classmates, colleagues, search engines, or online communities can provide insights that help solve problems quickly.
