Introduction
This article walks through the complete process of building an SoC-based convolutional neural network license plate recognition system. It is organized into three parts: IP and interconnections, data paths, and clock domain partitioning.
The IP designs and the overall SoC design described here all have corresponding design documents, Verilog code, and FPGA projects, which implement successfully and can be tested on the board.
The article also summarizes some design patterns, such as how clock domains are partitioned across different boards, projects, and IPs (for example MiLink, Black Gold, and official Xilinx designs), and includes the project links for this system build. Clock domain partitioning is a design concept that mid-level and senior digital IC designers must master.
🌏 I. IP and Interconnections
✅ 1. First, import the ARM Cortex-M3 CPU soft-core IP. This is the starting point of the Block Design for the SoC.
The CPU import is successful when ARM Keil uVision correctly recognizes the ARM Cortex-M3 soft core, as shown below. To check this, load the bitstream generated for the CM3-based SoC hardware system onto a pure FPGA platform, then connect the debugger through four pins: VCC, GND, TCK, and TMS. TCK and TMS are FPGA pins that carry the debug clock and the bidirectional debug data of the CM3, as the following port code shows.
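inout TMS; // Bidirectional debug data pin; as an input it drives CM3's SWDITMS.
input TCK; // Debug clock pin; drives CM3's SWCLKTCK.
assign TMS = SWDOEN ? SWDO : 1'bz; // CM3 drives TMS with SWDO only while SWDOEN is high; otherwise the pin is released (high-Z).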
Next, once the Cortex-M3 AXI bus (CM3_SYS_AXI3) is correctly exported and connected to the AXI Interconnect, many slave IPs can be attached to it, for example QSPI, UART, GPIO, BRAM, a further AXI Interconnect, and VDMA. The function of each peripheral IP is briefly introduced below.
✉ QSPI: The QSPI Flash controller interface, used to read and write the Flash memory on the FPGA board over QSPI.
✉ UART: CM3 can send data to the PC for display over UART, or receive data from the PC. Besides general communication with the PC, this is used to read back DDR3 data for testing and verification.
✉ GPIO: The CPU's IO peripheral. CM3 controls the LED outputs based on the key inputs, so the GPIO output pins follow changes on the GPIO input pins, providing simple human-computer interaction.
✉ BRAM: The BRAM Controller provides data exchange between the CPU and the FPGA logic.
✉ AXI Interconnect: The AXI4 interconnect bus.
✉ VDMA: CM3 can configure VDMA to control DDR3 data reads, writes, and caching.
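As a rough illustration of the GPIO behaviour described above, the sketch below mirrors key inputs onto LED outputs from CM3 firmware in C. It assumes a dual-channel Xilinx AXI GPIO with channel 1 wired to the LEDs and channel 2 wired to the keys. That wiring, the function names, and the `mask` parameter are assumptions made for this example and are not taken from the project. The register offsets (data at 0x00 and 0x08, direction at 0x04 and 0x0C) follow the AXI GPIO register map and should be checked against the product guide for the IP version in use.

```c
#include <stdint.h>

/* AXI GPIO registers, expressed as 32-bit word indices from the base address. */
#define GPIO_DATA_WORD   0u  /* 0x00: channel 1 data (assumed: LEDs)  */
#define GPIO_TRI_WORD    1u  /* 0x04: channel 1 direction, 1 = input  */
#define GPIO2_DATA_WORD  2u  /* 0x08: channel 2 data (assumed: keys)  */
#define GPIO2_TRI_WORD   3u  /* 0x0C: channel 2 direction, 1 = input  */

/* Configure channel 1 as outputs and channel 2 as inputs. */
void gpio_init(volatile uint32_t *gpio)
{
    gpio[GPIO_TRI_WORD]  = 0x00000000u;
    gpio[GPIO2_TRI_WORD] = 0xFFFFFFFFu;
}

/* Copy the masked key state to the LEDs and return the value written. */
uint32_t gpio_mirror_keys_to_leds(volatile uint32_t *gpio, uint32_t mask)
{
    uint32_t keys = gpio[GPIO2_DATA_WORD] & mask;
    gpio[GPIO_DATA_WORD] = keys;
    return keys;
}
```

On the board, `gpio` would point at the AXI GPIO base address assigned in the Block Design address editor, and `gpio_mirror_keys_to_leds` would be called in the main loop.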
✅ 2. Next is the design of the IIC controller for OV5640 and HDMI (NOTES: Both the design and the verification of IIC are challenging).
✅ 3. Then comes the design of the OV5640 decoding module IP, which converts 8 bits to 16 bits and then to 24 bits (NOTES: This IP design is relatively simple).
✅ 4. Add and configure the commonly used video transmission IPs, such as Video In to AXI4-Stream, VDMA, MIG7, Video Timing Controller, and AXI4-Stream to Video Out (NOTES: See the English manuals on the official Xilinx website, which give very detailed interface descriptions and examples; they cannot be covered adequately in a few words here).
✅ 5. Configure the clocking IP, using Clocking Wizard to generate the clocks for the many different clock domains.
✅ 6. Cross-compilation process (NOTES: Why is this needed? The C code is written in ARM Keil uVision, which cannot compile Verilog, so a cross-compilation flow is the only way to build the Verilog and the C code together. A complete set of cross-compilation documentation is available through the public account).
All six points above are backed by detailed code or internal configuration in the project, so they are not described in detail here.
🌏 II. Data Path
✅ 1. First, RGB565 data arrives from the OV5640, but only 8 bits are sent per clock (8 parallel wires). A decoding module is therefore needed to assemble it into 16 bits per clock (16 parallel wires), i.e., RGB565, and finally expand it to RGB888.
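The 8-to-16-to-24-bit conversion can be captured in a small bit-accurate C reference model, which is handy for checking the Verilog decoder in simulation. This is only a sketch under two assumptions that are not stated in the article: the first byte of each pair is the high byte of the RGB565 word (the OV5640 byte order depends on its register settings), and the expansion to 8 bits replicates the top bits into the new low bits so that full scale maps to 0xFF. A hardware decoder may simply pad with zeros instead. The function names are hypothetical.

```c
#include <stdint.h>

/* Join two consecutive 8-bit camera samples into one RGB565 word.
 * Assumption: the first byte of each pair is the high byte. */
uint16_t pack_rgb565(uint8_t first, uint8_t second)
{
    return (uint16_t)(((uint16_t)first << 8) | second);
}

/* Expand RGB565 to RGB888, returned as 0x00RRGGBB.
 * The top bits of each channel are copied into the new low bits,
 * so 0x1F becomes 0xFF for red/blue and 0x3F becomes 0xFF for green. */
uint32_t rgb565_to_rgb888(uint16_t p)
{
    uint32_t r5 = ((uint32_t)p >> 11) & 0x1Fu;
    uint32_t g6 = ((uint32_t)p >> 5)  & 0x3Fu;
    uint32_t b5 = (uint32_t)p         & 0x1Fu;

    uint32_t r8 = (r5 << 3) | (r5 >> 2);
    uint32_t g8 = (g6 << 2) | (g6 >> 4);
    uint32_t b8 = (b5 << 3) | (b5 >> 2);

    return (r8 << 16) | (g8 << 8) | b8;
}
```

For example, the byte pair 0xF8, 0x00 packs to 0xF800 (pure red in RGB565) and expands to 0xFF0000.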
✅ 2. RGB888 enters the Video In to AXI4-Stream IP, which converts the video protocol based on VGA line and field synchronization timing (such as hs + vs + data) into the AXI4-Stream protocol (such as tvalid + tready + tdata), and then sends it to AXI VDMA to be moved into DDR3 and buffered there.
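Besides tvalid, tready, and tdata, AXI4-Stream video carries two sideband flags that take over the role of hs and vs: in the Xilinx video convention, tuser marks the first pixel of a frame and tlast marks the last pixel of each line. The C model below sketches how those flags relate to pixel coordinates. It ignores handshaking and blanking entirely, and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* One AXI4-Stream video transfer (beat) carrying a single 24-bit pixel. */
typedef struct {
    uint32_t tdata;  /* pixel value, lower 24 bits used        */
    bool     tuser;  /* start of frame: first pixel of a frame */
    bool     tlast;  /* end of line: last pixel of each line   */
} axis_video_beat;

/* Build the beat for the pixel at column x, row y of a frame `width` pixels wide. */
axis_video_beat axis_video_make_beat(uint32_t pixel, uint32_t x, uint32_t y,
                                     uint32_t width)
{
    axis_video_beat b;
    b.tdata = pixel & 0x00FFFFFFu;
    b.tuser = (x == 0u) && (y == 0u);
    b.tlast = (x + 1u == width);
    return b;
}
```

Because frame and line boundaries travel with the data in this way, the sync pulses themselves do not need to be stored in DDR3.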
✅ 2-1. The VDMA interface is detailed as follows:
✉ S_AXI_LITE: AXI4-Lite slave interface, connected to AXI Interconnect. Through it, CM3 writes control and configuration registers and reads status.
✉ S_AXIS_S2MM: AXI4-Stream slave interface. It receives the data stream from the preceding Video In to AXI4-Stream IP so that it can be written to the DDR3 buffer, i.e., Stream to Memory Map.
✉ M_AXIS_MM2S: AXI4-Stream master interface. It sends the data stream to the following AXI4-Stream to Video Out IP for display, i.e., Memory Map to Stream.
✉ M_AXI_S2MM: AXI4 write channel (write to DDR). It is connected to AXI Interconnect and writes video data to DDR3 through MIG7.
✉ M_AXI_MM2S: AXI4 read channel (read from DDR). It is connected to AXI Interconnect and reads video data from DDR3 through MIG7.
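To make the role of S_AXI_LITE concrete, here is a minimal sketch of how CM3 firmware might start both VDMA channels on a single frame buffer. The register offsets and the two control bits follow the AXI VDMA register map in the Xilinx product guide as best understood here, and should be verified against the guide for the IP version in use. The function names, the single-buffer setup, and the example frame address are assumptions for illustration and are not the project's actual driver.

```c
#include <stdint.h>

/* AXI VDMA register byte offsets (to be checked against the product guide). */
#define VDMA_MM2S_CR      0x00u  /* read channel control        */
#define VDMA_S2MM_CR      0x30u  /* write channel control       */
#define VDMA_MM2S_VSIZE   0x50u  /* lines per frame (read)      */
#define VDMA_MM2S_HSIZE   0x54u  /* bytes per line (read)       */
#define VDMA_MM2S_STRIDE  0x58u  /* bytes between line starts   */
#define VDMA_MM2S_ADDR1   0x5Cu  /* frame buffer 1 address      */
#define VDMA_S2MM_VSIZE   0xA0u
#define VDMA_S2MM_HSIZE   0xA4u
#define VDMA_S2MM_STRIDE  0xA8u
#define VDMA_S2MM_ADDR1   0xACu

#define VDMA_CR_RUN       (1u << 0)  /* run/stop               */
#define VDMA_CR_CIRCULAR  (1u << 1)  /* cycle through buffers  */

static void vdma_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4u] = value;
}

/* Start writing camera frames to, and reading display frames from, one buffer.
 * VSIZE is written last on each channel because that write starts the transfer. */
void vdma_start_single_buffer(volatile uint32_t *base, uint32_t frame_addr,
                              uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel)
{
    uint32_t line_bytes = width * bytes_per_pixel;

    /* S2MM: stream to memory (camera side). */
    vdma_write(base, VDMA_S2MM_CR, VDMA_CR_RUN | VDMA_CR_CIRCULAR);
    vdma_write(base, VDMA_S2MM_ADDR1, frame_addr);
    vdma_write(base, VDMA_S2MM_STRIDE, line_bytes);
    vdma_write(base, VDMA_S2MM_HSIZE, line_bytes);
    vdma_write(base, VDMA_S2MM_VSIZE, height);

    /* MM2S: memory to stream (display side). */
    vdma_write(base, VDMA_MM2S_CR, VDMA_CR_RUN | VDMA_CR_CIRCULAR);
    vdma_write(base, VDMA_MM2S_ADDR1, frame_addr);
    vdma_write(base, VDMA_MM2S_STRIDE, line_bytes);
    vdma_write(base, VDMA_MM2S_HSIZE, line_bytes);
    vdma_write(base, VDMA_MM2S_VSIZE, height);
}
```

Sharing one buffer between the write and read channels keeps the example short but can cause tearing; a real design would normally configure several frame buffers.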
✅ 2-2. The DDR3 MIG7 interface is detailed as follows:
✉ S_AXI: AXI4 slave interface. It performs the DDR3 reads and writes requested by VDMA, providing data transport and caching.
✉ aresetn: Global reset signal, active low.
✉ sys_clk_p, sys_clk_n: MIG IP system clock input, configurable as a single-ended or differential clock; here it is 200 MHz.
✉ sys_rst: MIG IP system reset.
✉ ui_clk: User clock output by the MIG IP; 100 MHz here.
✉ ui_clk_sync_rst: Reset output by the MIG IP, synchronous to the 100 MHz ui_clk.
✉ mmcm_locked: Indicates that the MMCM inside the MIG IP has locked, i.e., that the output clock is stable.
✉ init_calib_complete: Goes high after DDR3 initialization and calibration succeed. If it is constrained to an LED, the LED lights up. This signal contributes little to the system design itself but is invaluable for system verification, because it helps determine whether a problem lies inside the DDR3 black box.
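Once init_calib_complete is high, a simple software check from CM3 can confirm that the whole path through AXI Interconnect and MIG7 to DDR3 actually works. The sketch below writes an address-dependent pattern and reads it back; the function name and the pattern are arbitrary choices for this example, and on the board `mem` would point at the DDR3 base address from the Block Design address editor. The error count could then be reported to the PC over UART.

```c
#include <stdint.h>
#include <stddef.h>

/* Write an address-dependent pattern to `words` 32-bit locations,
 * read it back, and return the number of mismatching words (0 = pass). */
size_t ddr_pattern_test(volatile uint32_t *mem, size_t words)
{
    size_t i;
    size_t errors = 0;

    for (i = 0; i < words; i++) {
        mem[i] = (uint32_t)i ^ 0xA5A5A5A5u;
    }
    for (i = 0; i < words; i++) {
        if (mem[i] != ((uint32_t)i ^ 0xA5A5A5A5u)) {
            errors++;
        }
    }
    return errors;
}
```

Using a pattern that depends on the index, rather than a constant, also catches address-line faults where several addresses alias to the same location.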
✅ 3. AXI4-Stream to Video Out receives the video stream from VDMA and, under the control of the Video Timing Controller, which generates the VGA-style line and field synchronization timing, outputs the video data correctly to an LCD or HDMI display. For HDMI, registers must also be configured, which is why the IIC module design is required.
✅ 4. The data paths of each IP module are as follows:
🌏 III. Clock Domain Partitioning
✅ 1. Summary of clocks for various SoC systems
✅ 2. Clock domain partitioning of various IP modules
✅ 3. Detailed explanation of AXI Interconnect
✉ Global clock ACLK: 100 MHz.
✉ Global reset ARESETN: Connected to the interconnect_aresetn or peripheral_aresetn output of the Processor System Reset IP that is clocked by the ACLK input clock.
✉ Slave device global clock S00_ACLK: 50 MHz. Because S00_AXI connects to the CPU, S00_ACLK should equal the CPU frequency; Sxx_ARESETN is the reset for the same clock domain.
✉ Master device global clock Mxx_ACLK: Similarly, Mxx_ACLK matches the global clock aclk of the device attached to the corresponding Mxx_AXI port: 50 MHz for the CPU peripherals QSPI, UART, GPIO, and BRAM Controller, and 100 MHz for VDMA and AXI_Mem_Interconnect. Mxx_ARESETN is the reset for the same clock domain.
✅ 4. Detailed explanation of Processor System Reset
✉ slowest_sync_clk: The clock to which the reset outputs are synchronized, such as 100 MHz or 50 MHz.
✉ ext_reset_in: Global reset input.
✉ xx_resetn: The reset outputs. For example, for the 100 MHz and 50 MHz clock domains, the interconnect_aresetn or peripheral_aresetn output connects to the XX_ARESETN input of the same clock domain.
✅ 5. Video transmission IP clock domain
The Video In and Video Out modules take not only video data signals but also video timing signals as inputs, so they need the corresponding video clocks. The aclk of Video In and Video Out is synchronous with the video stream timing, so these are connected to the same clock source. Likewise, the VTC clk is synchronous with the vid_io_out_clk of Video Out, so these two are connected to the same clock source.
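The frequency of the video clock follows directly from the timing that the VTC generates: it is the total number of pixel periods per line (active plus blanking), times the total number of lines per frame, times the frame rate. The C sketch below works this out. The article does not state the display resolution, so the 720p60 figures in the usage note are only an illustration, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Horizontal and vertical timing in pixels and lines. */
typedef struct {
    uint32_t h_active, h_front, h_sync, h_back;
    uint32_t v_active, v_front, v_sync, v_back;
} video_timing;

/* Pixel clock in Hz = total pixels per line * total lines per frame * frames per second. */
uint32_t pixel_clock_hz(const video_timing *t, uint32_t fps)
{
    uint32_t h_total = t->h_active + t->h_front + t->h_sync + t->h_back;
    uint32_t v_total = t->v_active + t->v_front + t->v_sync + t->v_back;
    return h_total * v_total * fps;
}
```

For standard 1280x720 at 60 Hz timing (1650 total pixels per line, 750 total lines), this gives 1650 × 750 × 60 = 74.25 MHz, which is the clock the Clocking Wizard would have to supply to the VTC and to vid_io_out_clk for that mode.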
✅ 6. Some Q&A regarding AXI Interconnect IP
✉ Is ACLK the clock used for AXI Crossbar within the IP core? Yes.
✉ Are ARESETN, Sxx_ARESETN, and Mxx_ARESETN all asynchronous resets? They can be, but ARESETN must be synchronized to the corresponding ACLK, and the others to the clocks of their respective master and slave devices. The Processor System Reset IP is usually used to generate the reset for each of the asynchronous clocks.
✉ Must ACLK be chosen from among Sxx_ACLK and Mxx_ACLK, or can an independent clock be used? An independent clock can be used, but then a FIFO must be enabled in the IP configuration, or an AXI FIFO must be added outside the IP. In general, it is recommended to let Vivado's Auto Connection wire up this IP; the Processor System Reset IP above is also connected automatically by the tool. Manual connection is not recommended.