Innovations in the Post-Moore Era: Implementing Tiny YOLO V4 on MYIR FPGA to Empower AIoT Applications

2024.12

Tip: Learn how to deploy Tiny YOLO v4 on MYIR’s ZU3EG FPGA development board, and how FPGA, GPU, and CPU compare as platforms for AIoT edge computing applications.

1. Why choose FPGA: Working around 7nm process and AI chip restrictions

Amid global restrictions on advanced semiconductor processes and high-end GPUs, FPGAs have become an important development path for Chinese enterprises. They support flexible AIoT applications, and their programmability makes efficient hardware acceleration possible on mature domestic processes such as 28nm, or even less advanced nodes.

MYIR’s ZU3EG development board supports AI and other compute-intensive tasks through its reconfigurable architecture. It also sidesteps the dependence on 7nm processes that constrains domestic chip design. Deploying Tiny YOLO v4 on the ZU3EG provides an efficient solution for AIoT applications such as smart homes and smart cities.

2. Understanding the Tiny YOLO model and its applicability

YOLO (You Only Look Once) is a real-time object detection model. It processes the entire image in a single forward pass of the network, predicting bounding boxes and class probabilities for all objects at once.

Tiny YOLO v4 is a simplified version with far fewer layers and parameters, which makes it better suited to embedded devices. Its light weight makes it a good fit for resource-constrained hardware, and it performs particularly well on low-power edge devices that need real-time detection.
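The following minimal C++ sketch shows how one raw YOLO prediction for a grid cell becomes a box. It assumes the YOLOv3/v4-style decoding: a sigmoid places the box centre inside its cell, and an exponential scales an anchor (prior) box. For simplicity, it omits the `scale_x_y` adjustment that YOLOv4-tiny's Darknet configuration applies. `decode_cell` and its parameters are hypothetical names. For a 416x416 input, YOLOv4-tiny predicts on 13x13 and 26x26 grids, and the anchor size used in the usage note is only an example value.

```cpp
#include <cmath>

// Box centre and size, in pixels of the network input.
struct Box {
    float cx, cy, w, h;
};

inline float sigmoid(float v) { return 1.0f / (1.0f + std::exp(-v)); }

// Decodes one raw prediction (tx, ty, tw, th) at grid cell (col, row) into a
// box, YOLOv3/v4-style: the sigmoid keeps the centre inside its cell, and the
// exponential scales the anchor box. The scale_x_y factor is omitted here.
Box decode_cell(float tx, float ty, float tw, float th,
                int col, int row,                 // grid cell indices
                int grid_size,                    // e.g. 13 or 26 for a 416x416 input
                float anchor_w, float anchor_h,   // anchor size in input pixels
                int input_size) {                 // side of the square network input
    const float stride = static_cast<float>(input_size) / grid_size;  // pixels per cell
    Box b;
    b.cx = (sigmoid(tx) + col) * stride;
    b.cy = (sigmoid(ty) + row) * stride;
    b.w = anchor_w * std::exp(tw);
    b.h = anchor_h * std::exp(th);
    return b;
}
```

With all raw outputs at 0, `decode_cell(0, 0, 0, 0, 6, 6, 13, 81.0f, 82.0f, 416)` returns a box centred at (208, 208) with the anchor's own 81x82 size. Objectness and class scores are obtained the same way, by applying a sigmoid to the corresponding raw outputs.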

Compared with traditional GPUs, FPGAs can deliver comparable inference performance for lightweight models like this in a smaller footprint and at lower power consumption, which makes them well suited to AIoT applications. FPGA development boards such as the MYIR ZU3EG have a carrier board and a rich set of interfaces, making them well suited to efficient, low-power embedded data processing.

YOLO v4 network structure diagram

Tiny YOLO v4 network structure diagram (the optimized structure and parameters reduce computation and memory usage while maintaining high detection accuracy)

3. Obtaining datasets and models

You can start from open-source training datasets or pre-trained models. For compatibility with downstream tools, it is recommended to convert the model to ONNX format before optimizing it for the FPGA.

1. Download the Tiny YOLO v4 model: Get the pre-trained Tiny YOLO v4 weights from the Darknet GitHub repository, or train the model yourself on a dataset such as COCO. Custom models suit specific application scenarios (such as vehicle detection or face detection).

2. Data preparation: To customize the model, annotate your dataset with a tool such as LabelImg, save the labels in YOLO format, and train the model. After training, convert the resulting model to ONNX format for compatibility with FPGA optimization toolchains.
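LabelImg can save labels in YOLO format directly. If annotations are stored as pixel corner coordinates instead (as in Pascal VOC XML), the conversion is simple arithmetic. This sketch assumes that input form, and `to_yolo_line` is a hypothetical helper. Each output line follows the YOLO label convention `class x_center y_center width height`, with all coordinates normalized to [0, 1] by the image size.

```cpp
#include <cstdio>
#include <string>

// Converts a pixel-space box (x_min, y_min, x_max, y_max) into one line of a
// YOLO-format label file: "class x_center y_center width height", where all
// four coordinates are normalised by the image width and height.
std::string to_yolo_line(int class_id, float x_min, float y_min,
                         float x_max, float y_max, int img_w, int img_h) {
    const float xc = (x_min + x_max) / 2.0f / img_w;
    const float yc = (y_min + y_max) / 2.0f / img_h;
    const float w = (x_max - x_min) / img_w;
    const float h = (y_max - y_min) / img_h;
    char buf[96];
    std::snprintf(buf, sizeof(buf), "%d %.6f %.6f %.6f %.6f",
                  class_id, xc, yc, w, h);
    return std::string(buf);
}
```

For a 640x480 image, a class-0 box from (100, 200) to (300, 400) becomes `0 0.312500 0.625000 0.312500 0.416667`.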

Screenshot of Tiny YOLO training in Darknet

4. Preparing the model for FPGA using Vitis HLS (formerly Vivado HLS)

To deploy the model on an FPGA, its neural network operations must be turned into a hardware description. Xilinx’s Vitis HLS (High-Level Synthesis) converts C++ code implementing the Tiny YOLO v4 layers into Verilog RTL (Register Transfer Level) code, bringing the model from software into hardware.

Detailed steps:

1. Model layer mapping and optimization:

Map each layer of YOLO (such as convolutional and pooling layers) to hardware-friendly C/C++ structures. For example, map convolution onto an array of multiply-accumulate (MAC) units and parallelize it through pipelining.
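Below is a minimal sketch of a convolution layer written as nested MAC loops in plain C++, the form an HLS flow starts from. The layer sizes are hypothetical and deliberately tiny. The sketch uses INT8 inputs and weights with an INT32 accumulator. For brevity it computes a "valid" convolution (no padding, stride 1), whereas Tiny YOLO v4 layers typically use padding.

```cpp
#include <cstdint>

// Hypothetical layer dimensions. In an HLS design they are compile-time
// constants so the tool can size loops and on-chip buffers statically.
constexpr int IN_C = 2, OUT_C = 2, H = 4, W = 4, K = 3;
constexpr int OUT_H = H - K + 1, OUT_W = W - K + 1;  // valid convolution, stride 1

// Direct 2D convolution as nested multiply-accumulate (MAC) loops.
// int8 activations and weights, int32 accumulator to avoid overflow.
void conv2d(const int8_t in[IN_C][H][W],
            const int8_t weight[OUT_C][IN_C][K][K],
            const int32_t bias[OUT_C],
            int32_t out[OUT_C][OUT_H][OUT_W]) {
    for (int oc = 0; oc < OUT_C; ++oc)
        for (int y = 0; y < OUT_H; ++y)
            for (int x = 0; x < OUT_W; ++x) {
                int32_t acc = bias[oc];
                for (int ic = 0; ic < IN_C; ++ic)
                    for (int ky = 0; ky < K; ++ky)
                        for (int kx = 0; kx < K; ++kx)
                            // One MAC: these inner loops are what HLS unrolls
                            // into parallel multipliers (DSP slices).
                            acc += static_cast<int32_t>(in[ic][y + ky][x + kx]) *
                                   weight[oc][ic][ky][kx];
                out[oc][y][x] = acc;
            }
}
```

Fixed loop bounds matter here: they let HLS compute exact trip counts and decide how many MAC units to instantiate.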

2. Operator acceleration and directive optimization:

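Pipelining: Overlap successive loop iterations so that a new iteration can start every clock cycle (initiation interval II=1), raising throughput and reducing overall latency.

Loop unrolling: Unroll loops so that more data is processed per cycle; this is especially effective for the inner loops of convolution.

DATAFLOW: Apply the DATAFLOW directive so that consecutive layers run concurrently, each processing new data as soon as it is available, instead of strictly one after another.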
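These directives can be sketched on two small stages, a ReLU and a 2x2 max pool, connected by DATAFLOW. The sizes are hypothetical, and the ReLU stands in for the Leaky ReLU that Tiny YOLO v4 actually uses. A standard C++ compiler ignores the `#pragma HLS` lines, so the same code also runs as a software model. In a real design, reaching II=1 usually also requires ARRAY_PARTITION (or a streaming structure) so that enough memory ports are available.

```cpp
#include <cstdint>

// Hypothetical feature-map sizes for illustration.
constexpr int C = 4, H = 8, W = 8;

// Stage 1: activation (plain ReLU here). PIPELINE II=1 asks HLS to start a
// new loop iteration every clock cycle.
static void relu(const int16_t in[C][H][W], int16_t out[C][H][W]) {
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < H; ++y)
            for (int x = 0; x < W; ++x) {
#pragma HLS PIPELINE II=1
                out[c][y][x] = static_cast<int16_t>(in[c][y][x] > 0 ? in[c][y][x] : 0);
            }
}

// Stage 2: 2x2 max pooling with stride 2. The 2x2 window loops are fully
// unrolled (explicitly here, although PIPELINE would also unroll them).
static void maxpool2x2(const int16_t in[C][H][W], int16_t out[C][H / 2][W / 2]) {
    for (int c = 0; c < C; ++c)
        for (int y = 0; y < H / 2; ++y)
            for (int x = 0; x < W / 2; ++x) {
#pragma HLS PIPELINE II=1
                int16_t m = in[c][2 * y][2 * x];
                for (int dy = 0; dy < 2; ++dy)
                    for (int dx = 0; dx < 2; ++dx) {
#pragma HLS UNROLL
                        const int16_t v = in[c][2 * y + dy][2 * x + dx];
                        if (v > m) m = v;
                    }
                out[c][y][x] = m;
            }
}

// Top level: DATAFLOW lets the two stages run as concurrent tasks; the
// intermediate array becomes a ping-pong buffer between them.
void relu_pool(const int16_t in[C][H][W], int16_t out[C][H / 2][W / 2]) {
#pragma HLS DATAFLOW
    int16_t act[C][H][W];
    relu(in, act);
    maxpool2x2(act, out);
}
```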

3. Quantization and bit-width adjustment:

Quantize weights and activations to fixed-point precision (e.g., INT8) instead of floating point. This greatly reduces computation, memory, and bandwidth requirements, usually with only a small loss of accuracy. It is also a natural fit for the FPGA, whose DSP slices are built for fixed-point multiply-accumulate operations.
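Below is a minimal sketch of symmetric per-tensor INT8 quantization. It assumes a simple max-abs calibration; real toolchains may calibrate differently or per channel. It also shows why the scheme suits FPGA DSPs: the dot product runs entirely on int8 x int8 multiplies into an int32 accumulator, and a single scale factor restores the real-valued result. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric quantization: real value ~= scale * q, with q in [-127, 127].
struct QuantParams {
    float scale;
};

// Max-abs calibration: map the largest magnitude in the tensor to 127.
QuantParams calibrate_max_abs(const std::vector<float>& x) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    return {max_abs > 0.0f ? max_abs / 127.0f : 1.0f};
}

int8_t quantize(float v, QuantParams p) {
    const float q = std::round(v / p.scale);
    return static_cast<int8_t>(std::clamp(q, -127.0f, 127.0f));
}

float dequantize(int8_t q, QuantParams p) { return q * p.scale; }

// Dot product on quantized data: integer MACs only, then one rescale.
float quantized_dot(const std::vector<int8_t>& a, QuantParams pa,
                    const std::vector<int8_t>& b, QuantParams pb) {
    int32_t acc = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<int32_t>(a[i]) * b[i];
    return acc * pa.scale * pb.scale;
}
```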

Flowchart of Tiny YOLO model transformation in Vivado HLS

5. Using Vivado to synthesize and deploy Verilog to MYIR’s ZU3EG FPGA development board

Once the RTL code generated by HLS is ready, you can use Vivado to deploy the model to the FPGA.

1. Settings in Vivado:

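Import the HLS output into Vivado (Vitis HLS can export the design as a packaged IP that is then added to the Vivado IP catalog).

Create a block design in Vivado and connect the accelerator’s AXI interfaces to the Zynq UltraScale+ processing system, which contains the ARM cores of the ZU3EG.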
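On the HLS side, the AXI interfaces that appear in the Vivado block design come from interface directives on the top-level function. This sketch assumes Vitis HLS `INTERFACE` pragmas. The `m_axi` ports let the accelerator read and write buffers in DDR. The `s_axilite` ports expose the buffer addresses, scalar arguments, and start/done control to software running on the ARM cores. `yolo_accel_top`, its arguments, and its placeholder loop are hypothetical.

```cpp
#include <cstdint>

constexpr int MAX_LEN = 1024;  // hypothetical upper bound on the transfer size

// Hypothetical accelerator top function.
void yolo_accel_top(const int8_t* input, int8_t* output, int len, int8_t bias) {
// AXI4 master ports for DDR access; offset=slave places each buffer's base
// address in an AXI4-Lite register that the ARM software writes.
#pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0 depth=1024
#pragma HLS INTERFACE m_axi port=output offset=slave bundle=gmem1 depth=1024
// Scalar arguments and block-level start/done control over AXI4-Lite.
#pragma HLS INTERFACE s_axilite port=len
#pragma HLS INTERFACE s_axilite port=bias
#pragma HLS INTERFACE s_axilite port=return
    // Placeholder computation: a real design would call the convolution and
    // pooling stages here instead of adding a constant.
    for (int i = 0; i < len && i < MAX_LEN; ++i) {
#pragma HLS PIPELINE II=1
        output[i] = static_cast<int8_t>(input[i] + bias);
    }
}
```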

2. I/O constraints and timing:

Define I/O pin constraints that match the pin assignments of the ZU3EG board. Configure clock constraints for the required data rates (e.g., 100-200 MHz for video data).

Run timing analysis to confirm that the design meets timing at the target clock (no negative slack) and that latency and response speed satisfy real-time requirements.
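The clock constraint and the real-time check both reduce to simple arithmetic. The period in a Vivado clock constraint (for example, `create_clock -period 5.000`) is 1000 / f nanoseconds for a clock of f MHz, and the cycles available per frame are f / FPS. The helper names below are hypothetical, and the 30 FPS target is only an example value.

```cpp
// Clock period in nanoseconds for a target frequency in MHz.
double period_ns(double freq_mhz) { return 1000.0 / freq_mhz; }

// Clock cycles available per frame at the given frame rate (rounded down).
long long cycles_per_frame(double freq_mhz, double fps) {
    return static_cast<long long>(freq_mhz * 1e6 / fps);
}

// True if an accelerator needing `latency_cycles` per frame keeps up with
// `fps` when frames are processed one at a time (no overlap between frames).
bool fits_frame_budget(long long latency_cycles, double freq_mhz, double fps) {
    return latency_cycles <= cycles_per_frame(freq_mhz, fps);
}
```

At 200 MHz (a 5 ns period) and 30 FPS, for example, each frame has a budget of 6,666,666 cycles. That budget can be compared directly with the latency estimates in the HLS synthesis report.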

3. Generate bitstream and download to ZU3EG:

The generated bitstream can be downloaded to the ZU3EG directly via JTAG or over the Ethernet interface.

Connecting the Tiny YOLO processing module to peripherals and interfaces of MYIR ZU3EG development board

6. Testing and running inference on FPGA

Now that Tiny YOLO is deployed, we can verify its real-time object detection performance.

1. Data collection:

Capture images or video frames using the connected camera module, or use stored test videos.

Use OpenCV on the ARM cores of the ZU3EG to preprocess the frames (for example, resizing them to the network input size and normalizing pixel values) before passing them to the FPGA for inference.
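OpenCV would normally handle this with `cv::resize` and related calls. To keep the sketch dependency-free, the version below implements the same letterbox preprocessing by hand. It makes several assumptions:

- the input is an interleaved (HWC) 8-bit BGR frame, as OpenCV delivers it;
- the network input is square;
- resizing is nearest-neighbour;
- padding is grey (0.5);
- the network expects a planar (CHW) RGB float tensor in [0, 1].

The layout and normalization your accelerator actually expects may differ. All names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Letterbox geometry: the scale that fits the frame inside a dst x dst input
// while keeping its aspect ratio, the resized size, and the centring padding.
struct Letterbox {
    float scale;
    int new_w, new_h, pad_x, pad_y;
};

Letterbox compute_letterbox(int src_w, int src_h, int dst) {
    const float s = std::min(static_cast<float>(dst) / src_w,
                             static_cast<float>(dst) / src_h);
    const int nw = static_cast<int>(std::lround(src_w * s));
    const int nh = static_cast<int>(std::lround(src_h * s));
    return {s, nw, nh, (dst - nw) / 2, (dst - nh) / 2};
}

// Nearest-neighbour letterbox of an interleaved BGR uint8 frame (HWC) into a
// planar RGB float tensor (CHW) with values in [0, 1]; padding is 0.5 grey.
std::vector<float> preprocess(const std::vector<uint8_t>& bgr,
                              int src_w, int src_h, int dst) {
    const Letterbox lb = compute_letterbox(src_w, src_h, dst);
    std::vector<float> chw(static_cast<std::size_t>(3) * dst * dst, 0.5f);
    for (int y = 0; y < lb.new_h; ++y) {
        const int sy = std::min(static_cast<int>(y / lb.scale), src_h - 1);
        for (int x = 0; x < lb.new_w; ++x) {
            const int sx = std::min(static_cast<int>(x / lb.scale), src_w - 1);
            const uint8_t* px = &bgr[(static_cast<std::size_t>(sy) * src_w + sx) * 3];
            for (int c = 0; c < 3; ++c) {
                // Output channel c (R, G, B) reads input channel 2 - c (B, G, R).
                chw[(static_cast<std::size_t>(c) * dst + y + lb.pad_y) * dst +
                    x + lb.pad_x] = px[2 - c] / 255.0f;
            }
        }
    }
    return chw;
}
```

For a 640x480 camera frame and a 416x416 input, the frame is scaled by 0.65 to 416x312 and padded by 52 rows at the top and bottom. These values are needed again when mapping detections back to the original frame.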

2. Post-processing and display:

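After inference, decode the model output into bounding boxes, confidences, and class labels, and apply non-maximum suppression (NMS) to remove duplicate detections of the same object. Then use OpenCV to map the bounding boxes back to the original frame and display the class and confidence next to each detected object.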
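The two post-processing steps can be sketched in plain C++:

1. greedy per-class non-maximum suppression over decoded boxes;
2. undoing the letterbox so that boxes land in original-frame coordinates.

The sketch assumes boxes in corner form (x1, y1, x2, y2) in network-input pixels, and the same scale/padding convention as a letterbox preprocessing step. The 0.45 IoU threshold used in the usage note is a common but arbitrary choice. All names are hypothetical. Drawing would then be done with OpenCV calls such as `cv::rectangle` and `cv::putText`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Det {
    float x1, y1, x2, y2;  // corners in network-input pixels
    float score;           // confidence
    int cls;               // class index
};

// Intersection over union of two boxes.
float iou(const Det& a, const Det& b) {
    const float ix1 = std::max(a.x1, b.x1), iy1 = std::max(a.y1, b.y1);
    const float ix2 = std::min(a.x2, b.x2), iy2 = std::min(a.y2, b.y2);
    const float inter = std::max(0.0f, ix2 - ix1) * std::max(0.0f, iy2 - iy1);
    const float uni = (a.x2 - a.x1) * (a.y2 - a.y1) +
                      (b.x2 - b.x1) * (b.y2 - b.y1) - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Greedy per-class NMS: keep the highest-scoring remaining box, suppress
// same-class boxes overlapping it by more than iou_thresh, and repeat.
std::vector<Det> nms(std::vector<Det> dets, float iou_thresh) {
    std::sort(dets.begin(), dets.end(),
              [](const Det& a, const Det& b) { return a.score > b.score; });
    std::vector<Det> kept;
    std::vector<bool> removed(dets.size(), false);
    for (std::size_t i = 0; i < dets.size(); ++i) {
        if (removed[i]) continue;
        kept.push_back(dets[i]);
        for (std::size_t j = i + 1; j < dets.size(); ++j)
            if (!removed[j] && dets[j].cls == dets[i].cls &&
                iou(dets[i], dets[j]) > iou_thresh)
                removed[j] = true;
    }
    return kept;
}

// Undo the letterbox: remove the padding, then divide by the resize scale.
Det unletterbox(Det d, float scale, int pad_x, int pad_y) {
    d.x1 = (d.x1 - pad_x) / scale;
    d.x2 = (d.x2 - pad_x) / scale;
    d.y1 = (d.y1 - pad_y) / scale;
    d.y2 = (d.y2 - pad_y) / scale;
    return d;
}
```

With a 0.45 threshold, two same-class boxes (0, 0, 10, 10) and (1, 1, 11, 11) overlap with IoU of about 0.68, so only the higher-scoring one survives. An identical box of a different class is kept.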

3. Performance testing:

Measure the frame rate (FPS) and detection accuracy (for example, mAP on a validation set). Fine-tune the quantization bit-width or dataflow parameters until the system meets its real-time requirements.
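Below is a minimal FPS measurement sketch. It assumes the whole per-frame pipeline (preprocessing, FPGA inference, post-processing) is wrapped in a single callable, and `measure_fps` is a hypothetical helper. It uses `std::chrono::steady_clock`, which is monotonic, and discards a few warm-up runs so that one-time setup costs do not skew the average.

```cpp
#include <chrono>
#include <functional>

// Runs `infer` for `warmup` untimed frames, then times `n_frames` frames and
// returns the average throughput in frames per second.
double measure_fps(const std::function<void()>& infer, int n_frames, int warmup = 3) {
    for (int i = 0; i < warmup; ++i) infer();  // exclude first-run overheads
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n_frames; ++i) infer();
    const auto t1 = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    return secs > 0.0 ? n_frames / secs : 0.0;
}
```

Average FPS hides jitter. For real-time use, it can be worth also recording the per-frame maximum latency.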

Real-time output showing detection results of the Tiny YOLO model on ZU3EG, with detected objects labeled in the video frame

7. Performance optimization and debugging tips

To improve performance, the following adjustments can be made:

Memory access: Organize data storage to maximize reuse of on-chip buffers (block RAM) and to minimize transfers to and from external DDR memory, which relieves memory bandwidth bottlenecks.
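One common way to cut external memory traffic for convolution is a line buffer. The sketch below streams an image once in row-major order and computes a 3x3 box sum. It keeps the two previous rows in a buffer (block RAM in hardware) and the current 3x3 window in registers. Each pixel is read from external memory exactly once, versus up to nine times for a naive window loop. A real convolution would multiply the window by weights instead of summing it. The function name is hypothetical.

```cpp
#include <vector>

// 3x3 box sum (valid region) over an H x W image streamed in row-major order.
// Assumes H >= 3 and W >= 3. `reads` counts external-memory pixel reads.
std::vector<int> box3x3_linebuffer(const std::vector<int>& img, int H, int W,
                                   long long& reads) {
    std::vector<int> line0(W, 0), line1(W, 0);  // rows y-2 and y-1 (BRAM in HLS)
    int win[3][3] = {};                         // 3x3 window (registers in HLS)
    std::vector<int> out((H - 2) * (W - 2));
    reads = 0;
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            const int px = img[y * W + x];  // the only external read of this pixel
            ++reads;
            // Shift the window left and insert the new column (rows y-2, y-1, y).
            for (int r = 0; r < 3; ++r) {
                win[r][0] = win[r][1];
                win[r][1] = win[r][2];
            }
            win[0][2] = line0[x];
            win[1][2] = line1[x];
            win[2][2] = px;
            // Shift this column of the line buffer up by one row.
            line0[x] = line1[x];
            line1[x] = px;
            if (y >= 2 && x >= 2) {  // window fully inside the image
                int s = 0;
                for (int r = 0; r < 3; ++r)
                    for (int c = 0; c < 3; ++c) s += win[r][c];
                out[(y - 2) * (W - 2) + (x - 2)] = s;
            }
        }
    return out;
}
```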

Reduce latency: Re-examine the latency along the critical path. If it is too high, adjust the pipelining (pipeline depth and initiation interval) in Vitis HLS and check the inter-layer data dependencies that prevent stages from overlapping.
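When adjusting pipelining, a common first-order model of a pipelined loop's latency is iteration latency + II x (trip count - 1). The sketch below encodes this model so that configurations can be compared. The helper names are hypothetical, and the model ignores loop entry/exit overhead and memory stalls.

```cpp
// First-order latency model of a loop in HLS, in clock cycles.
// Pipelined: a new iteration starts every II cycles, and the last one still
// needs the full iteration latency to finish.
long long pipelined_latency(long long trip_count, int ii, int iteration_latency) {
    if (trip_count <= 0) return 0;
    return iteration_latency + static_cast<long long>(ii) * (trip_count - 1);
}

// Not pipelined: iterations run back to back.
long long sequential_latency(long long trip_count, int iteration_latency) {
    return trip_count > 0 ? trip_count * iteration_latency : 0;
}
```

Take 1,000 iterations with a 10-cycle iteration latency. II=1 gives 1,009 cycles, II=2 gives 2,008, and no pipelining gives 10,000. This is why reaching II=1 usually matters more than shortening the pipeline itself.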

Quantization improvements: Try INT8 quantization; Xilinx’s Vitis AI tools can assist in fine-tuning the quantization parameters to balance accuracy against speed.
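Balancing accuracy and speed in INT8 often comes down to choosing the clipping range. The sketch below shows one simple calibration approach: a search over clip thresholds that minimizes the quantization mean squared error. It illustrates the idea under that assumption only and does not describe how the Vitis AI quantizer works internally. The names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Mean squared error of symmetric INT8 quantization of `x` when values are
// clipped to [-clip, clip] (scale = clip / 127).
double quant_mse(const std::vector<float>& x, float clip) {
    const float scale = clip / 127.0f;
    double err = 0.0;
    for (float v : x) {
        const float q = std::clamp(std::round(v / scale), -127.0f, 127.0f);
        const double d = static_cast<double>(v) - q * scale;
        err += d * d;
    }
    return err / x.size();
}

// Tries clip thresholds from 10% up to 100% of max|x| and returns the one
// with the lowest quantization MSE.
float calibrate_clip_mse(const std::vector<float>& x, int steps = 50) {
    float max_abs = 0.0f;
    for (float v : x) max_abs = std::max(max_abs, std::fabs(v));
    if (max_abs == 0.0f) return 1.0f;  // all zeros (or empty): any scale works
    float best_clip = max_abs;
    double best_err = quant_mse(x, max_abs);
    for (int i = 1; i < steps; ++i) {
        const float clip = max_abs * (0.1f + 0.9f * i / steps);
        const double e = quant_mse(x, clip);
        if (e < best_err) {
            best_err = e;
            best_clip = clip;
        }
    }
    return best_clip;
}
```

Clipping trades a large error on a few outliers for finer resolution on the bulk of the values. The search picks whichever side of that trade costs less for the given data.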

The impact of different optimization configurations on resource usage

Figure: MYIR MYC-CZU3EG/4EV/5EV-V2 core board and development board

The MYIR ZU3EG development platform provides an efficient solution for edge AI. By drawing on the flexibility and low power consumption of FPGAs, it supports the wider adoption and intelligent upgrading of future AIoT devices.
