Getting Started with Hailo: Accelerating Edge AI on Toradex Modules

Edge Computer Vision

Why Choose Edge Computing?

Embedded devices are becoming increasingly intelligent, with many machine learning and computer vision tasks being pushed to edge devices. Running AI models on such devices, while challenging, offers numerous advantages:

  1. Reduced Latency: Processing data on-device eliminates the wait time for transmitting data to the cloud or a central processor.
  2. Enhanced Privacy Protection: Sensitive data remains on the device, ensuring compliance with strict privacy regulations.
  3. Cost Savings on Bandwidth: Edge processing reduces the need to send large amounts of data to centralized servers.
  4. Increased Reliability: Systems can operate independently without a network connection.

Why Are External AI Accelerators Needed?

Toradex offers a variety of System on Modules (SoMs), some of which integrate Neural Processing Units (NPUs) capable of handling different AI workloads. For example, the Verdin iMX8M Plus, Verdin iMX95, and Aquila AM69 are equipped with NPUs designed specifically for accelerating edge inference, making them suitable for numerous computer vision and machine learning applications.

While these modules provide robust AI solutions, external AI accelerators such as Hailo-8, EdgeX, MemryX, and Google Coral address additional challenges by offering modular, decoupled, and scalable edge AI inference. This brings greater flexibility and helps future-proof AI capabilities.

1. Decoupling AI Processing from SoC Vendor Software

One major challenge in running machine learning at the edge is adapting models to specific hardware or runtime libraries. Whether it’s the NXP eiQ platform, TI Edge AI Studio, or ONNX export tools, each has its own AI toolkit and optimization strategies. External AI accelerators separate AI workloads from the rest of the hardware, providing a unified runtime environment across multiple hardware platforms.

Example: A computer vision solution developed on an x86 device using the Hailo-8 AI accelerator can be seamlessly migrated to an Aquila AM69 module equipped with Hailo-8 without needing to reconstruct the entire AI stack. This decoupling ensures that migration can be completed with minimal adjustments, significantly shortening time-to-market.

2. Modularity and Scalability

AI applications are dynamic, and performance requirements may change as workload complexity increases or new features are added. While built-in NPUs can provide solid solutions, they may sometimes struggle to adapt to new scenarios.

Introduction to Hailo

Hailo is an AI processor manufacturer whose products are designed to run advanced machine learning applications at the edge, applicable across various industries and fields such as smart cities, automotive, manufacturing, agriculture, and retail.

We tested the Hailo-8 M.2 module on several Toradex modules. The Hailo-8 M.2 module is an AI accelerator module with 26 TOPS of computing power and a PCIe Gen-3.0 4-lane M-key interface. This M.2 module can be inserted into various Toradex carrier boards for real-time deep neural network inference.

How Hailo Makes Full Use of the Toradex Ecosystem

Offloading Preprocessing and Postprocessing Tasks


Source: https://hailo.ai/blog/customer-case-study-developing-a-high-performance-application-on-an-embedded-edge-ai-device/

A typical computer vision workflow follows a linear pattern. From the moment the camera captures a frame until the application takes action, the image must pass through every processing step. This means that if any one step takes longer than the others, that step becomes the bottleneck.

Typically, when comparing machine learning models or hardware, we focus heavily on inference speed, but that is only part of the problem.
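To make the bottleneck idea concrete, here is a minimal sketch. It assumes the stages run concurrently on successive frames (as in a pipelined GStreamer application), so steady-state throughput is set by the slowest stage. The stage names and latencies are hypothetical placeholders, not measurements, and pipeline_fps is a helper name made up for this example.

```shell
#!/bin/sh
# pipeline_fps "stage:latency_ms" ...
# Prints the slowest stage and the frame rate it allows, assuming all stages
# work in parallel on successive frames. All numbers below are hypothetical.
pipeline_fps() {
    printf '%s\n' "$@" | awk -F: '
        $2 + 0 > max { max = $2 + 0; stage = $1 }
        END          { printf "bottleneck=%s fps=%.1f\n", stage, 1000 / max }'
}

pipeline_fps capture:16 preprocess:25 inference:8 postprocess:12 display:10
# prints: bottleneck=preprocess fps=40.0
```

In this made-up case, a faster inference engine alone would not raise the frame rate, because preprocessing is the limiting step.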

Complete Software Stack

Hailo is a complete AI solution that supports most steps in common machine learning workflows.

  1. Performance Evaluation
    1. TAPPAS is a code repository containing application examples.
    2. Model Zoo not only provides benchmark results for some models but also includes pretrained models.
  2. Model Training
    1. Some pretrained models come with a retraining environment.
  3. Compiler and Runtime Libraries
    1. Hailo Dataflow Compiler
    2. pyHailoRT and GStreamer plugins

From Toradex’s perspective, this workflow can be complemented by using the Torizon cloud platform.

  1. Performance Monitoring
    1. Identify any issues in advance to ensure system reliability.
  2. OTA Updates
    1. Easily update production devices.

Support for Toradex Module Hardware

Hardware

Supported Hardware Configurations

Series | Module | Carrier Board | Hailo
Aquila | TI AM69 (1+2 x PCIe 3.0) | Clover (M.2 key B+M) | Hailo-8, Hailo-8L
Aquila | NXP i.MX 95 (1 x PCIe 3.0) | Clover (M.2 key B+M) | Hailo-8, Hailo-8L
Verdin | NXP i.MX 95 (1 x PCIe 3.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L
Verdin | NXP i.MX 8M Plus (1 x PCIe 3.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L
Verdin | NXP i.MX 8M Mini (1 x PCIe 2.0) | Mallow (M.2 key B) | Hailo-8, Hailo-8L
Apalis | NXP i.MX8 (2 x PCIe 3.0) | Ixora (Mini PCIe) | Hailo-8R mPCIe

Software

OS | Version | Other Resources
Torizon OS | BSP 7 | meta-hailo layer (coming soon)
Torizon OS | BSP 6 | runtime container (coming soon)
Torizon OS Minimal | BSP 6 | meta-hailo kirkstone; OpenEmbedded layer for GStreamer 1.0
tdx-reference-multimedia | BSP 6 | meta-hailo kirkstone

YOLOv5 Example

In this example, we will run a demo application from TAPPAS. Once you have completed it, you should see object detection output running at 60+ FPS (depending on your camera).


We will use:

  • Camera
    Note: If using a USB camera, the frame rate may be very low due to the camera’s capture speed.
  • Display
  • Verdin i.MX8MP + Mallow Carrier Board
    Note: Verdin iMX8M Plus QuadLite 1GB IT (0065) is not compatible with Framos cameras.
  • Hailo AI Accelerator

Steps:

  1. Build Torizon OS from source
    1. Build the base Torizon OS
    2. Add dependencies
  2. Hardware Setup
    1. Connect the Hailo device
    2. Connect the camera
    3. Install the new image
  3. Check all configurations
  4. Run the example

Build Torizon OS from Source

Build the Base Torizon OS Image

We will use the CROPS container to build the following image:

Torizon OS Distro | Machine | Torizon OS Image Target | Version
torizon | verdin-imx8mp | torizon-minimal | 6.8.0

Create a working directory

cd ~
mkdir ~/yocto-workdir

Run the container (this will build the base image)

This will consume a lot of memory and take several hours to complete.

The second line of the command maps the host volume to the container’s workdir directory. Note that this folder ~/yocto-workdir was created in the previous step.

docker run --rm -it --name=crops \
  -v ~/yocto-workdir:/workdir \
  --workdir=/workdir \
  -e MACHINE="verdin-imx8mp" \
  -e IMAGE="torizon-minimal" \
  -e DISTRO="torizon" \
  -e BRANCH="refs/tags/6.8.0" \
  -e MANIFEST="torizoncore/default.xml" \
  -e ACCEPT_FSL_EULA="1" \
  -e BB_NUMBER_THREADS="2" \
  -e PARALLEL_MAKE="-j 2" \
  torizon/crops:kirkstone-6.x.y startup-tdx.sh

Note: passing BB_NUMBER_THREADS and PARALLEL_MAKE as environment variables in this way has not been verified.

Add Dependencies to the Image

To add dependencies, first navigate to the ~/yocto-workdir/layers folder.

cd ./layers

We will add the following layers:

  • meta-hailo
  • meta-gstreamer1.0
  • meta-toradex-framos

In the torizon/crops:kirkstone-6.x.y container, run the following bitbake-layers add-layer commands.

bitbake-layers add-layer meta-hailo/meta-hailo-accelerator
bitbake-layers add-layer meta-hailo/meta-hailo-libhailort
bitbake-layers add-layer meta-hailo/meta-hailo-tappas
bitbake-layers add-layer meta-hailo/meta-hailo-vpu
bitbake-layers add-layer meta-toradex-framos
bitbake-layers add-layer meta-gstreamer1.0
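As a quick sanity check after adding the layers, the sketch below looks for each expected layer name in the build's bblayers.conf. It assumes that file sits next to local.conf in build-torizon/conf/ and that the current directory is /workdir inside the container; check_layers is a helper name made up for this example. Running bitbake-layers show-layers remains the authoritative way to list the configured layers.

```shell
#!/bin/sh
# check_layers FILE LAYER...
# Prints "missing: LAYER" for every layer name not found in FILE and
# returns non-zero if any layer is missing. (Hypothetical helper.)
check_layers() {
    conf=$1
    shift
    status=0
    for layer in "$@"; do
        if ! grep -q -- "$layer" "$conf"; then
            echo "missing: $layer"
            status=1
        fi
    done
    return $status
}

# Assumed location of bblayers.conf, relative to /workdir in the container.
CONF=build-torizon/conf/bblayers.conf
if [ -f "$CONF" ]; then
    check_layers "$CONF" meta-hailo-accelerator meta-hailo-libhailort \
        meta-hailo-tappas meta-hailo-vpu meta-toradex-framos meta-gstreamer1.0 \
        || echo "Some layers are not configured yet."
fi
```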

Add the required packages by appending the following lines to the end of the build-torizon/conf/local.conf file.

IMAGE_INSTALL:append = " libhailort hailortcli pyhailort libgsthailo hailo-pci hailo-firmware"
IMAGE_INSTALL:append = " gstreamer1.0 gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-bad"
IMAGE_INSTALL:append = " v4l-utils"
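When lines like these are appended by hand, repeating the step can leave duplicates in local.conf. One possible way to make the append idempotent is sketched below. append_once is a hypothetical helper, and the path assumes the current directory is /workdir inside the container.

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Assumed path, relative to /workdir in the container.
CONF=build-torizon/conf/local.conf
if [ -f "$CONF" ]; then
    append_once "$CONF" 'IMAGE_INSTALL:append = " v4l-utils"'
fi
```

The same call can be repeated for each of the IMAGE_INSTALL lines listed above.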

Compile the image with the new layers.

bitbake torizon-minimal

You can find the installation image compatible with the Toradex Easy Installer at ~/yocto-workdir/build-torizon/deploy/images/verdin-imx8mp/torizon-minimal-verdin-imx8mp-Tezi_6.8.0-devel-<date>+build.0.tar.
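Because the file name contains a build date, it can be convenient to look the image up rather than type the name. The following is a minimal sketch that picks the most recently modified matching tarball; latest_tezi_image is a hypothetical helper name, and the directory is the one given above.

```shell
#!/bin/sh
# latest_tezi_image DIR: print the most recently modified Tezi tarball in DIR,
# or nothing if no matching file exists.
latest_tezi_image() {
    ls -t "$1"/torizon-minimal-verdin-imx8mp-Tezi_*.tar 2>/dev/null | head -n 1
}

DEPLOY_DIR=~/yocto-workdir/build-torizon/deploy/images/verdin-imx8mp
image=$(latest_tezi_image "$DEPLOY_DIR")
if [ -n "$image" ]; then
    echo "Tezi image: $image"
else
    echo "No Tezi image found in $DEPLOY_DIR"
fi
```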

Hardware Setup

Connect the Hailo Device

Insert the Hailo device into the M.2 slot of the Mallow carrier board.


Connect the Camera

Connect the camera to the MIPI-CSI interface on the Mallow carrier board.


Install the New Torizon OS Image

Use the Toradex Easy Installer (Tezi) to flash the new image onto the device.

  1. Download Tezi
  2. Put the device into recovery mode
  3. Install the newly compiled image


Check Installation Status

Hailo Device

sudo su
hailortcli scan
hailortcli fw-control identify

The output of these commands should detect that the device is properly connected and the drivers are functioning correctly.
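If the CLI reports no device, it can help to check the lower layers first. The sketch below assumes the Hailo PCIe driver exposes the accelerator as /dev/hailo0 and is loaded as a kernel module named hailo_pci. Both names are assumptions to verify on your image, and check_path is a helper name made up for this example.

```shell
#!/bin/sh
# check_path LABEL PATH: report whether PATH exists; non-zero if it does not.
check_path() {
    if [ -e "$2" ]; then
        echo "ok: $1 ($2)"
    else
        echo "missing: $1 ($2)"
        return 1
    fi
}

# Assumed device node and module name for the Hailo PCIe driver.
check_path "Hailo device node" /dev/hailo0 || true
check_path "hailo_pci kernel module" /sys/module/hailo_pci || true
```

A missing module usually points at the image contents, while a loaded module without a device node suggests checking the M.2 connection.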

Display

gst-launch-1.0 videotestsrc ! videoconvert ! autovideosink

You should see some colorful patterns on the screen.

Camera Device

This step may vary depending on the camera used.

v4l2-ctl -d2 -D
v4l2-ctl --list-formats-ext -d /dev/video2

For Framos cameras, the output is as follows.

root@verdin-imx8mp-15445736:~# v4l2-ctl --list-formats-ext -d /dev/video2
ioctl: VIDIOC_ENUM_FMT
        Type: Video Capture

        [0]: 'YUYV' (YUYV 4:2:2)
                Size: Stepwise 176x144 - 4096x3072 with step 16/8
        [1]: 'NV12' (Y/CbCr 4:2:0)
                Size: Stepwise 176x144 - 4096x3072 with step 16/8
        [2]: 'NV16' (Y/CbCr 4:2:2)
                Size: Stepwise 176x144 - 4096x3072 with step 16/8
        [3]: 'RG12' (12-bit Bayer RGRG/GBGB)
                Size: Stepwise 176x144 - 4096x3072 with step 16/8
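For scripting, the pixel-format codes can be pulled out of this listing. The following is a small sketch, assuming the output has one "[n]: 'CODE'" entry per line as v4l2-ctl normally prints it; list_fourcc is a hypothetical helper name.

```shell
#!/bin/sh
# list_fourcc: read `v4l2-ctl --list-formats-ext` output on stdin and print
# one pixel-format code (FourCC) per line.
list_fourcc() {
    sed -n "s/^[[:space:]]*\[[0-9][0-9]*\]: '\([^']*\)'.*/\1/p"
}

# Only query the camera when the tool and the device node are present.
if command -v v4l2-ctl >/dev/null 2>&1 && [ -e /dev/video2 ]; then
    v4l2-ctl --list-formats-ext -d /dev/video2 | list_fourcc
fi
```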

In the demo, we will use the YUYV format, so keep its supported resolution range in mind.

gst-launch-1.0 -v v4l2src device=/dev/video2 ! video/x-raw ! videoconvert ! autovideosink
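To pin the pipeline to a specific mode rather than letting GStreamer negotiate one, the caps can be spelled out. The sketch below builds such a caps string. V4L2's YUYV format is named YUY2 in GStreamer caps, while the 1280x720 at 30 fps mode is a hypothetical choice within the range reported above, and camera_caps is a made-up helper name.

```shell
#!/bin/sh
# camera_caps WIDTH HEIGHT FPS: build a GStreamer caps string for YUYV video.
# V4L2's 'YUYV' pixel format is called YUY2 in GStreamer caps.
camera_caps() {
    printf 'video/x-raw,format=YUY2,width=%s,height=%s,framerate=%s/1' "$1" "$2" "$3"
}

# Hypothetical mode: 1280x720 at 30 fps. Pick values your camera supports.
CAPS=$(camera_caps 1280 720 30)
echo "gst-launch-1.0 -v v4l2src device=/dev/video2 ! $CAPS ! videoconvert ! autovideosink"
```

The script only prints the resulting command, so it can be reviewed before being run on the device.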

Run the Example

Some cameras only support specific resolutions and frame rates, so these values may need to be adjusted accordingly. This can be done by modifying the framerate value in the PIPELINE variable before running the script.

sudo su
cd ~/apps/detection/
./detection.sh
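If the frame rate has to be changed before launching the script, the edit can be scripted along these lines. This is only a sketch: it assumes the PIPELINE variable in detection.sh contains a caps entry of the form framerate=N/1, which may not match the actual script, and set_framerate is a hypothetical helper name.

```shell
#!/bin/sh
# set_framerate FILE FPS: rewrite every "framerate=N/1" in FILE to FPS/1.
# A backup of the original is kept as FILE.bak.
set_framerate() {
    cp "$1" "$1.bak" &&
    sed "s#framerate=[0-9][0-9]*/1#framerate=$2/1#g" "$1.bak" > "$1"
}

# Assumed script location, taken from the run command above.
SCRIPT=~/apps/detection/detection.sh
if [ -f "$SCRIPT" ]; then
    set_framerate "$SCRIPT" 30
fi
```

Writing the result back into the existing file keeps its executable permission, and the .bak copy makes the change easy to undo.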

Completion

Next Steps: Pairing the Device to Torizon Cloud

In future blog posts, we will cover the following topics:

  • Monitoring the device using device-related metrics from Torizon Cloud.
  • Retraining models using Hailo environment containers.
  • Using Torizon remote updates to change the running model version.

Why Choose Toradex?

Toradex has over 21 years of excellence in the embedded industry, providing a rich combination of computer modules (SoMs) and carrier boards to help businesses build scalable, high-performance embedded applications.

Quality and Reliability

Toradex hardware is designed for durability, ensuring stable operation even in harsh industrial environments. Using high-quality components and undergoing rigorous testing, it minimizes downtime for critical applications.

Software Ecosystem

  • Torizon OS: A user-friendly industrial Linux distribution based on Yocto.
  • Torizon Cloud: Secure OTA updates, device monitoring, and remote access features.
  • Torizon IDE: Development, debugging, and deployment through VS Code plugins.

Product Lifecycle

Toradex commits to a product supply period of over 10 years, ensuring stability. Products will continue to receive support and remain available for an extended period.

Developer Resources

Simplifying development means accelerating deployment. Toradex provides a wealth of developer resources.

  • Comprehensive documentation.
  • Free support channels from the community and Toradex experts.
  • Development tools such as TCB, Tezi, and Torizon containers.
