Deploy AI Models in Just Three Lines of Code!

The development of artificial intelligence applications is accelerating, and the deployment work that developers face is becoming increasingly complex. The endless array of algorithm models, AI hardware with diverse architectures, varied deployment requirements (server, service, embedded, mobile, etc.), and different operating systems and programming languages pose significant challenges for AI developers bringing projects to production.

To solve the challenges of AI deployment, we initiated the FastDeploy project. FastDeploy standardizes the model API for important AI models in industrial deployment scenarios, providing ready-to-run, downloadable demo examples. Compared to traditional inference engines, it achieves end-to-end inference performance optimization. FastDeploy also supports both online (service) and offline deployment forms, catering to different developers' deployment needs.

After a year of intensive refinement, FastDeploy currently possesses three distinctive capabilities:

All-Scenario: Supports various hardware including GPU, CPU, Jetson, ARM CPU, Rockchip NPU, Amlogic NPU, and NXP NPU, supporting local deployment, service deployment, web deployment, mobile deployment, etc. It covers three major fields: CV, NLP, and Speech, supporting 16 mainstream algorithm scenarios including image classification, image segmentation, semantic segmentation, object detection, character recognition (OCR), face detection and recognition, portrait cutout, pose estimation, text classification, information extraction, pedestrian tracking, and speech synthesis.

Easy and Flexible: Deploy an AI model in just three lines of code, switch models with a single API change, and move seamlessly between model deployments, with deployment demos provided for over 150 popular AI models.

Extreme Efficiency: Unlike traditional deep learning inference engines that only focus on model inference time, FastDeploy emphasizes the end-to-end deployment performance of model tasks. Through high-performance pre- and post-processing, integration of high-performance inference engines, and one-click automatic compression, it achieves extreme performance optimization for AI model inference deployment.

Project Link:

https://github.com/PaddlePaddle/FastDeploy

The following sections explain these three major features and walk through the hands-on deployment guides; the full text is about 2,100 words, with an estimated reading time of 3 minutes.

1

Three Major Features

2

Three-Step Deployment Practical Guide

CPU/GPU Deployment Practical Guide

Jetson Deployment Practical Guide

RK3588 Deployment Practical Guide (Similar to RV1126, Amlogic A311D, etc.)

1

Three Major Features Explained

All-Scenario: One Codebase Covers Multi-Platform and Multi-Hardware, Including CV, NLP, and Speech

FastDeploy supports Paddle Inference, TensorRT, OpenVINO, ONNX Runtime, Paddle Lite, RKNN, and more, covering common AI hardware for all deployment scenarios, including NVIDIA GPU, x86 CPU, Jetson Nano, Jetson TX2, Jetson Xavier, ARM CPU (mobile devices and ARM development boards), Rockchip NPU (RK3588, RK3568, RV1126, RV1109, RK1808), and Amlogic NPU (A311D, S905D). It also supports service, offline CPU/GPU, edge, and mobile deployment methods. A unified API across hardware lets the same code switch seamlessly among data center, edge, and endpoint deployments.
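As a minimal sketch of this unified API (the model and image filenames here are placeholders), moving the same detection code between CPU and GPU only requires changing the RuntimeOption:

# Minimal sketch: one codebase, switched across hardware via RuntimeOption
import fastdeploy as fd
import cv2

option = fd.RuntimeOption()
option.use_gpu()  # or option.use_cpu() to run the same code on CPU

model = fd.vision.detection.PPYOLOE("model.pdmodel",
                                    "model.pdiparams",
                                    "infer_cfg.yml",
                                    runtime_option=option)
result = model.predict(cv2.imread("test.jpg"))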
FastDeploy supports three major AI fields, CV, NLP, and Speech, covering 16 categories of algorithms (image classification, image segmentation, semantic segmentation, object detection, character recognition (OCR), face detection, facial keypoint detection, face recognition, portrait cutout, video cutout, pose estimation, text classification, information extraction, image-text generation, pedestrian tracking, and speech synthesis). It supports mainstream models from six popular AI suites (PaddleClas, PaddleDetection, PaddleSeg, PaddleOCR, PaddleNLP, and PaddleSpeech), as well as popular models from ecosystems such as PyTorch and ONNX.

Easy and Flexible: Deploy Models in Three Lines of Code, Quickly Experience 150+ Popular Model Deployments with One Command

FastDeploy deploys an AI model on different hardware in just three lines of code, greatly reducing the difficulty and workload of AI model deployment, and a single command switches between inference backends such as TensorRT, OpenVINO, Paddle Inference, Paddle Lite, ONNX Runtime, and RKNN. Thanks to a low-threshold backend integration scheme and a decoupled front-end/back-end architecture, integrating a new hardware inference engine takes about one week on average, and developers can try any FastDeploy-supported model with a simple compile-and-test workflow. Developers can build their own deployments on the model API, or git clone the repository to obtain ready-to-run deployment demos for over 150 popular AI models and quickly try different model inference deployments.
# Deploying PP-YOLOE
import fastdeploy as fd
import cv2
model = fd.vision.detection.PPYOLOE("model.pdmodel", 
                                    "model.pdiparams", 
                                    "infer_cfg.yml")
im = cv2.imread("test.jpg")
result = model.predict(im)

# Deploying YOLOv7
import fastdeploy as fd
import cv2
model = fd.vision.detection.YOLOv7("model.onnx")
im = cv2.imread("test.jpg")
result = model.predict(im)

FastDeploy for different model deployments
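Detection results can also be rendered with FastDeploy's built-in visualization helper; a minimal sketch continuing from either snippet above (the output filename is arbitrary):

# Visualize and save the detection result from either snippet above
vis_im = fd.vision.vis_detection(im, result)
cv2.imwrite("vis_result.jpg", vis_im)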

# Deploying PP-YOLOE
import fastdeploy as fd
import cv2
option = fd.RuntimeOption()
option.use_cpu()
option.use_openvino_backend() # Switch to OpenVINO deployment with one command
model = fd.vision.detection.PPYOLOE("model.pdmodel", 
                                    "model.pdiparams", 
                                    "infer_cfg.yml",
                                    runtime_option=option)
im = cv2.imread("test.jpg")
result = model.predict(im)
FastDeploy Switches Backends and Hardware

Extreme Efficiency: One-Click Compression and Speed-Up, Preprocessing Acceleration, End-to-End Performance Optimization to Enhance AI Algorithm Deployment

FastDeploy absorbs the high-performance inference capabilities of TensorRT, OpenVINO, Paddle Inference, Paddle Lite, ONNX Runtime, RKNN, and others, while addressing a weakness of traditional inference engines: they focus only on model inference speed. By optimizing the pipeline end to end, FastDeploy improves overall inference speed and performance. It integrates a one-click automatic compression tool that greatly reduces parameter volume (with almost no loss in accuracy) while substantially increasing inference speed, and it uses CUDA to accelerate the pre- and post-processing modules; with these optimizations, the end-to-end inference time of YOLO-series models dropped from 41 ms to 25 ms. This end-to-end optimization strategy resolves the performance issues of AI deployment. For more performance optimizations, follow us on GitHub: https://github.com/PaddlePaddle/FastDeploy
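Since end-to-end latency is what matters in deployment, it should be measured around predict(), which covers pre-processing, inference, and post-processing together. A minimal sketch, reusing the PP-YOLOE model files from the snippets above:

# Minimal sketch: measure end-to-end latency (pre-processing + inference + post-processing)
import time
import fastdeploy as fd
import cv2

model = fd.vision.detection.PPYOLOE("model.pdmodel",
                                    "model.pdiparams",
                                    "infer_cfg.yml")
im = cv2.imread("test.jpg")
model.predict(im)  # warm-up run
start = time.time()
for _ in range(50):
    model.predict(im)
print("avg end-to-end latency: %.1f ms" % ((time.time() - start) / 50 * 1000))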

Join the FastDeploy Technical Exchange Group

Group Benefits✨

In addition to this update, FastDeploy's hardware support continues to expand, including Graphcore, Feiteng, ARM CPU (Android, iOS, ARM Linux), Qualcomm, Ascend, Horizon, Kunlun, and Aisino; service deployment; system solutions based on hardware decoding on Jetson (coming soon); and end-to-end high-performance optimization, truly addressing concerns about speed, performance, and system integration in deployment. Join the group to get the latest product updates.

How to Join the Group✨

  1. Scan the QR code below, follow the official account, fill out the questionnaire, and then enter the WeChat group

  2. Check the group announcement to receive benefits


To help developers further understand FastDeploy’s deployment capabilities and use them in their projects more quickly, we have prepared a live technical exchange event. Scan the code to join the group and follow our live broadcast room!

Live Broadcast Time: November 9 (Wednesday) through November 11 (Friday) at 20:30. Everyone is welcome to scan the code and register.


2

Three-Step Deployment Practical Guide

1

CPU/GPU Deployment Practical Guide (Taking YOLOv7 as an example)

Install the FastDeploy Package and Download the Deployment Example (optional; the deployment code can also be written directly with the three-line API)

pip install fastdeploy-gpu-python -f https://www.paddlepaddle.org.cn/whl/fastdeploy.html
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/yolov7/python/

Prepare Model Files and Test Images

wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7.onnx
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

Run Inference on CPU/GPU

# CPU Inference
python infer.py --model yolov7.onnx --image 000000014439.jpg --device cpu
# GPU Inference
python infer.py --model yolov7.onnx --image 000000014439.jpg --device gpu
# Use TensorRT inference on GPU
python infer.py --model yolov7.onnx --image 000000014439.jpg --device gpu --use_trt True
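Under the hood, infer.py is a thin wrapper over the three-line model API; a minimal sketch of the equivalent Python, assuming the model and image downloaded above (the RuntimeOption calls mirror the --device and --use_trt flags):

# Minimal sketch: the equivalent three-line API with TensorRT enabled on GPU
import fastdeploy as fd
import cv2

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()  # drop these two lines for plain CPU inference

model = fd.vision.detection.YOLOv7("yolov7.onnx", runtime_option=option)
result = model.predict(cv2.imread("000000014439.jpg"))
print(result)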

Inference Result Example:


2

Jetson Deployment Practical Guide (Taking YOLOv7 as an example)

Install FastDeploy Deployment Package, Configure Environment Variables

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy
mkdir build && cd build
cmake .. -DBUILD_ON_JETSON=ON -DENABLE_VISION=ON -DCMAKE_INSTALL_PREFIX=${PWD}/install
make -j8
make install
cd install
source fastdeploy_init.sh

Prepare Model Files and Test Images

wget https://bj.bcebos.com/paddlehub/fastdeploy/yolov7.onnx
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

Compile and Run the Inference Demo

cd examples/vision/detection/yolov7/cpp
mkdir build && cd build
cmake .. -DFASTDEPLOY_INSTALL_DIR=${FASTDEPLOY_DIR}
make -j

# Use TensorRT inference (if the model does not support TensorRT, it will automatically fall back to CPU inference)
./infer_demo yolov7.onnx 000000014439.jpg 2

Inference Result Example:


3

RK3588 Deployment Practical Guide (Taking the lightweight detection network PicoDet as an example)

Install the FastDeploy Package and Download the Deployment Example (optional; the deployment code can also be written directly with the three-line API)

# Refer to the compilation documentation to complete FastDeploy compilation and installation
# Documentation link: https://github.com/PaddlePaddle/FastDeploy/blob/develop/docs/cn/build_and_install/rknpu2.md
# Download deployment example code
git clone https://github.com/PaddlePaddle/FastDeploy.git
cd FastDeploy/examples/vision/detection/paddledetection/rknpu2/python

Prepare Model Files and Test Images

# Download the Paddle static graph model and unzip
wget https://bj.bcebos.com/fastdeploy/models/rknn2/picodet_s_416_coco_npu.zip
unzip -qo picodet_s_416_coco_npu.zip
# Static Graph to ONNX Model, note that the save_file should match the zip file name
paddle2onnx --model_dir picodet_s_416_coco_npu \
            --model_filename model.pdmodel \
            --params_filename model.pdiparams \
            --save_file picodet_s_416_coco_npu/picodet_s_416_coco_npu.onnx \
            --enable_dev_version True

python -m paddle2onnx.optimize --input_model picodet_s_416_coco_npu/picodet_s_416_coco_npu.onnx \
                                --output_model picodet_s_416_coco_npu/picodet_s_416_coco_npu.onnx \
                                --input_shape_dict "{'image':[1,3,416,416]}"
# Convert the ONNX model to an RKNN model
# The converted model is generated in the output directory specified in the config file
python tools/rknpu2/export.py --config_path tools/rknpu2/config/RK3588/picodet_s_416_coco_npu.yaml

# Download Image
wget https://gitee.com/paddlepaddle/PaddleDetection/raw/release/2.4/demo/000000014439.jpg

Run Inference

python3 infer.py --model_file ./picodet_3588/picodet_3588.rknn \
                --config_file ./picodet_3588/deploy.yaml \
                --image images/000000014439.jpg
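For reference, a minimal sketch of what infer.py does, based on FastDeploy's rknpu2 example (the use_rknpu2 backend selector and the RKNN model-format flag are assumptions drawn from that example; pre/post-processing details follow the official demo):

# Minimal sketch (assumed to mirror FastDeploy's rknpu2 PicoDet example)
import fastdeploy as fd
import cv2

option = fd.RuntimeOption()
option.use_rknpu2()  # run on the Rockchip NPU

model = fd.vision.detection.PicoDet(
    "./picodet_3588/picodet_3588.rknn",  # converted RKNN model from above
    "",                                  # RKNN format needs no separate params file
    "./picodet_3588/deploy.yaml",
    runtime_option=option,
    model_format=fd.ModelFormat.RKNN)

result = model.predict(cv2.imread("images/000000014439.jpg"))
print(result)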