Exploring Embedded Vision Applications with AI Development Boards

Introduction: This issue of the AI Embedded Briefing for November 20, 2020 brings you 8 news items that we hope you will find useful.

Today’s issue is packed with valuable content for readers who follow embedded AI.

1. Zhang Xianyi, CEO of Pengfeng Technology, explains how to use AI development boards for embedded vision application development aimed at drones.


This article summarizes the speech given by Zhang Xianyi, CEO of Pengfeng Technology, at the “Drone Vision Innovation Forum” hosted by Zhidongxi. The theme of the speech was “Using AI Development Boards for Embedded Vision Application Development for Drones”.

In this speech, Zhang Xianyi first introduced the hardware selection for AI development boards, followed by a detailed analysis of the optimization of embedded AI software performance and related algorithms.

This article compiles the slides and transcript of the speech. The content is mainly divided into the following 3 parts:

1. Hardware selection for AI development boards

2. Optimization of embedded AI software performance

3. Embedded AI algorithm models

2. TensorFlow releases a new version for Mac with GPU-accelerated training, up to 7x faster


Apple’s pull evidently extends to the field of machine learning. Less than two weeks after the new Macs launched, Google had already prepared a version of TensorFlow optimized for the Mac, with training speeds up to 7 times faster. The Mac has long been a popular platform among developers, engineers, and researchers, and some of them train neural networks on it, but training speed has always been a headache.

Last week, Apple released three new Macs equipped with the Arm architecture M1 chip, prompting questions about whether training neural networks on them would be faster.

Now the mainstream machine learning framework TensorFlow has announced a version of TensorFlow 2.4 optimized for Mac users, which runs on both M1 Macs and Intel Macs. The move is expected to significantly lower the barrier to training and deploying models on the platform.
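For readers who want to try it, here is a minimal sketch of how training code selected the accelerator in Apple's Mac-optimized fork (assumed here to be the apple/tensorflow_macos repository); the `mlcompute` import path follows that fork's README at the time and does not exist in stock TensorFlow.

```python
# Minimal sketch, assuming Apple's tensorflow_macos fork is installed.
# The mlcompute module is specific to that fork, not stock TensorFlow.
import tensorflow as tf
from tensorflow.python.compiler.mlcompute import mlcompute

# Route training through ML Compute on the GPU ('cpu' and 'any'
# were the other documented options).
mlcompute.set_mlc_device(device_name='gpu')

# From here on, ordinary Keras code runs on the selected device.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(10),
])
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# model.fit(x_train, y_train, epochs=5) would now train on the GPU.
```

Everything after the device selection is ordinary Keras code, which is what made the fork attractive: no model changes were needed to pick up the accelerator.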

3. Practical! Deploying PyTorch models on a C++ platform, plus a record of the pitfalls


Original article link: https://zhuanlan.zhihu.com/p/146453159

Recently, work required me to deploy a PyTorch model on a C++ platform. I followed the official tutorial examples and hit several pitfalls along the way; this post is a record of them.

This article mainly explains how to deploy a PyTorch model on a C++ platform, divided into four main sections:

  • Model conversion

  • Saving serialized models

  • Loading serialized PyTorch models in C++

  • Executing the Script Module
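As a rough sketch of the first two steps (with the C++ side of steps 3 and 4 shown as comments), here is the TorchScript tracing path; the model and file name are placeholders, not the ones from the original post.

```python
# Minimal sketch of steps 1-2: convert and serialize a model with
# TorchScript. The model and file name are placeholders.
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.eval()  # tracing a model left in training mode is a classic pitfall

# Step 1: model conversion -- trace with a dummy input of the right shape.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# Step 2: save the serialized script module to disk.
traced.save("model_traced.pt")

# Steps 3-4 happen on the C++ side, roughly:
#   torch::jit::script::Module module = torch::jit::load("model_traced.pt");
#   auto out = module.forward({torch::ones({1, 3, 224, 224})}).toTensor();
```

One pitfall worth flagging regardless of whether it is among the post's: tracing records a single execution path, so a model with data-dependent control flow should be converted with torch.jit.script instead of torch.jit.trace.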

4. 420 FPS! LSTR: Lane Detection Network Based on Transformer


Paper: https://arxiv.org/abs/2011.04233

Code will be open-sourced soon!

https://github.com/liuruijin17/LSTR

Performance surpasses networks like PolyLaneNet, with speeds up to 420 FPS!

Lane detection is the task of identifying lanes as approximate curves, widely used in lane departure warning and adaptive cruise control for autonomous vehicles. The popular pipeline solves the problem in two steps: feature extraction and post-processing. While effective, it is inefficient and struggles to learn the global context and the long, thin structure of lanes.

This paper proposes an end-to-end method that directly outputs the parameters of a lane shape model, using a network built with transformers to learn richer structure and context. The lane shape model is formulated from road structure and camera pose, giving the parameters output by the network a physical interpretation. The transformer uses a self-attention mechanism to model non-local interactions, capturing elongated structures and global context.
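To make "directly outputs the parameters of a lane shape model" concrete, here is a generic illustration; LSTR's actual parameterization is derived from road geometry and camera pose, so the simple polynomial below is a stand-in for the idea rather than the paper's formula.

```python
# Generic illustration: decoding predicted lane-shape parameters into
# image points. The polynomial x = f(y) is a stand-in; LSTR's real
# parameterization comes from road structure and camera pose.
import numpy as np

def sample_lane(coeffs, y_start, y_end, num_points=50):
    """Evaluate x = sum_i coeffs[i] * y**i over a vertical image span
    and return an array of (x, y) points along the lane."""
    ys = np.linspace(y_start, y_end, num_points)
    xs = np.polyval(coeffs[::-1], ys)  # np.polyval expects highest degree first
    return np.stack([xs, ys], axis=1)

# A network emitting a few coefficients plus a vertical span per lane
# lets us draw each lane directly, with no per-pixel post-processing.
points = sample_lane([100.0, 0.5, 1e-3, 0.0], y_start=160, y_end=320)
```

Because the network emits a handful of curve parameters per lane rather than a dense per-pixel map, the heavy post-processing step of the two-stage pipeline largely disappears, which helps explain the reported speed.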

The method is validated on the TuSimple benchmark, demonstrating state-of-the-art accuracy with the smallest model size and fastest speed.

Moreover, our method shows excellent adaptability to challenging self-collected lane detection datasets, demonstrating its strong deployment potential in practical applications.

5. The official improved version of YOLOv4 is here! 55.8% AP! Speed up to 1774 FPS, Scaled-YOLOv4 officially open-sourced!


Paper: https://arxiv.org/abs/2011.08036

GitHub: https://github.com/WongKinYiu/ScaledYOLOv4

This paper continues the YOLO series from the original YOLOv4 team (including the first author of CSPNet and YOLOv4’s first author Alexey Bochkovskiy), proposing two YOLO models suited to low-end and high-end GPUs based on several factors that affect model scaling.

The paper proposes a “network scaling” method that adjusts not only depth, width, and resolution but also the structure of the network; the authors call the result Scaled-YOLOv4.
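To make the idea of scaling width, depth, and resolution jointly more concrete, here is a toy sketch; the multipliers and rounding rules below are illustrative assumptions, not the recipe from the paper.

```python
# Toy sketch of compound model scaling in the spirit of "network
# scaling": width (channels), depth (blocks), and input resolution are
# scaled together, with rounding so the results stay hardware friendly.
# All factors and rounding rules here are illustrative, not the paper's.
import math

def scale_model(base_channels, base_depth, base_resolution,
                width_mult, depth_mult, res_mult, channel_divisor=8):
    # Round channels up to a multiple of channel_divisor, a common
    # convention in detector implementations for efficient GPU kernels.
    channels = int(math.ceil(base_channels * width_mult / channel_divisor)
                   * channel_divisor)
    depth = max(1, round(base_depth * depth_mult))
    # Keep the resolution divisible by 32 so FPN strides still line up.
    resolution = int(round(base_resolution * res_mult / 32)) * 32
    return channels, depth, resolution

# Scaling up for a high-end GPU vs. down for an edge device:
print(scale_model(64, 3, 512, width_mult=1.25, depth_mult=1.33, res_mult=1.25))
print(scale_model(64, 3, 512, width_mult=0.50, depth_mult=0.33, res_mult=0.80))
```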

The resulting YOLOv4-Large achieves SOTA results: 55.4% AP (73.3% AP50) on the MS-COCO dataset at an inference speed of 15 FPS on a Tesla V100; with TTA, the model reaches 55.8% AP (73.2% AP50). Among all published papers, YOLOv4-Large currently posts the best figures on the COCO dataset. The resulting YOLOv4-tiny achieves 22.0% AP (42.0% AP50) at 443 FPS on an RTX 2080 Ti; with TensorRT acceleration and FP16 inference, it reaches 1774 FPS at batch size 4.

The main contributions of this paper include the following:

  • Designed a powerful “network scaling” method to enhance the performance of small models while balancing computational complexity and memory usage;

  • Designed a simple and effective strategy for scaling large object detectors;

  • Analyzed the correlations between model scaling factors and conducted model scaling based on optimal partitioning;

  • Confirmed through experiments that the FPN structure is inherently a once-for-all structure;

  • Based on the above analysis, designed two efficient models: YOLOv4-tiny and YOLOv4-Large.

6. Portrait matting no longer satisfies researchers; this study specializes in animal matting!

  • Paper: https://arxiv.org/pdf/2010.16188v1.pdf

  • GitHub: https://github.com/JizhiziLi/animal-matting

In an era when images and video have become the mainstream media, everyone has grown used to “matting”, and has perhaps even watched a few TV series produced with it. Compared with human portrait matting, however, the distinctive appearance and fur of animals pose a greater challenge.

So, is there a matting technique built specifically for animals? Jizhizi Li, IEEE Fellow Dacheng Tao, and their collaborators have developed an end-to-end technique aimed squarely at animal matting.

The appearance and fur characteristics of animals pose challenges to existing methods, which typically require additional user input (such as trimaps).

To address these issues, Dacheng Tao and his co-authors examined both semantics and matting detail, decomposing the task into two parallel sub-tasks: high-level semantic segmentation and low-level detail matting. Specifically, the study proposes a novel method, the Glance and Focus Matting network (GFM), which uses a shared encoder and two separate decoders to learn the two sub-tasks collaboratively, achieving end-to-end animal image matting.
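A schematic sketch of the shared-encoder, two-decoder idea follows; the layer sizes, the three-class semantic map, and the merge rule are placeholder assumptions for illustration, not GFM's actual architecture (see the GitHub repository above for that).

```python
# Schematic sketch of "shared encoder, two decoders": one decoder
# predicts coarse semantics (the "glance"), the other predicts
# low-level alpha detail (the "focus"), and the outputs are merged.
# Layer sizes and the merge rule are placeholders, not GFM's design.
import torch
import torch.nn as nn

class TwoDecoderMatting(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Glance decoder: 3-class semantic map (fg / bg / transition).
        self.glance = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
        # Focus decoder: single-channel alpha detail.
        self.focus = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        feats = self.encoder(x)                       # shared features
        semantics = self.glance(feats).softmax(dim=1)
        detail = self.focus(feats).sigmoid()
        # Placeholder merge: trust the detail alpha in the transition
        # region, otherwise fall back to the foreground probability.
        fg, transition = semantics[:, 0:1], semantics[:, 2:3]
        return transition * detail + (1 - transition) * fg

alpha = TwoDecoderMatting()(torch.rand(1, 3, 64, 64))  # -> (1, 1, 64, 64)
```

The appeal of such a decomposition is that the glance branch only needs to get the rough regions right while the focus branch only needs to resolve fine alpha detail, so neither branch fights the other's objective.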

7. Intel releases its first structured ASIC for 5G, AI, cloud, and edge applications

Original article link: http://www.eepw.com.cn/article/202011/420476.htm

On November 18, 2020, at the Intel FPGA Technology Conference, Intel launched Intel® eASIC N5X, a new customizable solution designed to accelerate 5G, AI, cloud, and edge workloads. It is the first structured eASIC product family to feature a hard processor system compatible with Intel® FPGAs. Intel® eASIC N5X helps customers migrate custom logic and designs that use the FPGA’s embedded hard processor to a structured ASIC, bringing benefits such as lower unit cost, higher performance, and lower power consumption.

Compared with FPGAs, Intel® eASIC N5X devices can cut core power consumption and cost by up to 50%, while offering faster time to market and lower non-recurring engineering costs than ASICs, letting users create power-optimized, high-performance, highly differentiated solutions.

8. M1808 AI Core Board Equipped with 5G Module, Supporting 5G Layout in Industrial Fields

Original article link: http://www.eepw.com.cn/article/202011/419966.htm

Although discussion of 5G currently centers on smartphones, in the era of widespread 5G deployment smartphones are only a small part of the picture; more applications will land in the industrial internet, the Internet of Things, vehicle networking, and similar areas, quietly permeating everyday life.

In early 2020, ZLG launched its first AI core board, the M1808. The board adopts a high-end dual-core architecture, integrates a neural network processor (NPU), and ships with professional AI algorithms, giving users a complete “hardware + software + algorithms” solution. The M1808 also offers rich peripheral interfaces for expansion: video supports MIPI/CIF/BT.1120 input and MIPI/RGB display output; there is a full set of sensor interfaces such as PWM/I2C/SPI/UART; and high-speed device interfaces such as USB 3.0/USB 2.0/PCIe, which is what allows it to drive the Neoway N510M 5G module.

Paired with the Neoway N510M 5G module, the M1808 AI core board offers excellent RF performance, supporting 5G, 4G, and 3G with wide frequency coverage, both SA and NSA networking, and Sub-6 GHz bands. It integrates a range of network protocols and provides industry-standard interfaces to make the most of ultra-high-speed data transmission in eMBB scenarios, making it a strong choice for fields such as power IoT, security monitoring, smart energy, industrial control, and smart transportation.


