Understanding Embedded Artificial Intelligence

Introduction: Embedded artificial intelligence is a fascinating concept. I used to hear students preparing for jobs or graduate school ask which has the better future: embedded systems or artificial intelligence? It turns out that adults refuse to choose and want both, which is how embedded artificial intelligence came about. This article will briefly discuss embedded artificial intelligence.

Author: Mu Yang
Source: Huazhang Computer (hzbook_jsj)


Embedded artificial intelligence refers to running artificial intelligence in an embedded environment. The algorithms and models remain the same, but times have changed, and some things must change with them. First, let’s address why we need embedded artificial intelligence.
01 Why Embedded Artificial Intelligence Is Needed
What is the biggest problem in the development of artificial intelligence today? Passing the Turing test? Achieving SOTA? I believe it is the issue of practical application. In recent years, there have been numerous big news stories about artificial intelligence, and whether in academia or industry, everyone feels that AI is at a tipping point. However, the voices of skepticism are growing stronger, with people questioning whether this is a bubble and even predicting when it might burst.
Why is opinion so polarized regarding artificial intelligence?
Initially, I didn’t understand, so I asked some friends and read some materials. Gradually, I realized: it ultimately comes down to practical application. Several top conferences related to artificial intelligence are held every year, and each time some noteworthy developments emerge, sometimes with unexpected breakthroughs that make us exclaim, “Oh, it can be done like this too!” To be fair, from a research perspective, artificial intelligence has made significant progress in recent years, and everyone is looking forward to the singularity arriving sooner.
However, from an application perspective, things are not so optimistic. Many AI products have indeed amazed us, but that was several years ago. It’s not that nothing has been done since; the work has mainly been incremental improvement. A few years ago, Siri could barely keep up a conversation; now it handles back-and-forth exchanges without trouble. But what genuinely new capabilities have emerged that we didn’t have before? After thinking for a long time, I couldn’t come up with any. It is a paradox: rapid progress on one hand, silence on the other. That is the problem.
Many people say that the next application scenario for artificial intelligence will set off the next wave. I certainly can’t predict what that scenario will be, but I know one of the candidates is embedded artificial intelligence. Previously, when we talked about embedded systems, we usually thought of microcontrollers, which leads many students to believe the two terms mean the same thing.
Historically, the general process of embedded systems was as follows: input sensor signals, process them with a written program, then output control signals. This is why learning embedded systems often feels like learning microcontroller programming.
However, we are now in the intelligent era, and embedded systems have evolved into “intelligent embedded systems.” What is the difference?
Here, I quote a viewpoint from “AI Embedded Systems”: traditional embedded systems are mainly used for control, while intelligent embedded systems raise the levels of perception, interaction, and decision-making, referred to as intelligent perception, intelligent interaction, and intelligent decision-making. The book explains these three terms clearly, and I would like to use this point to discuss why embedded artificial intelligence may be one of the next practical application scenarios.


02 The Role of Intelligence
What everyone wants to know is what difference adding “intelligence” makes. The key lies in programming.
In the past, to accomplish anything with embedded systems, programming was essential. Programming isn’t just writing code; it starts with analyzing the data and extracting requirements, and only then comes the implementation. Each step takes time, and as the deployment environments of embedded systems grow more complex, the issues multiply. With traditional methods, every one of these issues must be considered individually during programming, and rules must be written explicitly before the system can handle them.
Manually writing rules is time-consuming and labor-intensive, and it is easy to cover one case while missing a thousand others. Is there a better way? Yes: incorporate artificial intelligence. AI does not require manually written rules; it can “learn by itself” from data. Even for abstract, hard-to-describe rules, AI can acquire the corresponding capability through learning. This is intelligence. The contrast looks roughly like the sketch below.
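To make the contrast concrete, here is a minimal toy sketch (scikit-learn assumed; the fault-detection task, thresholds, and data are all invented for illustration) of a hand-written rule versus a rule learned from data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def handwritten_rule(reading):
    # Traditional approach: every boundary case must be anticipated
    # and coded by hand (the thresholds here are made up).
    return reading.mean() > 0.8 or reading.max() > 2.5

# Learned approach: no explicit rules, just labeled examples.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 16))   # healthy sensor windows
faulty = rng.normal(1.5, 1.0, size=(500, 16))   # abnormal sensor windows
X = np.vstack([normal, faulty])
y = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression().fit(X, y)
print("learned fault probability:", clf.predict_proba(faulty[:1])[0, 1])
```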
With embedded systems plus artificial intelligence, labor costs are significantly reduced, and the scope for imagination is greatly extended, potentially nurturing the next killer application of artificial intelligence.
03 The Special Limitations of Embedded Systems
Now, we have a new question: why carve out a separate area within artificial intelligence called embedded artificial intelligence?
As mentioned earlier, the model algorithms for embedded artificial intelligence are the same as before; there isn’t much theoretical difference. The problem lies in the embedded systems themselves.
Embedded devices and general-purpose computers have vastly different computational environments, with the most noticeable difference being the limited storage capacity. “AI Embedded Systems” provides a comparison: common deep neural network models using single-precision floating-point numbers for storage have parameter storage requirements between 20MB and 560MB. In contrast, traditional low-cost embedded systems have a maximum RAM storage capacity of only 16MB.
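To see where figures like these come from, here is a quick back-of-envelope sketch (the parameter counts are round numbers chosen for illustration, not measurements of specific models):

```python
# Back-of-envelope check: a single-precision float takes 4 bytes, so a
# model's parameter storage is simply 4 bytes per parameter.
def param_storage_mb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / 1e6  # decimal megabytes

print(param_storage_mb(5_000_000))    # 20.0 MB for a small 5M-parameter net
print(param_storage_mb(140_000_000))  # 560.0 MB for a large 140M-parameter net
# Both dwarf the 16 MB RAM of a traditional low-cost embedded system;
# quantizing to 8-bit integers (1 byte per parameter) divides them by 4.
```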
Another limitation is computational power. This is self-evident; embedded devices have always been at a disadvantage in terms of computing power. Another often overlooked but equally significant limitation is energy consumption. Deep learning models are known for their high energy consumption; every time a model is trained, the graphics card’s fan sounds as if it is about to take off. Even just getting the model to run is a power-intensive task, whereas embedded devices often have energy consumption limits.
In summary, while it may be easy to find an embedded application scenario that requires artificial intelligence, and there are indeed existing models that meet the requirements, the various limitations of embedded systems mean that simply applying existing AI results can lead to disappointment.


04 Solutions for Artificial Intelligence
There are methods available, and two paths can be taken. One is to enhance the hardware conditions of embedded devices, but this also limits application scenarios. The other path is for artificial intelligence to optimize itself, reducing the demands for space, computing power, and energy consumption. Currently, the latter option is being pursued, bringing the problem back to the realm of artificial intelligence, leading to the development of embedded artificial intelligence.
This section covers a lot of ground, so I will give only a brief introduction to help everyone understand what aspects of embedded artificial intelligence are being researched. For artificial intelligence to adapt to embedded environments, it must “cut its coat according to its cloth,” drastically reducing resource requirements while keeping the performance drop small. In embedded artificial intelligence this process is called optimization; matrix multiplication optimization is one example.
I previously mentioned that matrix operations are the backbone of machine learning. Machine learning is not a single model but a collection of models, and these models rely heavily on matrix operations, particularly matrix multiplication. If the computational load of matrix multiplication can be reduced, the overall computational load of the model can be reduced as well.
The concept is straightforward, but does such a thing really exist?
Indeed, scientists have developed such methods. The story begins with Strassen’s matrix multiplication algorithm, proposed in 1969, the first matrix multiplication algorithm with complexity below O(N^3) (roughly O(N^2.81)).
In simple terms, this algorithm can reduce the computational load of matrix multiplication. For instance, multiplying 2×2 matrices with the conventional algorithm requires 8 multiplications, while using this algorithm only requires 7, resulting in a 12.5% reduction in computational load. However, this comes at the cost of increasing the number of addition operations from 4 to 18.
At first glance, this might not seem significant, but once the matrices scale up, such as with 128×128 matrix operations, using this algorithm can reduce the number of multiplications by about half, making the optimization effect quite remarkable. Similar algorithms, like the Winograd algorithm, can also reduce multiplication operations by 50% in large-scale matrix calculations.
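To make the 2×2 building block concrete, here is a minimal sketch of Strassen’s seven-product scheme in plain Python (scalar entries for readability; applying the same recurrence blockwise to sub-matrices is what yields the large-matrix savings):

```python
def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 multiplications instead of 8."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B

    # Strassen's seven products: one fewer multiplication than the
    # conventional algorithm, at the cost of 18 additions/subtractions.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

# Sanity check against the textbook result.
assert strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
```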
These types of algorithms are referred to as fast matrix multiplication. Another approach involves sacrificing precision for speed, allowing for some margin of error in the results of matrix multiplication. These algorithms are called approximate matrix multiplication, such as approximate matrix multiplication based on statistical correlations and fast multiplication based on low-rank decomposition of data covariance.
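To give a flavor of the low-rank idea, here is a simplified sketch (NumPy assumed) that approximates A @ B through a truncated SVD of A itself, rather than the covariance-based construction the book describes; in practice the factorization would be computed once offline, for example for fixed model weights:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 256, 16

# A matrix whose singular values decay quickly, so a rank-k cut is apt.
A = (rng.standard_normal((n, k)) @ rng.standard_normal((k, n))
     + 0.01 * rng.standard_normal((n, n)))
B = rng.standard_normal((n, n))

# Factor A once; the two skinny products below cost O(k * n^2) each,
# versus O(n^3) for the exact product.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
approx = (U[:, :k] * S[:k]) @ (Vt[:k] @ B)

rel_err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(f"rank-{k} approximation, relative error = {rel_err:.4f}")
```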
Similar optimizations can be applied to convolution operations. The importance of convolution operations is well-known; deep neural networks for image processing are heavily reliant on them. The optimization directions are similar to matrix multiplication, focusing on both fast convolution algorithms and approximate convolution algorithms, such as one-dimensional cyclic convolution frequency domain fast algorithms and two-dimensional fast convolutions based on low-rank decomposition of convolution kernels.
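As a small illustration of the frequency-domain direction, the sketch below (NumPy assumed) shows the classical identity behind one-dimensional cyclic-convolution fast algorithms: a circular convolution becomes a pointwise product of FFTs, replacing O(N^2) multiply-accumulates with O(N log N):

```python
import numpy as np

def circular_conv_direct(x, h):
    """1-D circular convolution by definition: O(N^2) multiplications."""
    n = len(x)
    return np.array([sum(x[m] * h[(k - m) % n] for m in range(n))
                     for k in range(n)])

def circular_conv_fft(x, h):
    """Same result via the convolution theorem: O(N log N)."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
h = rng.standard_normal(128)
assert np.allclose(circular_conv_direct(x, h), circular_conv_fft(x, h))
```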


05 How to Develop Embedded Artificial Intelligence
Since practical application is essential, in addition to theoretical research, development tools are also necessary. However, there are many embedded platforms, and the development environments vary. Here, I will introduce some machine learning tools for the Arm platform based on “AI Embedded Systems.”
Arm needs no introduction. Ten years ago, “embedded” may still have been an unfamiliar term to many, but today every smartphone contains an Arm processor. By processor classification, the Arm platform can be roughly divided into three main series: Cortex-A, Cortex-M, and Cortex-R.
The Cortex-A series is what is found in our smartphones, and it is said that its computing power exceeds that of the computers NASA used during the moon landings. Its hardware is comparatively generous, and development resources are abundant. Both major deep learning frameworks, PyTorch and TensorFlow, have released versions for Cortex-A, and Arm has introduced libraries such as ACL (Arm Compute Library) to support high-performance data operations on the Arm platform.
We hear less about the Cortex-M and Cortex-R series because they are primarily aimed at industrial applications. The Cortex-M series focuses on low-cost, low-energy consumption application scenarios, while the Cortex-R series is designed for tasks requiring higher real-time performance. Compared to Cortex-A, these two series have very limited computing power. Arm has specifically released the CMSIS software architecture to facilitate application development for the Cortex-M series.
CMSIS includes two libraries closely related to machine learning. One is the CMSIS-DSP library, which provides basic mathematical operations such as matrix operations, some machine learning algorithms such as SVM, and functions for FIR filtering, KL divergence calculation, DCT transforms, PID control, and sorting. CMSIS-DSP is written in C but can import models trained with Python’s scikit-learn library, as sketched below.
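On the Python side, that train-then-export workflow might look like the following sketch (scikit-learn assumed; the exact structures that CMSIS-DSP’s SVM functions expect are documented by Arm and not reproduced here):

```python
import numpy as np
from sklearn.svm import SVC

# Train a linear SVM on toy data (features and labels invented here).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4)).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = SVC(kernel="linear").fit(X, y)

# The fitted parameters fully describe the classifier; these are what
# would be emitted as C arrays for the embedded inference routine.
support_vectors = clf.support_vectors_  # shape: (n_support, 4)
dual_coefs = clf.dual_coef_             # weights on the support vectors
intercept = clf.intercept_              # bias term

print(f"{len(support_vectors)} support vectors, bias = {intercept[0]:.4f}")
```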
The other is the CMSIS-NN library, which, as the name suggests, is used for building neural networks and can likewise import trained models.
About the author: Mo Fan, known as Mu Yang. An entertaining machine learning explainer, author of “Mathematical Analysis and Python Implementation of Machine Learning Algorithms,” a follower of cutting-edge technology trends, adept at simplifying complex technologies, operates the WeChat public account “Machine Learning Before Sleep” and has a personal Zhihu account “Mu Yang.”


Further reading: “AI Embedded Systems: Algorithm Optimization and Implementation”
Recommendation: endorsed by the ARM China Education Program. This book explains the theory, design methods, and implementation of machine learning algorithm optimization for AI embedded applications. It serves as a reference book and textbook, combining theory and practice, for AI embedded algorithm design and development, and is a handy reference for efficiently building intelligence into embedded systems.
