In 1956, in the quiet town of Hanover, New Hampshire, a group of scientific giants gathered at Dartmouth College: John McCarthy (the father of artificial intelligence), Marvin Minsky (an expert in AI and cognitive science), Claude Shannon (the founder of information theory), Allen Newell (a computer scientist), and Herbert Simon (a Nobel laureate in economics). They were discussing a seemingly ethereal topic: using machines to simulate learning and other aspects of human intelligence.
The conference lasted for two months. Although no universal consensus was reached, the discussions gave birth to a name: artificial intelligence (AI). Since then, 1956 has been regarded as the year of AI’s inception. Sixty-five years later, AI can be said to be “everywhere.” Today, from research and finance to retail, industry, and agriculture, an increasing number of sectors and business scenarios are applying AI to enhance efficiency and reduce costs, and AI plays an increasingly important role in industrial upgrading and in improving human life.

Moreover, as enterprises migrate their business deployment scenarios and data generation to the edge and endpoint, embedded AI is also ushering in a period of rapid development. AI embedded systems often have advantages such as real-time response, low network overhead, privacy protection, and high energy efficiency. Therefore, applications of AI embedded systems can be found in areas such as robotics, drones, automobiles, and smartphones. However, it is important to recognize that AI embedded systems currently face challenges such as difficulties in algorithm training, hardware selection, and secondary development and integration.
Against this backdrop, the book “AI Embedded Systems: Algorithm Optimization and Implementation”, co-authored by Associate Professor Ying Rendong and Professor Liu Peilin from the School of Electronic Information and Electrical Engineering at Shanghai Jiao Tong University, was born.
This book targets embedded applications of artificial intelligence, covering multi-dimensional optimization theories and techniques for machine learning algorithms in terms of computation volume, memory, and power consumption. It elaborates on everything from the basic computational units of algorithms to the joint optimization of machine learning training and inference, as well as the automated optimization and deployment of algorithms, all validated on general-purpose embedded processors. It can be said that this book combines the theory of machine learning algorithm optimization with practical applications, providing foundational theories and methods for building efficient AI embedded systems. This book therefore serves as a guide for engineering and technical personnel in the embedded field, for developers of low-level machine learning software, and for students in related majors such as computer science, electronic information, and automatic control.
Key Point: Follow the Anmou Technology Classroom public account and reply “AI Embedded” to obtain a preview of chapters 1-3 of this book, along with accompanying PPTs, initial version code, and data. Additionally, keep an eye on the Anmou Technology Classroom and Jishu Community for ongoing book giveaway activities. Of course, you can also click on “Read Original” to purchase or learn more about the book’s content and reviews.
The Explosion and Challenges of AI Embedded Systems
In fact, various embedded systems are ubiquitous in our lives today, such as mobile phones, unmanned aircraft control systems, telecom switches, washing machines, smart TVs, automotive control systems, medical CT devices, and more. The widespread application of embedded systems in various business scenarios is mainly due to several characteristics:

First, high reliability; for example, embedded systems in certain devices often need to operate continuously around the clock, requiring reliability levels of “five nines” or even higher. Second, low-latency response; a typical example is the anti-lock braking system in vehicles, which needs to assess speed and tire status in real time during emergency braking and issue brake control commands within a specified timeframe. Third, low power consumption; for instance, handheld measuring devices like multimeters may need to run on batteries for several months or even years. Fourth, small size; whether in mobile phones or wireless noise-canceling headphones, embedded control systems often need to fit within very limited volumes to meet application requirements.
As artificial intelligence rapidly develops, more and more embedded systems are being enhanced with AI. Compared to traditional “control-type” embedded systems, AI embedded systems are playing increasingly important roles in intelligent perception, interaction, and decision-making.
For example, in intelligent decision-making, the ability to make autonomous decisions is one of the most important features of AI embedded systems. In an autonomous driving system, the embedded system needs to assess the current state and trends based on speed, road obstacles, and traffic sign information, and issue driving “instructions” within a limited time. Moreover, the system must be able to “adapt,” especially when encountering unknown states, weighing action benefits against risks to produce the most suitable action output.
Although traditional embedded systems have rapidly developed in recent years and possess features such as efficient real-time performance, they are typically based on fixed, simple logic rules, making them unable to meet the flexibility and adaptability required by complex application scenarios. It is worth noting that, in the autonomous driving field alone, according to IDC, global shipments of smart connected vehicles were about 44.4 million units in 2020, and by 2024, this number is expected to reach approximately 76.2 million units, with a compound annual growth rate (CAGR) of 14.5% from 2020 to 2024. IDC predicts that by 2024, over 71% of new cars shipped globally will be equipped with smart connected systems, highlighting the substantial market demand for AI embedded systems.
However, to realize the large-scale application of embedded systems in increasingly diverse business scenarios, several challenges must be addressed, with “computational load” being the foremost. In the field of machine learning applications, especially in image recognition scenarios, operations involving two-dimensional matrices or higher-dimensional tensors are required, with core algorithms composed of numerous two-dimensional convolutions and matrix multiplications. Some applications also require matrix decomposition, all of which are computationally intensive algorithms. Coupled with the rise of deep learning, the scale of neural networks continues to expand, placing pressure on embedded systems with limited computational power.
Additionally, “storage size” is also a limiting factor, as some machine learning algorithms rely on searching and comparing feature databases, requiring access to massive amounts of data for feature analysis and comparison within a short time. To meet this real-time demand, all accessed data needs to be stored in RAM, which presents challenges for the limited storage capacity of embedded systems. Lastly, there are challenges related to “power consumption”; implementing machine learning algorithms in embedded systems often necessitates satisfying both computational load and real-time requirements. While increasing CPU clock speeds and computational hardware resources can meet these demands, it comes at the cost of increased operational power consumption, thereby limiting the use of many machine learning algorithms in battery or solar-powered applications.
Thus, while the application scenarios for AI embedded systems are promising, the challenges they face are unprecedented. How can we better address these challenges?
The Value of Algorithm Optimization and Implementation
The book “AI Embedded Systems: Algorithm Optimization and Implementation” attempts to resolve the aforementioned issues through “algorithm optimization,” enhancing the AI computational capabilities of embedded systems through algorithm improvement and software optimization. The advantage of this approach is that it leverages the characteristics of existing processor hardware, so users do not need new specialized hardware. Although many algorithms discussed in this book are implemented on general-purpose embedded processors, they can also be applied to other processor systems and adapted to serve as computation acceleration engines on a variety of processors.
Moreover, to implement machine learning inference algorithms in resource-constrained embedded systems, this book provides optimization explanations at various levels, including:

First, system solution optimization, which involves considering which solution to use for specific machine learning problems. For instance, whether to implement visual image classification through support vector machines, deep neural networks, or random forests; different machine learning algorithms have their pros and cons in terms of computational load, memory, classification accuracy, and training difficulty, which this book explains in detail.
Second, optimization of machine learning inference model structures, which refers to simplifying the computational structure of a given machine learning algorithm. This can be achieved through methods such as approximate algorithms, model pruning, and feature dimensionality reduction to decrease computational complexity. Additionally, for a given machine learning algorithm computation graph, redundant intermediate data calculations can be eliminated through equivalent transformations of computational modules, such as merging parameters in convolution and batch normalization layers within neural networks.
Third, operator optimization refers to optimizing the underlying computational modules of machine learning algorithms to reduce computational complexity. Specific approaches include using approximate algorithms to reduce computational load, such as cutting the cost of matrix multiplication through low-rank approximation, and employing fast transform-domain algorithms, such as converting convolution operations into pointwise multiplication in the transform domain.
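A minimal sketch of the low-rank idea: if an m-by-n weight matrix W is approximated by the product of an m-by-r factor U and an r-by-n factor V (with r much smaller than m and n), then y = W·x can be computed in two cheap stages using r·(m + n) multiplies instead of m·n. The code below is illustrative only; the names and row-major layout are assumptions, not the book’s implementation:

```c
/* Two-stage low-rank matrix-vector product: y = U * (V * x).
 * U is m-by-r, V is r-by-n, tmp is a caller-provided buffer of size r. */
void lowrank_matvec(const float *U, const float *V, const float *x,
                    float *tmp, float *y, int m, int n, int r)
{
    /* Stage 1: tmp = V * x  (r dot products of length n) */
    for (int i = 0; i < r; ++i) {
        tmp[i] = 0.0f;
        for (int j = 0; j < n; ++j)
            tmp[i] += V[i * n + j] * x[j];
    }
    /* Stage 2: y = U * tmp  (m dot products of length r) */
    for (int i = 0; i < m; ++i) {
        y[i] = 0.0f;
        for (int j = 0; j < r; ++j)
            y[i] += U[i * r + j] * tmp[j];
    }
}
```

For a 512-by-512 layer approximated at rank 32, this replaces roughly 262k multiplies with about 33k, at the cost of some approximation error in W.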
Fourth, bitwise operation optimization refers to lower-level optimizations based on binary representation of data, such as converting constant multiplication into addition and subtraction, or performing approximate calculations for floating-point multiplication by adjusting their exponent and mantissa.
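The constant-multiplication rewrite fits in a couple of lines of C. Since 10 = 8 + 2 and 7 = 8 − 1, multiplying by these constants reduces to shifts plus one add or subtract. This is a hypothetical micro-example of the technique (modern compilers typically perform this strength reduction automatically for known constants):

```c
#include <stdint.h>

/* Multiply by constants using only shifts and add/subtract:
 *   x * 10 = x*8 + x*2 = (x << 3) + (x << 1)
 *   x * 7  = x*8 - x   = (x << 3) - x       */
static inline int32_t mul10(int32_t x) { return (x << 3) + (x << 1); }
static inline int32_t mul7(int32_t x)  { return (x << 3) - x; }
```

On cores without a fast hardware multiplier, such rewrites trade one expensive operation for two or three single-cycle ones.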
Fifth, optimization based on processor hardware features refers to optimizing for the specific CPU used in an embedded system, including using the SIMD instructions available on that CPU for parallel vector operations and using wide registers to process multiple narrow data elements in parallel. This optimization method is closely tied to hardware characteristics.
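A portable way to preview the wide-register idea without vendor intrinsics is SWAR (“SIMD within a register”): packing four 8-bit lanes into one 32-bit register and adding all four in a single pass. The sketch below illustrates the principle and is not code from the book; ARMv6-class cores expose the same lane-wise byte addition directly in hardware via instructions such as UADD8:

```c
#include <stdint.h>

/* Add four unsigned 8-bit lanes packed into 32-bit registers, with
 * wraparound per lane. The top bit of each lane is masked off first so
 * a carry cannot leak into the neighboring lane, then restored via XOR. */
static inline uint32_t add_u8x4(uint32_t a, uint32_t b)
{
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu); /* carry-safe add of low 7 bits */
    return low ^ ((a ^ b) & 0x80808080u);                 /* restore bit 7 of each lane */
}
```

One 32-bit register thus performs four byte additions per operation; real SIMD instruction sets generalize this to wider registers and richer operations (saturation, multiply-accumulate, and so on).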
From the algorithm optimization overview above, it is evident that this book is highly specialized. The authors have divided it into eight chapters, comprehensively and systematically covering topics including embedded software programming models and optimization, an overview of machine learning algorithms, convolution optimization, matrix multiplication optimization, the implementation and optimization of neural networks, and machine learning programming on ARM platforms.
For instance, in the chapter on “Machine Learning Programming on ARM Platforms,” because ARM processors are so widely used in embedded systems, the book specifically introduces machine learning algorithm programming on ARM embedded platforms. The content revolves around three ARM software frameworks: the CMSIS framework, the Compute Library, and the ARM NN framework. Through several machine learning examples, readers can quickly grasp the core ideas of machine learning programming on embedded ARM platforms, and by reviewing the source code and documentation of these frameworks, gradually master more application techniques.
To help readers “reproduce” the various optimization algorithms introduced in the book, the authors also provide complete code on the Huazhang Library website (www.hzbook.com). Readers can search for the book there to download the relevant resources, then reference and modify the provided code for use in actual embedded systems. The book thus also offers strong practical guidance.
In summary, if you are a student or developer in a field related to embedded technology and are optimistic about the future development potential of AI embedded systems in the industry, and you want to learn more about the principles, design methods, and implementation techniques of machine learning algorithm optimization in embedded systems to enrich your knowledge structure, I strongly recommend that you take a look at this significant work!

Shen Yao’s Technology Observation was founded by Shen Siki, a cross-media writer covering technology and automotive, with 18 years of experience in enterprise-level technology media, focusing on observations and reflections on corporate digitalization, industrial intelligence, ICT infrastructure, and automotive technology.