Common Hardware for Edge AI: SoCs and Deep Learning Accelerators

This article, a companion to the book “AI at the Edge”, introduces two more types of hardware used in edge AI devices: systems-on-chip (SoCs) and deep learning accelerators.

SoC

After microcontrollers (MCUs), the next most common type of edge computing hardware is the system-on-chip (SoC). MCUs are stripped-down computers from which every non-essential component has been removed. SoCs, by contrast, try to pack all the functionality of a traditional computer system onto a single chip.

MCU software interacts directly with the hardware. SoCs instead run traditional operating systems that abstract away much of the hardware, which lets developers concentrate on application code. Developers can use the same tools and environments they use for server and desktop applications, including high-level languages such as Python. Modern MCUs, on the other hand, are typically programmed in C or C++.

This ease of use comes at a cost in two areas: efficiency and complexity. SoCs are usually far less energy efficient than MCUs, which limits where they can be deployed. They are still roughly an order of magnitude more efficient than traditional computer systems built from separate components, but they fall well short of MCUs at keeping power consumption low. The extra energy use also creates thermal management challenges.

The operating system also adds complexity. With a large amount of operating system code running alongside the developer’s application, it is harder to guarantee that the system operates reliably.

Functionally, SoCs are usually much more capable than microcontrollers. Typical specifications include:

64-bit architecture
>1 GHz clock speed
Multiple processor cores
External RAM and flash storage (often several GB)
2D or 3D graphics processing units
Wireless networking
High-performance digital input and output
Power draw: several hundred milliamps at ~5 V (roughly one to a few watts)
Cost: tens of dollars per unit
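
The power figure in this list translates directly into an energy budget, since power is voltage times current. The sketch below is illustrative only: it assumes a board drawing 400 mA at 5 V, a 10,000 mAh battery pack with 3.7 V cells, and an 85% converter efficiency, none of which come from a specific product.

```python
def power_watts(volts: float, milliamps: float) -> float:
    """Electrical power P = V * I, with current given in milliamps."""
    return volts * milliamps / 1000.0

def battery_hours(capacity_mah: float, battery_volts: float, load_watts: float,
                  efficiency: float = 0.85) -> float:
    """Rough runtime of a battery powering a load through a voltage converter.

    capacity_mah * battery_volts / 1000 is the stored energy in Wh.
    'efficiency' is an assumed converter efficiency (e.g. 3.7 V stepped up to 5 V).
    """
    energy_wh = capacity_mah * battery_volts / 1000.0
    return energy_wh * efficiency / load_watts

# Illustrative numbers only: an SoC board drawing 400 mA at 5 V.
soc_w = power_watts(5.0, 400)              # 2.0 W
hours = battery_hours(10_000, 3.7, soc_w)  # 37 Wh * 0.85 / 2 W
print(f"SoC draw: {soc_w:.1f} W, runtime: {hours:.1f} h")
```

At a couple of watts, even a large battery pack lasts well under a day, which is why SoCs are usually mains-powered or recharged frequently.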

Typical SoCs can run deep learning on audio and high-resolution video in near real time.

Although they are far less efficient than MCUs, SoCs are revolutionary in their own right: they pack powerful general-purpose computing into extremely small form factors. SoCs are ubiquitous today, powering smartphones, televisions, in-car entertainment systems, industrial equipment, security systems, IoT gateways, and nearly every device that needs flexible computing in a compact package.

The capability, flexibility, and ease of use of SoCs make them especially valuable for edge AI. Developers can build SoC applications with familiar tools, and SoCs have enough memory and processing power to run complex algorithms, including relatively large deep learning models. There are few edge AI algorithms that an SoC cannot run. This ease of use also makes SoCs an excellent platform for prototyping edge AI applications, even when the eventual goal is to move to cheaper or more efficient hardware.

Notable SoC products include Qualcomm Snapdragon, Broadcom’s BCM58712, and the Broadcom SoCs used on Raspberry Pi development boards (for example, the BCM2711 on the Raspberry Pi 4). Many popular SoCs are built on Arm Cortex-A processor cores.

Embedded Linux has become a very common operating system for SoCs. It is open source, free to use, and backed by a large community. Developers with Unix experience and familiarity with Unix development tools will find embedded Linux systems easy to work with.
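
Because embedded Linux behaves like any other Linux system, the same Python script can run unchanged on a desktop machine and on an SoC. The minimal sketch below uses only the standard library. On a 64-bit Arm Cortex-A board running Linux, platform.machine() typically reports aarch64, although the exact string depends on the operating system build.

```python
import os
import platform

def describe_host() -> dict:
    """Collect basic facts about the machine this script runs on."""
    info = {
        "os": platform.system(),      # e.g. "Linux" on an embedded Linux SoC
        "arch": platform.machine(),   # e.g. "aarch64" on a 64-bit Arm core
        "cpu_count": os.cpu_count(),  # logical CPU cores (None if unknown)
        "mem_total_kb": None,
    }
    # /proc/meminfo exists on Linux, including embedded Linux; skip it elsewhere.
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemTotal:"):
                    # Line format: "MemTotal:   16314204 kB"
                    info["mem_total_kb"] = int(line.split()[1])
                    break
    except OSError:
        pass
    return info

if __name__ == "__main__":
    for key, value in describe_host().items():
        print(f"{key}: {value}")
```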

Deep Learning Accelerators

Both MCUs and SoCs are general-purpose computers designed to be as flexible as possible. If some flexibility can be sacrificed, however, integrated circuits can be designed to run specific operations extremely fast. Deep learning accelerators are one example.

As deep learning spreads into embedded devices, semiconductor companies have begun producing accelerators that pair with microcontrollers and SoCs so that deep learning models run faster and more efficiently. Because the mathematics of deep learning is built on linear algebra, these accelerators, also known as neural processing units (NPUs), are designed to execute linear algebra operations efficiently.
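
To make the linear-algebra point concrete: a fully connected (dense) layer computes y = f(Wx + b), a matrix-vector product followed by an elementwise activation. The pure-Python sketch below spells this out and counts the multiply-accumulate (MAC) operations, which are the core workload NPUs are built to speed up. The weights and layer sizes are arbitrary illustrative values.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]

def dense_relu(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Compute relu(W @ x + b) for one input vector."""
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi          # one multiply-accumulate (MAC)
        out.append(max(0.0, acc))  # ReLU activation
    return out

def dense_macs(n_inputs: int, n_outputs: int) -> int:
    """A dense layer needs one MAC per weight: n_inputs * n_outputs."""
    return n_inputs * n_outputs

W = [[1.0, -2.0, 0.5],
     [0.0, 1.0, 1.0]]
b = [0.5, -1.0]
x = [2.0, 1.0, 4.0]
print(dense_relu(W, b, x))  # [2.5, 4.0]
print(dense_macs(3, 2))     # 6
```

A layer with 1,024 inputs and 1,000 outputs already needs 1,024,000 MACs for every input vector, which is why hardware dedicated to this inner loop pays off.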

Many kinds of deep learning accelerators are available today, and they trade off power consumption against flexibility. At one end, devices such as Syntiant’s NDP10x series implement specific deep learning model architectures directly in hardware, so they run quickly at very low power. Because the algorithms are built into the silicon, these devices are not very flexible, but they can be extremely efficient.

At the other end are highly flexible devices such as Nvidia’s Jetson, which is based on graphics processing unit (GPU) technology, and Google’s Coral, which is built around the Edge TPU. These can run a very wide range of deep learning models. The cost of this flexibility is much lower energy efficiency than fixed-function chips.

Between these two extremes sit many devices with varying degrees of flexibility and efficiency, such as Syntiant’s NDP120 or Arm’s Ethos-U55 architecture.

Some accelerators use alternatives to traditional deep learning mathematics. For example, BrainChip’s Akida is a neuromorphic processor that uses spiking neural networks, which offer a distinctive set of trade-offs, including higher energy efficiency.
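
For readers new to spiking neural networks, the core idea can be illustrated with a textbook leaky integrate-and-fire (LIF) neuron: it accumulates input over time, leaks part of its state at each step, and emits a discrete spike only when a threshold is crossed. This is a generic sketch with arbitrary parameters, not a description of how Akida is implemented. Neuromorphic designs generally aim to save energy by doing work only when spikes occur.

```python
from typing import List

def lif_neuron(inputs: List[float], leak: float = 0.9,
               threshold: float = 1.0) -> List[int]:
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Each step the membrane potential decays by 'leak', then adds the input.
    When it reaches 'threshold' the neuron emits a spike (1) and resets to 0.
    """
    potential = 0.0
    spikes = []
    for current in inputs:
        potential = potential * leak + current
        if potential >= threshold:
            spikes.append(1)
            potential = 0.0  # reset after firing
        else:
            spikes.append(0)
    return spikes

print(lif_neuron([0.4, 0.4, 0.4, 0.0, 0.0, 1.2]))  # [0, 0, 1, 0, 0, 1]
```

Unlike a dense layer, which produces a real-valued output for every input, the neuron stays silent until enough input has built up, so downstream work happens only on the two time steps that produce spikes.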

Deep learning accelerators are often very fast, with enough computing power to process audio and video in real time. Some devices can even process multiple data streams in parallel.
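
Whether an accelerator keeps up in real time is largely arithmetic: multiply the work per frame by the frame rate and the number of streams, then compare the result with the device’s sustained throughput. All figures below are hypothetical: a model needing 0.5 billion MACs per frame, 30 frames per second, and a device rated at 2 TOPS of which only half is assumed usable. The sketch also assumes one MAC counts as two operations (a multiply and an add), a common but not universal convention.

```python
def required_ops_per_second(macs_per_frame: float, fps: float,
                            streams: int = 1, ops_per_mac: int = 2) -> float:
    """Operations per second needed to process every frame of every stream."""
    return macs_per_frame * ops_per_mac * fps * streams

def max_streams(device_ops_per_second: float, macs_per_frame: float,
                fps: float, utilization: float = 0.5) -> int:
    """How many streams fit, assuming only part of peak throughput is usable."""
    per_stream = required_ops_per_second(macs_per_frame, fps)
    return int(device_ops_per_second * utilization // per_stream)

# Hypothetical model: 0.5 GMAC per frame at 30 fps.
need = required_ops_per_second(0.5e9, 30)  # 3.0e10 ops/s = 30 GOPS
streams = max_streams(2e12, 0.5e9, 30)     # 2 TOPS at 50% utilization
print(f"{need / 1e9:.0f} GOPS per stream, {streams} streams")
```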

Typically, deep learning accelerators work alongside an MCU or SoC: the conventional processor runs the application logic while the accelerator handles the deep learning workload. Many designs combine a processor and an accelerator in a single package and provide dedicated tools that help developers split processing between them.
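
This division of labor can be sketched as a small interface. Application code on the host processor calls an inference backend, which uses a dedicated accelerator when one is present and falls back to the CPU otherwise. Every name here (InferenceBackend, NpuBackend, CpuBackend, select_backend, application_step) is hypothetical. Real accelerators are driven through vendor SDKs or runtime delegates whose APIs differ.

```python
from abc import ABC, abstractmethod
from typing import List, Optional

class InferenceBackend(ABC):
    """Hypothetical interface the application uses to run a model."""
    @abstractmethod
    def infer(self, features: List[float]) -> List[float]: ...

class CpuBackend(InferenceBackend):
    """Runs a (toy) model on the host processor itself."""
    def infer(self, features: List[float]) -> List[float]:
        # Stand-in for a real model: here just the mean of the inputs.
        return [sum(features) / len(features)]

class NpuBackend(InferenceBackend):
    """Placeholder for a vendor accelerator; the device handle is hypothetical."""
    def __init__(self, device: object) -> None:
        self.device = device

    def infer(self, features: List[float]) -> List[float]:
        # A real implementation would pass the tensor to the accelerator driver.
        raise NotImplementedError("requires a vendor SDK")

def select_backend(npu_device: Optional[object]) -> InferenceBackend:
    """Prefer the accelerator when one is present, else fall back to the CPU."""
    return NpuBackend(npu_device) if npu_device is not None else CpuBackend()

def application_step(backend: InferenceBackend, samples: List[float],
                     threshold: float = 0.5) -> bool:
    """Application logic stays on the host; only inference is delegated."""
    score = backend.infer(samples)[0]
    return score > threshold

backend = select_backend(None)  # no accelerator available in this sketch
print(application_step(backend, [0.2, 0.9, 0.7]))
```

Keeping the backend behind one interface lets the same application logic run on a development machine without an accelerator and on target hardware with one.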

Early deep learning accelerators supported only a narrow range of model types, but as the field matures, devices are becoming increasingly flexible. The field is still at a very early stage, so significant advances and efficiency gains can be expected over time. In the long run, powerful devices with extremely low power budgets may emerge that can process video or transcribe speech in real time while running for years on a small battery.
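
The goal of running for years on a small battery can be checked with duty-cycle arithmetic: average current is the active current weighted by the fraction of time the device is active, plus the sleep current for the rest. The values below are illustrative assumptions (a 225 mAh coin cell, 5 mA while running inference, 5 µA asleep, active 0.1% of the time), and the estimate ignores self-discharge and temperature effects.

```python
HOURS_PER_YEAR = 24 * 365

def average_current_ma(active_ma: float, sleep_ma: float, duty: float) -> float:
    """Time-weighted average current for a device active a fraction 'duty' of the time."""
    return active_ma * duty + sleep_ma * (1.0 - duty)

def battery_life_years(capacity_mah: float, avg_ma: float) -> float:
    """Idealized lifetime: capacity divided by average draw (no self-discharge)."""
    return capacity_mah / avg_ma / HOURS_PER_YEAR

avg = average_current_ma(active_ma=5.0, sleep_ma=0.005, duty=0.001)
print(f"average {avg * 1000:.1f} uA, about {battery_life_years(225, avg):.1f} years")
```

Running the same 5 mA workload continuously would drain that cell in 45 hours (225 mAh / 5 mA), so always-on video or speech processing will need power per inference to fall dramatically.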

In summary, this article has introduced the characteristics of SoCs and deep learning accelerators and their applications in edge AI.

The series of articles on common hardware for edge AI is as follows:

Common Hardware for Edge AI: MCUs and DSPs
