Quickly Deploy TinyML on MCUs

Are you curious about Artificial Intelligence (AI) and Machine Learning (ML)? Do you want to know how to use them on an MCU you have already worked with? In this article, we introduce Machine Learning on MCUs, a topic also known as Tiny Machine Learning (TinyML). Get ready to lose to the ESP-EYE development board at rock-paper-scissors. You will learn about data collection and processing, how to design and train an AI model, and how to get it running on an MCU. This example gives you everything you need to complete your own TinyML project from start to finish.

Why Should I Care About TinyML?

You’ve probably heard of tech companies like DeepMind and OpenAI. They dominate the ML field with teams of experts and vast GPU resources. To give a sense of scale, the best AI models, such as the one behind Google Translate, require months of training on hundreds of high-performance GPUs running in parallel. TinyML turns the tide by shrinking models down: large AI models do not fit on MCUs because of their memory constraints. The image below shows the difference in hardware requirements.

[Image: hardware requirements of large AI models compared to TinyML on an MCU]

What advantages does ML on MCUs have compared to using AI services in the cloud? We have identified seven main advantages:
  • Cost: The purchase and operating costs of MCUs are low.

  • Environmentally Friendly: Running AI on MCUs consumes very little energy.

  • Integration: MCUs are easy to integrate into existing environments, such as production lines.

  • Privacy and Security: Data can be processed locally on the device. Data does not have to be sent over the internet.

  • Rapid Prototyping: TinyML allows you to develop proof-of-concept solutions in a short time.

  • Autonomous and Reliable: Tiny devices can be used anywhere, even without infrastructure.

  • Real-Time: Data is processed on the MCU with no latency. The only limitation is the processing speed of the MCU.

Rock-Paper-Scissors


Have you ever lost a game of rock-paper-scissors to an AI? Or do you want to impress your friends by defeating an artificial intelligence? You will use TinyML to compete against the ESP-EYE development board. Building such a project takes a few steps; the following sections give a high-level overview of each. If you want to take a closer look, refer to the documentation in our project repository, which explains the practical details.

Data Collection

Collecting data is a crucial part of ML. To get things running, you need to take images of your hand forming the rock, paper, and scissors gestures. The more varied the images, the better: the AI will learn that your hand can appear at different angles, in different positions, and under different lighting conditions. The dataset contains the recorded images together with a label for each image; training on labeled data like this is known as supervised learning.
It’s best to use the same sensors and environment for running the AI as you did for training it. This ensures that the model is familiar with the data being fed into it. For instance, due to manufacturing variations, temperature sensors can have different voltage outputs for the same temperature. For our purposes, this means that recording images with the ESP-EYE camera on a uniform background is ideal. During deployment, the AI will perform best in similar backgrounds. You can also use a webcam to record images, but you might sacrifice some accuracy. Due to the limited capacity of the MCU, we will record and process grayscale images of 96×96 pixels.
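If you record with a webcam, a minimal capture loop could look like the sketch below. It assumes the OpenCV (cv2) library; the folder names and key bindings are our own choices and not part of the original project.

# Minimal webcam capture sketch (assumes OpenCV is installed).
# Press r, p, or s to save the current frame into the matching class folder, q to quit.
import os
import cv2

CLASSES = {"r": "rock", "p": "paper", "s": "scissors"}  # hypothetical folder names

for folder in CLASSES.values():
    os.makedirs(os.path.join("dataset", folder), exist_ok=True)

cam = cv2.VideoCapture(0)  # default webcam
counter = 0
while True:
    ok, frame = cam.read()
    if not ok:
        break
    cv2.imshow("capture", frame)
    key = chr(cv2.waitKey(1) & 0xFF)
    if key == "q":
        break
    if key in CLASSES:
        path = os.path.join("dataset", CLASSES[key], f"img_{counter:04d}.png")
        cv2.imwrite(path, frame)  # save the raw frame; preprocessing happens later
        counter += 1

cam.release()
cv2.destroyAllWindows()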
After collecting the data, it’s crucial to split the data into a training set and a testing set. We do this to understand how our model recognizes gesture images it has never seen before. The model will naturally perform well on images it has seen during training.
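As a rough sketch, and assuming the images are sorted into one folder per class as above, recent versions of Keras can create the split for you. The 80/20 ratio, the seed, and the folder name are our own choices.

from tensorflow import keras

IMG_SIZE = (96, 96)

# 80% of the images become the training set...
train_ds = keras.utils.image_dataset_from_directory(
    "dataset",             # hypothetical folder with rock/, paper/, scissors/ subfolders
    validation_split=0.2,
    subset="training",
    seed=42,               # same seed in both calls keeps the split consistent
    color_mode="grayscale",
    image_size=IMG_SIZE,
    batch_size=32,
)
# ...and the remaining 20% become the testing set.
test_ds = keras.utils.image_dataset_from_directory(
    "dataset",
    validation_split=0.2,
    subset="validation",
    seed=42,
    color_mode="grayscale",
    image_size=IMG_SIZE,
    batch_size=32,
)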

[Image: example gesture images from the dataset]

Here are some example images. If you don’t want to collect data right now, you can download our ready-made dataset here.

Data Preprocessing

Identifying patterns in raw data is hard, not just for humans. To make the job easier for the AI model, the data is usually run through preprocessing algorithms first. In our dataset, we recorded images with both the ESP-EYE and a webcam. Since the ESP-EYE already captures grayscale images at a resolution of 96×96, those need little further processing. The webcam images, however, have to be cropped and resized to 96×96 pixels and converted from RGB to grayscale. Finally, all images are normalized. The image below shows the intermediate preprocessing steps.

[Image: intermediate preprocessing steps from webcam image to 96×96 grayscale]
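As a rough sketch, the webcam preprocessing described above could look like this in Python. It assumes the Pillow and NumPy libraries; the center-crop strategy is our own choice, and since the Keras model shown later already rescales pixel values with a Rescaling layer, normalization should only happen in one of the two places.

from PIL import Image
import numpy as np

def preprocess(path, size=96):
    # Convert to grayscale, center-crop to a square, resize to 96x96,
    # and normalize pixel values to the range [0, 1].
    img = Image.open(path).convert("L")            # RGB -> grayscale
    w, h = img.size
    side = min(w, h)                               # square center crop
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((size, size))
    return np.asarray(img, dtype=np.float32) / 255.0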

Designing the Model

Designing a model is tricky! A detailed treatment is beyond the scope of this article, so we will describe the basic building blocks of the model and how we designed ours. Behind the scenes, our AI relies on a neural network. You can think of a neural network as a collection of neurons, a bit like our brains. This is also why, in a zombie apocalypse, the AI would get eaten by zombies too.
When all the neurons in a network are interconnected, it is called fully connected or dense. This can be considered the most basic type of neural network. Since we want our AI to recognize gestures from images, we use a type of network that is more advanced and better suited to images: the Convolutional Neural Network (CNN). Convolutions reduce the dimensionality of the image, extract important patterns, and preserve the local relationships between pixels. To design the model, we use the TensorFlow library, which provides ready-made neural network components, called layers, that make creating neural networks easy!
Creating a model means stacking layers, and choosing the right combination of them is crucial for developing a robust, high-accuracy model. The code below shows the different layers we use. Conv2D is a convolutional layer. The BatchNormalization layer applies a form of normalization to the output of the previous layer. We then feed the data into an activation layer, which introduces non-linearity and filters out unimportant data points. Next, max pooling, similar to convolution, reduces the size of the image. This block of layers is repeated several times; the right number is found through experience and experimentation. After that, a flatten layer reduces the two-dimensional image to a one-dimensional array. Finally, this array is densely connected to three output neurons, which represent the classes rock, paper, and scissors.
from tensorflow import keras
from tensorflow.keras import layers

def make_model_simple_cnn(INPUT_IMG_SHAPE, num_classes=3):
    inputs = keras.Input(shape=INPUT_IMG_SHAPE)
    x = inputs
    x = layers.Rescaling(1.0 / 255)(x)
    x = layers.Conv2D(16, 3, strides=3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(units=num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

Training the Model

Once we have designed a model, we can train it. Initially, the AI model makes random predictions. A prediction is a probability associated with each label, in our case rock, paper, or scissors: the AI tells us how likely it thinks each label is for a given image. Because the AI is guessing at first, it often gets the labels wrong. Training happens by comparing the predicted labels with the actual labels; the prediction error drives updates to the connections between the neurons in the network. This form of learning is called gradient descent. Because our model is built in TensorFlow, training is as simple as one, two, three. Below you can find a minimal training sketch and the output it generates during training—higher accuracy (training set) and validation accuracy (testing set) are better!
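Here is a minimal compile-and-fit sketch. It assumes the train_ds and test_ds datasets from the earlier split; the optimizer, loss, and number of epochs are our own choices and may differ from the settings in the project repository.

model = make_model_simple_cnn(INPUT_IMG_SHAPE=(96, 96, 1))

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer labels inferred from the class folder names
    metrics=["accuracy"],
)

history = model.fit(
    train_ds,                 # training set
    validation_data=test_ds,  # testing set, reported as val_loss / val_accuracy
    epochs=6,
)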
Epoch 1/6
480/480 [==============================] - 17s 34ms/step - loss: 0.4738 - accuracy: 0.6579 - val_loss: 0.3744 - val_accuracy: 0.8718
Epoch 2/6
216/480 [============>.................] - ETA: 7s - loss: 0.2753 - accuracy: 0.8436
During training, various issues may arise. The most common one is overfitting. As the model sees the same examples over and over, it starts memorizing the training data instead of learning the underlying patterns. Of course, we remember from school that understanding beats memorizing! At some point, the accuracy on the training data may keep rising while the accuracy on the testing set does not. This is a clear indicator of overfitting. One simple countermeasure is sketched below.
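A common countermeasure, in addition to the dropout layer already in our model, is to stop training as soon as the validation accuracy stops improving. A minimal sketch using the Keras EarlyStopping callback (the patience value and epoch limit are our own choices):

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy",     # watch the accuracy on the testing set
    patience=3,                 # stop after 3 epochs without improvement
    restore_best_weights=True,  # roll back to the best epoch seen so far
)

history = model.fit(
    train_ds,
    validation_data=test_ds,
    epochs=50,                  # upper bound; early stopping usually ends training sooner
    callbacks=[early_stop],
)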

Converting the Model

After training, we have an AI model in TensorFlow format. Since the ESP-EYE cannot interpret this format, we convert the model into a format the microcontroller can read. We start by converting it to a TfLite model. TfLite is a more compact TensorFlow format that uses quantization to reduce the size of the model; it is widely used on edge devices such as smartphones and tablets. The final step is to convert the TfLite model into a C array, because the MCU cannot interpret TfLite directly.
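A minimal conversion sketch with the TensorFlow Lite converter might look as follows; the exact quantization settings used in the project repository may differ.

import tensorflow as tf

# Convert the trained Keras model to a TfLite flatbuffer.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable default quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)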

Deploying the Model

Now we can deploy our model onto the microcontroller. All we need to do is place the new C array into the file that expects it: replace the contents of the C array, and don't forget to update the array length variable at the end of the file. We provide a script to simplify this process; a sketch of what such a script might look like is shown below.
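On Linux, the TfLite-to-C-array conversion is often done with xxd -i model.tflite. A small Python sketch that produces a similar file is shown here; the file and variable names are hypothetical and should be matched to whatever names your embedded project expects.

# Turn the TfLite flatbuffer into a C source file containing the model as a C array.
with open("model.tflite", "rb") as f:
    data = f.read()

lines = ["alignas(8) const unsigned char g_model_data[] = {"]
for i in range(0, len(data), 12):
    chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
    lines.append("  " + chunk + ",")
lines.append("};")
lines.append(f"const int g_model_data_len = {len(data)};")

with open("model_data.cc", "w") as f:
    f.write("\n".join(lines))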

Embedded Environment

Let’s review what happens on the MCU. During setup, the interpreter is configured for the shape of our images.
// initialize the TFLite Micro interpreter
static tflite::MicroInterpreter static_interpreter(
    model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
interpreter = &static_interpreter;
model_input = interpreter->input(0);
model_output = interpreter->output(0);

// assert that the real input matches the expected input
if ((model_input->dims->size != 4) ||       // tensor of shape (1, 96, 96, 1) has dim 4
    (model_input->dims->data[0] != 1) ||    // 1 img per batch
    (model_input->dims->data[1] != 96) ||   // 96 x pixels
    (model_input->dims->data[2] != 96) ||   // 96 y pixels
    (model_input->dims->data[3] != 1) ||    // 1 channel (grayscale)
    (model_input->type != kTfLiteFloat32)) {  // type of a single data point, here a pixel
  error_reporter->Report("Bad input tensor parameters in model\n");
  return;
}
Once the setup is complete, captured images are sent to the model, and predictions about the gestures are made.
// read image from camera into a 1-dimensional array
uint8_t img[dim1 * dim2 * dim3];
if (kTfLiteOk != GetImage(error_reporter, dim1, dim2, dim3, img)) {
  TF_LITE_REPORT_ERROR(error_reporter, "Image capture failed.");
}

// write image to model
std::vector<uint8_t> img_vec(img, img + dim1 * dim2 * dim3);
std::vector<float_t> img_float(img_vec.begin(), img_vec.end());
std::copy(img_float.begin(), img_float.end(), model_input->data.f);

// apply inference
TfLiteStatus invoke_status = interpreter->Invoke();
The model then returns the probabilities for each gesture. Since the probability array is just a series of values between 0 and 1, some interpretation is needed: we consider the recognized gesture to be the one with the highest probability. We then compare the recognized gesture with the move the AI chose to determine who won the round. You didn't stand a chance!
// probability for each class
float paper = model_output->data.f[0];
float rock = model_output->data.f[1];
float scissors = model_output->data.f[2];
The image below illustrates the steps on the MCU. For our purposes, no preprocessing is required on the MCU.

[Image: processing steps on the MCU]

Expand the Example

How about a challenge? Want to achieve new life goals, impress old friends, or find new ones? By adding lizard and Spock, you can take rock-paper-scissors to the next level, and your AI friend will be one step closer to world domination. First, check out our rock-paper-scissors repository and make sure you can replicate the steps above. The README file will help you with the details. The image below shows how the extended game works: you need to add two extra gestures and a few new win-lose conditions.

[Image: how rock-paper-scissors-lizard-Spock works]

Start Your Own Project

If you liked this article and want to start your own project, we will provide you with a template project that uses the same simple pipeline as our rock-paper-scissors project. You can find the template here. Don’t hesitate to show us your project through social media. We would love to know what you can create!
You can find more information about TinyML in Pete Warden’s book, which is a great resource.


Nikolas Rieder is a student at Hamburg University of Applied Sciences. He is pursuing a bachelor’s degree in Mechatronics and has been doing a mandatory internship at Itemis AG since February 2022. He works with co-authors in the field of TinyML, combining his passion for AI with his expertise in embedded systems. Nikolas is a lifelong learner, curious about future technologies that improve everyday life.
Author: Nikolas Rieder and Rafael Tappe Maestro, Source: Embedded
Original Reference: How to quickly deploy TinyML on MCUs, compiled by Franklin Zhao.