Amazon has just released DeepLens, a smart camera that uses machine learning to detect objects, faces, and activities such as playing the guitar. DeepLens is not available for purchase yet, but the idea of a smart camera is exciting.
Imagine a camera that could tell when you are playing the guitar, inventing a new dance move, or landing a new skateboarding trick. From raw image data alone, it could recognize the skateboarding trick you are attempting, or, if you are dancing a new routine, identify the moves and how they sync with the music.
This article will build a deep learning camera that detects when a bird appears in the camera’s image and then saves the photo of the bird. The final result is shown in the image below:
Images containing birds detected by the deep learning camera
The deep learning camera is the start of a new platform for machine learning.
DeepLens has 100 GFlops of computing power available for inference, roughly the minimum needed for an interesting deep learning camera. In the future, these devices will become far more powerful, allowing inference on hundreds of images per second.
But who wants to wait for the future?
The “Dumb” Camera with Smart Inference
We do not need to build the deep learning model into the camera itself. Instead, we can attach a "dumb" camera to a cheap computer (like the $9 Raspberry Pi Zero) and send its images over WiFi. Accepting a bit of latency, we can build a prototype today that is conceptually the same as DeepLens, only cheaper.
This article builds exactly such a camera: a small web server written in Python sends images from the Raspberry Pi to another, more powerful computer, which runs the "YOLO" neural network architecture on each incoming image and tells us whether there is a bird in frame.
We start with the YOLO architecture because it is one of the fastest detection models. It also has a Tensorflow port that is easy to install and runs on many different platforms. And if you use the tiny model this article uses, you can run detection on a CPU without needing an expensive GPU.
Back to our prototype. If the camera detects a bird, it saves that image for later analysis.
This is just the beginning of a truly smart deep learning camera, very basic, but you have to start somewhere. Let’s start with the first version of our prototype.
Detection vs. Imaging
Detecting and marking birds
As mentioned earlier, DeepLens builds the computing into the camera itself. It can therefore perform baseline-level detection on board and decide whether an image meets your criteria using its own processing power.
However, devices like the Raspberry Pi may not necessarily have the computing power required for real-time onboard detection. Therefore, we will use another computer to infer what is in the image.
In this article, I use a simple Linux computer with a camera and WiFi access (a Raspberry Pi 3 and a cheap camera) as the image server, and leave the deep learning inference to another machine.
This is great because it allows connecting many cheap external cameras and doing all the computations on a desktop.
Camera Image Server Stack
If you do not want to use the Raspberry Pi camera module, you can follow the usual instructions to install OpenCV 3 on the Raspberry Pi; just make sure to build the latest version, 3.3.1, instead of 3.3.0.
As a side note, to get version 3.3.1 working on my Raspberry Pi, I had to disable CAROTENE compilation. You might have to do the same.
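As a sketch, disabling CAROTENE comes down to one standard OpenCV CMake flag; the other flags below are just the usual ones from OpenCV-on-Raspbian build guides and may differ in your setup:

```bash
# Sketch: configure the OpenCV 3.3.1 build with CAROTENE disabled.
# WITH_CAROTENE is a standard OpenCV CMake option; the remaining flags
# follow whatever OpenCV-on-Raspbian guide you are using.
cmake -D CMAKE_BUILD_TYPE=RELEASE \
      -D CMAKE_INSTALL_PREFIX=/usr/local \
      -D WITH_CAROTENE=OFF ..
```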
After that, we need to set up a web server using Flask so we can load images from the camera.
This article starts from Miguel Grinberg's classic camera server code, swapping the video stream for a simple JPEG endpoint.
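A minimal sketch of that server, assuming Grinberg's camera_opencv and camera_pi modules (whose Camera class returns the current JPEG frame from get_frame()), might look like this:

```python
#!/usr/bin/env python3
from flask import Flask, Response

# Miguel Grinberg's camera classes; both expose the same get_frame() API.
from camera_opencv import Camera
# from camera_pi import Camera  # use this line for the Raspberry Pi camera

app = Flask(__name__)

@app.route('/image.jpg')
def image():
    # Serve a single current JPEG frame rather than an MJPEG video stream.
    return Response(Camera().get_frame(), mimetype='image/jpeg')

if __name__ == '__main__':
    app.run(host='0.0.0.0', threaded=True)
```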
If you want to use the Raspberry Pi camera, make sure to uncomment the from camera_pi line and comment out the from camera_opencv line.
You can run the server using python3 app.py or gunicorn, as mentioned in Miguel’s post.
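For example, either of these works (the gunicorn flags here are just one reasonable choice, not the only one):

```bash
# Run the Flask development server directly:
python3 app.py

# Or run under gunicorn, binding to all interfaces on port 5000:
gunicorn --workers 1 --threads 5 --bind 0.0.0.0:5000 app:app
```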
It uses Miguel's camera management to shut the camera down when there are no requests for it, and the same mechanism manages threads when more than one machine is running inference on the camera's images.
Once started on the Raspberry Pi, first find its IP address, then access it through a web browser to test and ensure the server is working.
The URL is similar to http://192.168.1.4:5000/image.jpg:
Loading the Raspberry Pi page to confirm the image server is working
Extracting Images from the Camera Server and Inferring
Now that we can load the current image from the camera, we can build a script to grab those images and run inference on them.
We will use requests along with Darkflow; requests is a powerful Python library for fetching files from URLs, and Darkflow is an implementation of the YOLO module on Tensorflow.
Unfortunately, Darkflow cannot simply be installed via pip, so we need to clone the repo and then build and install it on the computer we will use for inference.
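Roughly, the steps follow Darkflow's own README (Cython is needed for the extension build):

```bash
git clone https://github.com/thtrieu/darkflow.git
cd darkflow
pip3 install Cython
# Build the Cython extensions and install the package globally.
pip3 install .
```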
After installing the Darkflow repo, we need to download weights and models for the YOLO version we will use.
In this article, I am using the tiny version of YOLO v2 because I want to run inference on a slower computer that uses onboard CPU instead of GPU. The tiny version has lower accuracy than the full YOLO v2 model.
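As a sketch, the matching config file ships with the Darkflow repo; only the weights need to be downloaded separately. The URL below is the historical location on the YOLO project site and may have moved:

```bash
# tiny-yolo-voc.cfg already lives under cfg/ in the darkflow checkout.
mkdir -p bin
# Historical download location for the tiny YOLO v2 VOC weights; may have moved.
wget -O bin/tiny-yolo-voc.weights https://pjreddie.com/media/files/tiny-yolo-voc.weights
```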
After that, we also need to install Pillow, numpy, and OpenCV on the detection computer.
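Something like the following covers them (opencv-python is one convenient way to get the OpenCV bindings via pip, rather than building from source):

```bash
pip3 install Pillow numpy requests opencv-python
```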
Finally, we write the code that runs the detection.
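A minimal sketch of that loop, using the Raspberry Pi address from earlier as a stand-in and Darkflow's TFNet API (the model paths assume the darkflow checkout layout from the install step above):

```python
from io import BytesIO
import os
import time

import cv2
import numpy as np
import requests
from PIL import Image
from darkflow.net.build import TFNet

# Paths assume we run from inside the darkflow checkout set up earlier.
options = {"model": "cfg/tiny-yolo-voc.cfg",
           "load": "bin/tiny-yolo-voc.weights",
           "threshold": 0.1}
tfnet = TFNet(options)

os.makedirs('birds', exist_ok=True)
birds_seen = 0

while True:
    # Grab the current frame from the Raspberry Pi's image server.
    r = requests.get('http://192.168.1.4:5000/image.jpg')
    curr_img = Image.open(BytesIO(r.content))
    # Darkflow expects a BGR, OpenCV-style array.
    curr_img_cv2 = cv2.cvtColor(np.array(curr_img), cv2.COLOR_RGB2BGR)

    result = tfnet.return_predict(curr_img_cv2)
    print(result)
    for detection in result:
        if detection['label'] == 'bird':
            print('bird detected')
            birds_seen += 1
            curr_img.save('birds/%i.jpg' % birds_seen)
    time.sleep(4)
```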
This gets our very basic first detection pass running. We can watch on the console what is being detected in the Raspberry Pi's images, and every bird YOLO sees gets saved to the hard drive.
After that, we can run a program that draws YOLO's detections onto the saved bird images, as sketched below.
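A sketch of that annotation step, using the topleft/bottomright coordinates Darkflow returns for each detection (the helper name and file paths here are placeholders):

```python
import cv2

def mark_detections(image_path, detections, out_path):
    """Draw a labeled box around each detection YOLO returned."""
    img = cv2.imread(image_path)
    for d in detections:
        tl = (d['topleft']['x'], d['topleft']['y'])
        br = (d['bottomright']['x'], d['bottomright']['y'])
        cv2.rectangle(img, tl, br, (0, 255, 0), 2)
        cv2.putText(img, d['label'], (tl[0], tl[1] - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 1)
    cv2.imwrite(out_path, img)
```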
Trade-offs: More False Positives or More Misses?
It is important to note the threshold key set in the options dictionary above.
This threshold indicates what level of confidence we need to identify the things we are looking for.
For testing, I set it to 0.1. However, such a low threshold produces a lot of false positives. Worse, the tiny YOLO model we are using for detection is less accurate than the full YOLO model, so we get quite a few wrong detections.
Raising the threshold means fewer detections out of the model; lowering it means more, along with more false positives. I would rather accept the false positives and get more images of birds, but you may need to tune this parameter to suit what you are building.
Waiting for Birds
It took a long time for birds to come to my feeder. I thought I would put food in the feeder in my backyard and have birds within a few hours.
Instead, it took several days. Squirrels kept eating the food I put out, and for the first few days I barely saw a single bird.
Eventually, I put up a second feeder that was more visible and higher off the ground. This time, I finally got images like the ones at the beginning of the article.
Finally
As always, the code for this article can be found on Github: https://github.com/burningion/poor-mans-deep-learning-camera
Original English text: https://www.makeartwithpython.com/blog/poor-mans-deep-learning-camera/ Translator: Zhang Xinying