Building an AI Camera with Python, Raspberry Pi, and YOLO

Not long ago, Amazon launched DeepLens, the world’s first deep learning-enabled camera designed for developers. The machine learning algorithms it runs can detect not only object movements and facial expressions but also complex activities like playing the guitar. Although DeepLens has not yet officially hit the market, the concept of the smart camera has already arrived.


Today, we will build a deep learning-based camera that can detect birds when they appear in the camera’s view and automatically take pictures. The final product’s captured images are shown below:


The camera is not dumb; it can be quite clever

We do not intend to integrate a deep learning module into the camera itself; instead, we will hook a Raspberry Pi up to a camera and send photos over WiFi. In the spirit of keeping things simple (and cheap), today we will build only a concept prototype similar to DeepLens, which interested readers can try for themselves.

Next, we will use Python to write a web server that the Raspberry Pi uses to serve photos to a desktop computer, which then performs the inference and image detection.


The computer we are using has more processing power, and it will use a neural network architecture called YOLO to detect the input image and determine whether a bird has appeared in the camera’s view.

We start with the YOLO architecture because it is one of the fastest detection models currently available. It has a port to TensorFlow (Google’s machine learning framework, the successor to DistBelief), which lets us easily install and run the model on different platforms. A friendly reminder: if you use the tiny model, as we do in this article, you can run detection on a CPU instead of relying on an expensive GPU.

Next, back to our concept prototype… If a bird is detected within the frame, we will save the image and proceed to the next analysis.

Detection and Photography


As we mentioned, DeepLens integrates the camera with onboard compute, so it can run baseline detection directly on the device and determine whether an image meets our criteria.

However, a Raspberry Pi does not really have the computing power for real-time detection on-device, so we will use another computer to infer what is present in the image.

I am using a simple Linux computer with a camera and a WiFi card (a Raspberry Pi 3 plus a camera module) as the capture device, while my desktop handles the deep learning inference. For me, this is currently the ideal solution: it keeps costs down and lets me do all the heavy computation on my desktop.

Of course, if you do not want to use the Raspberry Pi camera module, you can install OpenCV 3 on the Raspberry Pi as an alternative, for which you can refer to this document. A friendly reminder: the installation process is quite involved!

Next, we need to use Flask to set up a web server so that we can fetch images from the camera. Here I reused the webcam server code developed by Miguel Grinberg (his Flask video streaming project) and added a simple jpg endpoint.

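A minimal sketch of such an endpoint might look like the following, assuming Grinberg’s `Camera` classes (`camera_pi` / `camera_opencv`) sit alongside the file; the stub fallback is my own addition so the server can be exercised on a machine without a camera attached:

```python
# app.py -- minimal jpg endpoint in the spirit of Miguel Grinberg's
# flask-video-streaming project (assumed layout; adjust to your setup).
from flask import Flask, Response

try:
    from camera_pi import Camera        # Raspberry Pi camera module
    # from camera_opencv import Camera  # uncomment to use an OpenCV webcam instead
except ImportError:
    # stub fallback for running off-device: returns minimal JPEG markers
    class Camera:
        def get_frame(self):
            return b'\xff\xd8\xff\xd9'

app = Flask(__name__)

@app.route('/image.jpg')
def image():
    # grab a single frame from the camera and return it as a JPEG
    return Response(Camera().get_frame(), mimetype='image/jpeg')
```

Start it with gunicorn, or add an `app.run(host='0.0.0.0')` call and run `python3 app.py`.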

If you are using the Raspberry Pi camera module, please make sure the `from camera_pi` import line in the server code is uncommented, and comment out the `from camera_opencv` line.

You can run the server directly with `python3 app.py`, or with gunicorn, just as Miguel describes in his documentation. If we use multiple computers for image inference, we can also reuse Miguel’s camera management code to manage the cameras and computation threads.

Once the Raspberry Pi is up, find its IP address and try to access the server through a web browser to confirm it is working properly.

The URL address format is as follows:

http://192.168.1.4:5000/image.jpg

If the page loads and shows an image from the Raspberry Pi, the server is functioning correctly.


Image Import and Inference

Now that we have set up the endpoint to load the current image content from the camera, we can build a script to capture images and infer the content within them.

Here we need to use the requests library (an excellent Python library for fetching resources from URLs) and Darkflow (a TensorFlow implementation of the YOLO model).

Unfortunately, Darkflow cannot be installed with pip or similar tools, so we need to clone the repository and build and install the project ourselves. After installing Darkflow, we also need to download a YOLO model.

Since I am using a slower computer that runs inference on the CPU (rather than a faster GPU), I chose the tiny YOLO v2 network. Naturally, it is considerably less accurate than the full YOLO v2 model!

After configuration is complete, we also need to install Pillow, numpy, and OpenCV on the computer. Finally, we can finish our code and run image detection.

The final script boils down to fetching a frame from the Pi, running YOLO on it, and saving any frame that contains a bird.

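A sketch of that capture-and-infer loop, assuming darkflow’s `TFNet` API and placeholder model/weight paths (`cfg/tiny-yolo-voc.cfg`, `bin/tiny-yolo-voc.weights`) and the example Pi address from above, might look like this:

```python
# detect_birds.py -- sketch of the capture-and-infer loop. The Pi's URL and
# the model/weight paths are assumptions; adjust them to your own setup.
import time


def bird_detections(results, label='bird'):
    """Keep only the darkflow predictions whose label we care about."""
    return [r for r in results if r['label'] == label]


def watch(image_url='http://192.168.1.4:5000/image.jpg'):
    # heavy imports live here so the helper above can be used without them
    import cv2
    import numpy as np
    import requests
    from darkflow.net.build import TFNet

    # tiny YOLO v2 runs acceptably on a CPU; threshold 0.1 favours recall
    tfnet = TFNet({'model': 'cfg/tiny-yolo-voc.cfg',
                   'load': 'bin/tiny-yolo-voc.weights',
                   'threshold': 0.1})

    while True:
        # fetch the current frame from the Pi and decode it for OpenCV
        raw = requests.get(image_url).content
        frame = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_COLOR)

        # save the frame to disk whenever YOLO sees a bird in it
        birds = bird_detections(tfnet.return_predict(frame))
        if birds:
            name = 'bird-%d.jpg' % int(time.time())
            cv2.imwrite(name, frame)
            print('saved', name, birds)
        time.sleep(1)
```

Calling `watch()` polls the Pi once a second and writes a timestamped JPEG for every frame containing a bird.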

At this point, we can see the detection results in the command console and view the saved bird photos directly on the hard drive. Next, we can use YOLO to label the birds in the images.

Balancing False Positives and False Negatives

In the options dictionary of our code, we set a threshold key, which is the minimum confidence a detection needs before we act on it. During testing we set it to 0.1, but such a low threshold leads to a higher false positive rate. Worse still, the tiny YOLO model we are using is far less accurate than the full YOLO model, which is another factor in the balance.

Lowering the threshold means we get more model outputs (and more photos). In my testing environment I kept the threshold low because I wanted as many bird photos as possible, but you can adjust the parameter to suit your needs.
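To make the tradeoff concrete, here is a toy illustration with made-up confidence scores showing how a confidence cutoff filters predictions:

```python
# Toy example (fabricated scores) of how a confidence threshold trades
# false positives against false negatives.
predictions = [
    {'label': 'bird', 'confidence': 0.12},  # faint hit, possibly a false positive
    {'label': 'bird', 'confidence': 0.55},  # confident detection
]

def keep(preds, threshold):
    """Discard predictions below the confidence threshold."""
    return [p for p in preds if p['confidence'] >= threshold]

print(len(keep(predictions, 0.1)))  # 2: a low threshold keeps both, more photos
print(len(keep(predictions, 0.5)))  # 1: a high threshold drops the faint hit
```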

Open Source Code

As before, I have uploaded all the code to GitHub; interested readers can follow the link in the original post to download it.

* Source: makeartwithpython; compiled by FreeBuf editor Alpha_h4ck. Please credit FreeBuf.COM when republishing.
