The real world poses many challenges, such as limited data and hardware (like phones and Raspberry Pis) that cannot run complex deep learning models. This article demonstrates how to use a Raspberry Pi for object detection: cars on the road, oranges in a fridge, signatures on documents, even a Tesla in space.
Disclaimer: I am building nanonets.com to help build machine learning models with very little data and no computer hardware.
If you are eager, please scroll directly to the bottom of this article to access the Github repository.
Why Object Detection? Why Raspberry Pi?
The Raspberry Pi is a wonderfully flexible piece of hardware: it has sold over 15 million units, captured the hearts of a generation of makers, and hackers have built many cool projects on it. Given the popularity of deep learning and the Raspberry Pi camera, we thought it would be great to use deep learning on a Raspberry Pi to detect any object.
Now you can detect photo bombs in your selfies, someone entering Harambe’s cage, where the hot sauce is, or Amazon delivery people entering your house.
What is Object Detection?
20 million years of evolution have produced a highly developed human visual system. About 30% of the neurons in the human brain are devoted to processing visual information (compared with only 8% for touch and 3% for hearing). Compared to machines, humans have two major advantages: stereoscopic vision and an almost unlimited supply of training data (a five-year-old has sampled roughly 2.7 billion images at 30 fps: 30 frames/s × roughly 14 waking hours/day × 365 days × 5 years).
To mimic human-level performance, scientists break down visual perception tasks into four different categories: 1. Classification, assigning a label to an image. 2. Localization, assigning a bounding box to a specific label. 3. Object detection, drawing multiple bounding boxes in an image. 4. Image segmentation, obtaining the precise location of an object in an image.
Object detection is good enough for many applications (image segmentation gives a more precise result, but it suffers from the cost of creating training data: it typically takes a human annotator 12 times longer to segment an image than to draw bounding boxes). Also, once an object has been detected, it can still be segmented out separately from within its bounding box.
Using Object Detection:
Object detection has significant real-world implications and has been widely applied across various industries. Here are some examples:
How Can I Use Object Detection to Solve My Problems?
Object detection can be used to answer a variety of questions. Here is a rough classification:
1. Is the object present in my image? For example, is there an intruder in my house?
2. Where is an object in the image? For example, a car needs to know the location of objects to navigate around the world.
3. How many objects are in the image? Object detection is one of the most effective ways of counting objects. For example, how many boxes are on the warehouse shelves?
4. What different types of objects are in the image? For example, which areas of the zoo have which animals?
5. How large is an object? Especially with a static camera, it is easy to calculate the size of an object. For example, what is the size of a mango?
6. How do objects interact with each other? For example, how does the formation on a soccer field affect the outcome of the game?
7. Where is an object over time (object tracking)? For example, tracking a moving object such as a train and calculating its speed.
Complete Object Detection in 20 Lines of Code
Visualization of YOLO Algorithm
There are various models and architectures for object detection, each making a different trade-off between speed, size, and accuracy. We chose one of the most popular, YOLO (You Only Look Once), and show below how it works in under 20 lines of code (ignoring comments).
Note: This is pseudocode, not a working implementation. The YOLO network itself is a fairly standard CNN, as shown in the figure below:
You can read the full YOLO paper here: https://pjreddie.com/media/files/papers/yolo_1.pdf
The way YOLO uses a convolutional neural network can be sketched in fewer than 20 lines of pseudocode, as follows:
#this is an Image of size 140x140. We will assume it to be black and white (ie only one channel, it would have been 140x140x3 for rgb)
image = readImage()
#We will break the Image into 7 columns and 7 rows and process each of the 49 different parts independently
NoOfCells = 7
#we will try and predict if an image is a dog, cat, cow or wolf. Therefore the number of classes is 4
NoOfClasses = 4
#step will be the size of step to take when moving across the image. Since the image has 7 cells step will be 140/7 = 20
step = height(image)/NoOfCells
#stores the class for each of the 49 cells, each cell will have 4 values which correspond to the probability of a cell being 1 of the 4 classes
#prediction_class_array[i,j] is a vector of size 4 which would look like [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.2 #cow]
prediction_class_array = new_array(size(NoOfCells,NoOfCells,NoOfClasses))
#stores 2 bounding box suggestions for each of the 49 cells; each bounding box has x, y, w, h and c predictions. (x,y) are the coordinates of the center of the box, (w,h) are its width and height, and c is its confidence
predictions_bounding_box_array = new_array(size(NoOfCells,NoOfCells,2,5))
#it's a blank array in which we will add the final list of predictions
final_predictions = []
#minimum confidence level we require to make a prediction
threshold = 0.7
for (i=0; i<NoOfCells; i=i+1):
    for (j=0; j<NoOfCells; j=j+1):
        #take each "cell" of size 20x20 (49 cells in total) and process it independently
        #first predict the probability of the cell containing a cat, dog, cow or wolf
        #prediction_class_array[i,j] is a vector of size 4 which would look like [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.2 #cow]
        #the class predictor is a neural network built from convolutional layers
        prediction_class_array[i,j] = class_predictor(image[i*step:(i+1)*step, j*step:(j+1)*step])
        #next predict 2 candidate bounding boxes for the cell, each described by [x, y, w, h, c]
        #(x,y) are the coordinates of the center of the box within the cell, ranging between 0-20 in this case
        #(w,h) are the width and height of the box; they can extend outside the range of the cell
        #c is the confidence, i.e. the predicted overlap with the actual box
        predictions_bounding_box_array[i,j] = bounding_box_predictor(image[i*step:(i+1)*step, j*step:(j+1)*step])
        #best_bounding_box is whichever of the 2 boxes has the higher confidence
        best_bounding_box = 0 if predictions_bounding_box_array[i,j,0,4] > predictions_bounding_box_array[i,j,1,4] else 1
        #we pick the class with the highest probability; for [0.5 #cat, 0.3 #dog, 0.1 #wolf, 0.2 #cow], 0.5 is the highest, corresponding to cat at position 0, so index_of_max_value returns 0
        predicted_class = index_of_max_value(prediction_class_array[i,j])
        #check whether the combined confidence is above the threshold (0.7 here)
        if predictions_bounding_box_array[i,j,best_bounding_box, 4] * max_value(prediction_class_array[i,j]) > threshold:
            #the prediction holds the x,y coordinates of the box, its width and height, and the predicted class
            prediction = [predictions_bounding_box_array[i,j,best_bounding_box, 0:4], predicted_class]
            final_predictions.append(prediction)
print final_predictions
How to Build a Deep Learning Model for Object Detection?
The deep learning workflow has six basic steps, grouped into three phases: 1. Collect training data 2. Train the model 3. Predict on new images
Phase 1 — Collect Training Data
Step 1. Collect images (at least 100 per object). For this task, around 100 images of each object is enough to get started. Try to capture images that closely match the data the model will see when it is actually used.
Step 2. Annotation (Manually draw bounding boxes on images)
Draw bounding boxes on the images. You can use a tool like labelImg. You will usually need several people to annotate your images; this is quite a time-consuming task.
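For reference, labelImg saves its annotations as Pascal VOC-style XML files, one per image. Below is a small sketch of reading those boxes back with Python's standard library; the file name image1.xml is just a placeholder, and this snippet is an illustration rather than part of the official workflow.
# sketch: read the bounding boxes out of a Pascal VOC XML file written by labelImg
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    # returns a list like [["car", xmin, ymin, xmax, ymax], ...]
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        label = obj.find("name").text
        bndbox = obj.find("bndbox")
        coords = [int(float(bndbox.find(k).text)) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append([label] + coords)
    return boxes

print(read_voc_boxes("image1.xml"))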
Phase 2 — Train the Model on a Machine with a GPU
Step 3. Find a pre-trained model for transfer learning:
To reduce the amount of data required for training, you need a pre-trained model for transfer learning; without one, you might need 100,000 additional images to train the model. You can read more about this at medium.com/nanonets/nanonets-how-to-use-deep-learning-when-you-have-limited-data-f68c0b512cab.
You can find many pre-trained models here.
Step 4. Train on a GPU (like AWS/GCP or your own GPU machine):
The process of training a model is hard to simplify, so we created a Docker image to make training easier.
You can run the following code to start training the model:
sudo nvidia-docker run -p 8000:8000 -v `pwd`:data docker.nanonets.com/pi_training -m train -a ssd_mobilenet_v1_coco -e ssd_mobilenet_v1_coco_0 -p '{"batch_size":8,"learning_rate":0.003}'
For more details on how to use it, please refer to this link.
The docker image has a run.sh script that can be called with the following parameters:
run.sh [-m mode] [-a architecture] [-h help] [-e experiment_id] [-c checkpoint] [-p hyperparameters]
-h display this help and exit
-m mode: should be either `train` or `export`
-p key value pairs of hyperparameters as json string
-e experiment id. Used as path inside data folder to run current experiment
-c applicable when mode is export, used to specify checkpoint to use for export
You can find more detailed information at:
NanoNets/RaspberryPi-ObjectDetection-TensorFlow
To train the model, you need to select the correct hyperparameters.
Finding the Right Hyperparameters
The trick with "deep learning" is working out the hyperparameters that make the model most accurate. Finding them is still somewhat of a dark art, but there is some theory behind it; this is a good resource for finding the right parameters.
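As a rough illustration of what searching for hyperparameters can look like in practice (this is not an official part of the Docker workflow; the experiment ids and learning rates below are made up), you could launch the training image several times with different hyperparameter JSON strings and compare the resulting models:
# sketch: a manual learning-rate sweep using the docker training command from above
import json, subprocess

for lr in [0.01, 0.003, 0.001]:
    hparams = json.dumps({"batch_size": 8, "learning_rate": lr})
    experiment_id = "ssd_mobilenet_v1_coco_lr_" + str(lr).replace(".", "_")
    cmd = ("sudo nvidia-docker run -p 8000:8000 -v `pwd`:data "
           "docker.nanonets.com/pi_training -m train -a ssd_mobilenet_v1_coco "
           "-e " + experiment_id + " -p '" + hparams + "'")
    # each run trains one model; keep the experiment whose model performs best
    subprocess.call(cmd, shell=True)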
Quantizing the Model (Making the Model Smaller for Devices like Raspberry Pi and Phones)
Devices like phones and Raspberry Pi have very limited memory and computational capability.
Training a neural network works by applying many small nudges to its weights, and these tiny increments typically need floating-point precision to work (although there is also ongoing research into using quantized representations for training).
Taking a pre-trained model and simply running inference is very different. One of the magical qualities of deep neural networks is that they tend to cope well with high levels of noise in their inputs, which is why reduced-precision weights still work at inference time.
Why Quantize?
Neural network models take up a large amount of disk space; the original AlexNet, for example, is over 200 MB in floating-point format. Almost all of that space is taken up by the weights connecting the neurons, since even a simple model usually has millions of them.
The nodes and weights of a neural network are originally stored as 32-bit floating-point numbers. One of the simplest ways to quantize a model is to store the minimum and maximum value of each layer and then compress each floating-point number to an 8-bit integer, which reduces the file size by about 75%.
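To make the arithmetic concrete, here is a minimal numpy sketch of that min/max scheme. It only illustrates the idea; the actual conversion is done by the TensorFlow transform shown below.
# sketch: 8-bit linear quantization of one layer's weights using a stored min/max
import numpy as np

weights = np.random.randn(1000).astype(np.float32)  # stand-in for one layer's weights
w_min, w_max = float(weights.min()), float(weights.max())

# map the range [w_min, w_max] onto the 256 levels of an unsigned 8-bit integer
scale = (w_max - w_min) / 255.0
quantized = np.round((weights - w_min) / scale).astype(np.uint8)

# at load time, approximate floats are reconstructed from the 8-bit codes plus min/max
dequantized = quantized.astype(np.float32) * scale + w_min

print("max reconstruction error:", float(np.abs(weights - dequantized).max()))
print("size: %d bytes -> %d bytes" % (weights.nbytes, quantized.nbytes))  # a 75% reduction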
Code for Quantizing the Model:
curl -L "https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz" |
tar -C tensorflow/examples/label_image/data -xz
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow/examples/label_image/data/inception_v3_2016_08_28_frozen.pb \
--out_graph=/tmp/quantized_graph.pb \
--inputs=input \
--outputs=InceptionV3/Predictions/Reshape_1 \
--transforms='add_default_attributes strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics) fold_constants(ignore_errors=true)
fold_batch_norms fold_old_batch_norms quantize_weights quantize_nodes
strip_unused_nodes sort_by_execution_order'
Note: Our docker image has quantization built-in.
Phase 3: Predict New Images with Raspberry Pi
Step 5: Capture New Images with Camera
You need the Raspberry Pi camera to be plugged in and working; then you can capture a new image.
For instructions on how to install, please click here
import picamera, os
from PIL import Image, ImageDraw
#capture a photo with the Pi camera and open it in the default image viewer
camera = picamera.PiCamera()
camera.capture('image1.jpg')
os.system("xdg-open image1.jpg")
Code for capturing a new image
Step 6: Predict a New Image
Download the Model
Once you have completed training, export the model by running the following command:
sudo nvidia-docker run -v `pwd`:data docker.nanonets.com/pi_training -m export -a ssd_mobilenet_v1_coco -e ssd_mobilenet_v1_coco_0 -c /data/0/model.ckpt-8998
Then download the model to your Raspberry Pi.
Install TensorFlow on Raspberry Pi
Depending on your device, you may need to adjust the installation steps slightly.
sudo apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools libjpeg-dev
sudo pip install Pillow
sudo pip install http://ci.tensorflow.org/view/Nightly
git clone https://github.com/tensorflow/models.git
sudo apt-get install -y protobuf-compiler
cd models/research/
protoc object_detection/protos/*.proto --python_out=.
export PYTHONPATH=$PYTHONPATH:/home/pi/models/research:/home/pi/models/research/slim
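A quick sanity check that the installation worked (not an official step, just a habit we find useful) is to open Python on the Pi and import TensorFlow together with a module from the Object Detection API:
# sanity check: TensorFlow and the Object Detection API should both import cleanly
import tensorflow as tf
from object_detection.utils import label_map_util

print(tf.__version__)
# optionally load the label map you will use for prediction (path from the step below)
print(label_map_util.load_labelmap("data/label_map.pbtxt"))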
Run the Model to Predict New Images
python ObjectDetectionPredict.py --model data/0/quantized_graph.pb --labels data/label_map.pbtxt --images /data/image1.jpg /data/image2.jpg
Performance Benchmark on Raspberry Pi
The Raspberry Pi has limited memory and computing power (and a TensorFlow build that can use the Raspberry Pi's GPU is still unavailable), so it is important to measure how long each model takes to predict a new image.
Benchmark of different object detection models running on the Raspberry Pi
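If you want to reproduce a benchmark like this yourself, one simple approach is to load the frozen graph and time repeated runs of the detection ops. The sketch below assumes a TensorFlow 1.x frozen graph exported by the Object Detection API, with the usual tensor names image_tensor, detection_boxes and detection_scores; adjust the names and input size for your own model.
# sketch: time object detection inference on a TF 1.x frozen graph (assumed tensor names)
import time
import numpy as np
import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile("data/0/quantized_graph.pb", "rb") as f:  # same file used for prediction above
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

dummy_image = np.zeros((1, 300, 300, 3), dtype=np.uint8)  # SSD-MobileNet models typically take 300x300 input
with tf.Session(graph=graph) as sess:
    boxes = graph.get_tensor_by_name("detection_boxes:0")
    scores = graph.get_tensor_by_name("detection_scores:0")
    runs = 10
    start = time.time()
    for _ in range(runs):
        sess.run([boxes, scores], feed_dict={"image_tensor:0": dummy_image})
    print("average seconds per image:", (time.time() - start) / runs)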
Workflow with NanoNets:
One of our goals at NanoNets is to make working with deep learning easy. Object detection is a key area of focus for us, and we have built a workflow that solves many of the challenges of putting deep learning models into practice.
How Does NanoNets Simplify the Process?
1. No need for annotation: we have eliminated the need to annotate images yourself; our professional annotators will annotate your images for you.
2. Automatic selection of the best model and hyperparameters: we automatically train the best model for you by running a series of models with different hyperparameters and selecting the one that performs best on your data.
3. No need for expensive hardware and GPUs: NanoNets runs entirely in the cloud without using any of your hardware, which makes it much easier to use.
4. Great for devices like the Raspberry Pi: because devices like the Raspberry Pi and phones are not suited to running complex computations, you can offload the workload to our cloud and let us handle all the computation.
Here is a simple snippet using the NanoNets API for image prediction
import picamera, json, requests, os, random
from time import sleep
from PIL import Image, ImageDraw
#capture an image
camera = picamera.PiCamera()
camera.capture('image1.jpg')
print('captured image')
#make a prediction on the image
url = 'https://app.nanonets.com/api/v2/ObjectDetection/LabelFile/'
data = {'file': open('image1.jpg', 'rb'),
'modelId': ('', 'YOUR_MODEL_ID')}
response = requests.post(url, auth=requests.auth.HTTPBasicAuth('YOUR_API_KEY', ''), files=data)
print(response.text)
#draw boxes on the image
response = json.loads(response.text)
im = Image.open("image1.jpg")
draw = ImageDraw.Draw(im, mode="RGBA")
prediction = response["result"][0]["prediction"]
for i in prediction:
    draw.rectangle((i["xmin"], i["ymin"], i["xmax"], i["ymax"]), fill=(random.randint(1, 255), random.randint(1, 255), random.randint(1, 255), 127))
im.save("image2.jpg")
os.system("xdg-open image2.jpg")
Code for image prediction using NanoNets
Build Your Own NanoNet
You can try building your own model: 1. Use a GUI (also automatically annotate images): https://nanonets.com/objectdetection/ 2. Use our API: https://github.com/NanoNets/object-detection-sample-python
Step 1: Clone this Repository
git clone https://github.com/NanoNets/object-detection-sample-python.git
cd object-detection-sample-python
sudo pip install requests
Step 2: Get Your Free API Key
Get your free API key from here http://app.nanonets.com/user/api_key
Step 3: Set the API Key as an Environment Variable
export NANONETS_API_KEY=YOUR_API_KEY_GOES_HERE
Step 4: Create a New Model
python ./code/create-model.py
Note: This will generate the MODEL_ID you will need in the next step
Step 5: Add the Model ID as an Environment Variable
export NANONETS_MODEL_ID=YOUR_MODEL_ID
Step 6: Upload Training Data
Collect images of the objects you want to detect. You can annotate them either through our web UI (https://app.nanonets.com/ObjectAnnotation/?appId=YOUR_MODEL_ID) or with an open-source tool like labelImg.
Once your folder is ready with the images (image files) and annotations (annotation files for those images), start uploading the dataset:
python ./code/upload-training.py
Step 7: Train the Model
Once the images are uploaded, start training the model
python ./code/train-model.py
Step 8: Get Model Status
The model takes about 2 hours to train. You will receive an email once training is complete; in the meantime, you can check the status of the model:
watch -n 100 python ./code/model-state.py
Step 9: Make Predictions
Once the model training is complete, you can use the model for predictions
python ./code/prediction.py PATH_TO_YOUR_IMAGE.jpg
Code (Github Repository)
GitHub repositories for training models: TensorFlow code for model training and quantization, and NanoNets code for model training.
GitHub repositories for making predictions on the Raspberry Pi (i.e. for detecting new objects): TensorFlow code for making predictions on the Raspberry Pi, and NanoNets code for making predictions on the Raspberry Pi.
Annotated datasets: Cars on Indian Roads, a dataset for extracting vehicles from images of Indian roads, and the COCO dataset.
via gair.link/page/TextTranslation/904
Initiated by: Jiang Fanli, Proofread by: Lao Zhao, Reviewed by: Lao Zhao
Participated in translation (2 people): Xiao Ge, Baboon