
Source: Machine Heart | Author: Robert Lucian Chiriac | Contributors: Wang Zijia, Si, Yi Ming
How do you build an intelligent vehicle system without changing the car itself? For some time, the author, Robert Lucian Chiriac, had been thinking about giving his car the ability to detect and recognize objects. The idea is appealing because we have already seen what Tesla can do, and although buying a Tesla right away is not an option (it must be said that the Model 3 keeps looking more and more attractive), he decided to work toward that dream anyway.
So, the author used a Raspberry Pi to build a device that, mounted on the car, detects license plates in real time.

In the following content, we walk through each step of the project and give the GitHub project address. The repository contains only the client tool; the datasets and pre-trained models are linked at the end of the original blog post.
https://github.com/RobertLucian/cortex-license-plate-reader-client
Now, let’s see how the author Robert Lucian Chiriac built a useful onboard detection and recognition system step by step.

Here is a finished product image.
Step 1: Define the Project Scope
Before starting, the first question that came to my mind was what this system should be able to do. If I have learned anything in life, it is that taking things step by step is always the best strategy. So, apart from the basic visual tasks, all I needed was the ability to clearly recognize license plates while driving. This recognition process involves two steps:
- Detect the license plate.
- Recognize the text within each license plate bounding box.
I think if I can accomplish these tasks, it will be much easier to do other similar tasks (such as determining collision risks, distances, etc.). I might even be able to create a vector space to represent the surrounding environment – it sounds cool just to think about it.
Before determining these details, I knew I had to do the following:
- A machine learning model that takes unlabeled images as input and detects license plates;
- Some hardware. Simply put, I need a computer system connected to one or more cameras to invoke my model.
Let’s start with the first thing – building an object detection model.
Step 2: Select the Right Models
After careful research, I decided to use these machine learning models:
- YOLOv3 – one of the fastest models available today, with an mAP comparable to other state-of-the-art models. We use it to detect the license plates;
- CRAFT text detector – we use it to detect the text within the cropped plates;
- CRNN – a convolutional recurrent neural network. Since the detected characters must be assembled into words in the correct order, they have to be treated as sequential data.
How do these three models work together? The operational flow is as follows (a minimal code sketch follows the list):
- First, the YOLOv3 model receives frames from the camera and finds the license plate bounding boxes in each frame. Very tight predicted boxes are not advisable; it is better for the boxes to be slightly larger than the detected object, because a crop that is too tight can hurt the downstream steps;
- The text detector receives the cropped license plates from YOLOv3. If a bounding box is too small, part of the plate text is likely to be cropped away, ruining the prediction. With slightly enlarged boxes, the CRAFT model can locate each letter very accurately;
- Finally, the bounding boxes of each word found by CRAFT are passed to the CRNN model to predict the actual words.
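To make the flow concrete, here is a minimal sketch of the pipeline in Python. The helpers detect_plates, detect_words, recognize_word, pad_box, and crop_image are hypothetical stand-ins for the three models and basic image operations, not the author's actual code:

def read_plates(frame):
    # detect_plates, detect_words and recognize_word stand in for
    # YOLOv3, CRAFT and CRNN respectively (hypothetical helpers).
    plate_texts = []
    for box in detect_plates(frame):            # YOLOv3: plate bounding boxes
        box = pad_box(box, margin=0.1)          # keep the crop slightly loose
        crop = crop_image(frame, box)
        word_boxes = detect_words(crop)         # CRAFT: per-word letter boxes
        words = [recognize_word(crop_image(crop, b))  # CRNN: box -> text
                 for b in word_boxes]
        plate_texts.append(" ".join(words))
    return plate_texts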
With my basic model architecture sketched out, I could start transitioning to the hardware.
Step 3: Design the Hardware
When I realized I needed low-power hardware, I thought of my old love: the Raspberry Pi. It has a dedicated camera, the Pi Camera, and enough computing power to preprocess frames at a decent frame rate. The Pi Camera is the physical camera module for the Raspberry Pi, with a mature and complete library.
To connect to the internet, I could use the 4G access of the EC25-E module; I had also used its GPS module in a previous project. For details, see:
Blog address: https://www.robertlucian.com/2018/08/29/mobile-network-access-rpi/
Then came the housing design. Hanging the device on the car's rearview mirror should work well, so I ultimately designed a support structure in two parts:
- On the rearview mirror side sits the Raspberry Pi + GPS module + 4G module. You can check my article about the EC25-E module for the GPS and 4G antennas I used;
- On the other side, an arm with a ball joint supports the Pi Camera.
I printed these parts on my trusty Prusa i3 MK3S 3D printer; the 3D printing parameters are also provided at the end of the original text.

Figure 1: Shape of the Raspberry Pi + 4G/GPS case

Figure 2: Using a ball joint arm to support the Pi Camera
Figures 1 and 2 show how they look when rendered. Note that the C-shaped bracket is pluggable, so the Raspberry Pi enclosure and the Pi Camera support are not printed together with the bracket. They share a socket into which the bracket is plugged. This is very useful for any reader who wants to replicate the project: they only need to adjust the bracket to fit their rearview mirror. Currently, this mount works very well in my car (a Land Rover Freelander).

Figure 3: Side view of the Pi Camera support structure

Figure 4: Front view of the Pi Camera support structure and RPi base

Figure 5: Expected camera field of view

Figure 6: Close-up of the embedded system with built-in 4G/GPS module and Pi Camera
Clearly, modeling these parts took some time, and I went through several iterations to get a sturdy structure. The PETG material I used was printed at a 200-micron layer height. PETG performs well at 80-90 degrees Celsius and is very resistant to UV radiation – not as resistant as ASA, but still strong.
The design was done in SolidWorks, so all my SLDPRT/SLDASM files, along with the STLs and gcode, can be found at the end of the original text. You can also use them to print your own version.
Step 4: Train the Models
Now that the hardware was settled, it was time to start training the models. As everyone knows, standing on the shoulders of giants as much as possible is key. This is the essence of transfer learning – start from a model trained on a very large dataset and reuse the knowledge it has learned.
I searched online for pre-trained license plate models. There were not as many as I had initially expected, but I did find one trained on about 3,600 license plate images. The training set is not large, but it is better than nothing. Moreover, it was trained on top of Darknet's pre-trained weights, so I could use it directly.
Model address: https://github.com/ThorPham/License-plate-detection
Since I already had hardware that could record, I decided to drive around town for a few hours collecting new video frames with which to fine-tune the previous model.
I used VOTT to annotate the frames containing license plates, ultimately creating a small dataset of 534 images, each with labeled bounding boxes for the plates.
Dataset address: https://github.com/RobertLucian/license-plate-dataset
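As an illustration, assuming the annotations are exported in Pascal VOC XML (one of the formats VOTT can produce; the actual export format is an assumption here), a minimal loader for one annotation file might look like this:

import xml.etree.ElementTree as ET

def load_boxes(xml_path):
    # Parse one Pascal VOC annotation file into (xmin, ymin, xmax, ymax) tuples.
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        boxes.append(tuple(int(float(bb.find(tag).text))
                           for tag in ("xmin", "ymin", "xmax", "ymax")))
    return boxes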
Then I found a Keras implementation of YOLOv3, used it to train on my dataset, and submitted my model back to the repo above so that others could use it too. The final model achieved 90% mAP on my test set, which is excellent considering how small the dataset is.
To find a suitable network for recognizing the text, I went through countless attempts. Eventually I stumbled upon keras-ocr, which packages CRAFT and CRNN together; it is very flexible and ships with pre-trained models, which is fantastic. I decided not to fine-tune these models and kept them as they are.
keras-ocr address: https://github.com/faustomorales/keras-ocr
Most importantly, predicting text with keras-ocr is very simple – it basically takes just a few lines of code. You can check the project homepage to see how it's done; a short example follows.
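A minimal sketch, assuming a cropped plate image has been saved as cropped_plate.jpg (a hypothetical filename):

import keras_ocr

# Downloads the pre-trained CRAFT detector and CRNN recognizer on first use.
pipeline = keras_ocr.pipeline.Pipeline()

# tools.read accepts a file path, URL, or numpy array.
images = [keras_ocr.tools.read("cropped_plate.jpg")]

# For each image, recognize() returns a list of (word, box) tuples.
prediction_groups = pipeline.recognize(images)
for word, box in prediction_groups[0]:
    print(word)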
Step 5: Deploy My License Plate Detection Model
Model deployment mainly takes one of two forms:
- Perform all inference locally;
- Perform inference in the cloud.
Both methods have their challenges. The first implies a central "brain" computer system, which is complex and expensive. The second involves latency and infrastructure challenges, especially when GPUs are used for inference.
In my research, I stumbled upon an open-source project called Cortex. It is a newcomer in the AI field, but it undoubtedly makes sense as the next step in the evolution of AI development tools.
Cortex project address: https://github.com/cortexlabs/cortex
Basically, Cortex is a platform for deploying machine learning models as production web services. This means I can focus on my application and leave the rest to Cortex. It does all the provisioning on AWS, and all I need to do is write a predictor following a template. Even better, each model only takes a few dozen lines of code.
Below is a terminal capture of Cortex at runtime, taken from its GitHub repo. If this doesn't qualify as elegant and simple, I don't know what does:

Because this computer vision system was not designed for autonomous driving, latency is much less critical for me, and I can address it with Cortex. If it were part of an autonomous driving system, using services provided by cloud providers would not be a good idea, at least not for now.
Deploying ML models with Cortex requires:
- Defining a cortex.yaml file, which is our API configuration file. Each API handles one type of task. I assigned the task of detecting license plate bounding boxes in a given frame to the yolov3 API, while the crnn API predicts the license plate numbers with the help of the CRAFT text detector and CRNN;
- Defining a predictor for each API. Basically, all you do is define a predict method on a specific Python class in Cortex that receives a payload (all the server-side plumbing is already handled by the platform), runs the prediction, and returns the result. It's that simple!
Here is a classic predictor example for the iris dataset; due to the length of the article, I won't go into the details of the two actual APIs here. You can find how to use them in the project link – all other resources for this project are listed at the end of this article.
# predictor.py

import boto3
import pickle

labels = ["setosa", "versicolor", "virginica"]

class PythonPredictor:
    def __init__(self, config):
        # Download the trained model from S3 when the API starts up.
        s3 = boto3.client("s3")
        s3.download_file(config["bucket"], config["key"], "model.pkl")
        self.model = pickle.load(open("model.pkl", "rb"))

    def predict(self, payload):
        # Build the feature vector from the JSON payload and return the label.
        measurements = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]
        label_id = self.model.predict([measurements])[0]
        return labels[label_id]
To make predictions, you just need to use curl like this:
curl http://***.amazonaws.com/iris-classifier \
-X POST -H "Content-Type: application/json" \
-d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'
Then you will receive a response like setosa back. Very simple! The same request can, of course, be made from Python, as sketched below.
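A minimal sketch using the requests library (the endpoint URL is elided here, just as in the curl example above):

import requests

# The real URL is the one Cortex prints after deploying the API.
url = "http://***.amazonaws.com/iris-classifier"
payload = {
    "sepal_length": 5.2,
    "sepal_width": 3.6,
    "petal_length": 1.4,
    "petal_width": 0.3,
}
response = requests.post(url, json=payload)
print(response.text)  # e.g. "setosa"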
Step 6: Develop the Client
With Cortex taking care of deployment, I could start designing the client – this was the trickier part.
I thought of the following architecture:
- Collect frames from the Pi Camera at an acceptable resolution (800×450 or 480×270) at 30 FPS, and push each frame into a shared queue;
- In a separate process, pull frames from the queue and distribute them to a pool of workers on different threads (see the sketch after this list);
- Each worker thread (or, as I call it, inference thread) makes API requests to my Cortex APIs. First a request goes to the yolov3 API, and if any license plates are detected, another request sends the batch of cropped plates to the crnn API. The predicted plate numbers come back in text format;
- Each detected plate (with or without recognized text) is pushed to another queue that ultimately broadcasts it to the browser page. At the same time, the predicted plate numbers are pushed to yet another queue, to be saved to disk in CSV format later;
- The broadcast queue receives a set of unordered frames. Its consumer places them into a very small buffer (a few frames in size) and broadcasts a new, reordered frame to the client each time. The consumer runs in a separate process and must also try to keep the queue at a fixed target size so frames are displayed at a consistent frame rate. Obviously, if the queue size drops, the frame rate drops proportionally, and vice versa;
- Meanwhile, another thread runs in the main process, pulling predictions and GPS data from another queue. When the client receives a termination signal, the predictions, GPS data, and timestamps are also dumped to a CSV file.
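Here is a skeleton of the producer/worker part of this design. The endpoint URLs and request formats are hypothetical (the real payloads depend on how the two APIs are defined), so treat it as a sketch rather than the author's actual client:

import threading
from queue import Queue

import requests

# Hypothetical endpoints; the real URLs come from the Cortex deployment.
YOLO_URL = "http://***.amazonaws.com/yolov3"
CRNN_URL = "http://***.amazonaws.com/crnn"

def inference_worker(frames: Queue, results: Queue):
    # Each worker pulls a frame, asks yolov3 for plate boxes, then asks
    # crnn for the text of any detected plates.
    while True:
        frame_id, jpeg_bytes = frames.get()
        boxes = requests.post(YOLO_URL, data=jpeg_bytes).json()
        texts = requests.post(CRNN_URL, json=boxes).json() if boxes else []
        results.put((frame_id, boxes, texts))

frames, results = Queue(maxsize=64), Queue()
for _ in range(4):  # a handful of concurrent inference threads
    threading.Thread(target=inference_worker,
                     args=(frames, results), daemon=True).start()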
The following diagram illustrates the flow between the client and the cloud APIs provided by Cortex.

Figure 7: Flowchart of the client based on the cloud APIs provided by Cortex
In our case, the client is the Raspberry Pi, and the cloud API to which inference requests are sent is provided by Cortex on AWS.
The source code for the client can also be found on GitHub: https://github.com/robertlucian/cortex-license-plate-reader-client
One challenge I had to overcome was 4G bandwidth. It's best to minimize the bandwidth this application requires, to reduce potential hang-ups and excessive use of the available data. I decided to have the Pi Camera use a very low resolution, 480×270 (we can get away with it because the Pi Camera's field of view is very narrow, so license plates remain easy to recognize).
However, even at this resolution, each frame's JPEG is about 100 KB (0.8 Mbit). Multiplied by 30 frames per second, that's 3,000 KB/s, or about 24 Mbit/s – and that's before HTTP overhead. It is still far too much.
Therefore, I used some tricks:
- Reduce the width to 416 pixels, the input size the YOLOv3 model requires (the scaling preserves the aspect ratio);
- Convert the image to grayscale;
- Remove the top 45% of the image. The idea is that license plates won't appear in the top of the frame, because cars don't fly, right? As far as I can tell, removing 45% of the image does not affect the predictor's performance;
- Convert the image back to JPEG, but this time at a much lower quality setting (see the sketch below).
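A minimal sketch of these preprocessing steps with OpenCV; the JPEG quality value of 30 is an illustrative choice, not the author's exact setting:

import cv2

def compress_frame(frame):
    # Resize to the 416-pixel width YOLOv3 expects, preserving aspect ratio.
    h, w = frame.shape[:2]
    new_w = 416
    new_h = int(h * new_w / w)
    small = cv2.resize(frame, (new_w, new_h))
    # Grayscale: one channel instead of three.
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # Drop the top 45% of the frame -- plates don't show up in the sky.
    cropped = gray[int(0.45 * new_h):, :]
    # Re-encode as a low-quality JPEG.
    ok, buf = cv2.imencode(".jpg", cropped, [cv2.IMWRITE_JPEG_QUALITY, 30])
    return buf.tobytes()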
The resulting frames are about 7-10 KB each, which is very good. That translates to roughly 2.8 Mbit/s, or about 3.5 Mbit/s with all the overhead. For the crnn API, the cropped license plates don't need much space at all; even without compression, they are only about 2-3 KB each.
In summary, to run at 30 FPS, the bandwidth required by the inference APIs is about 6 Mbit/s, which is a number I can live with.
The above is an example of real-time inference through Cortex. I needed about 20 GPU-equipped instances to run it smoothly; depending on latency, you may need more or fewer instances. The average latency from capturing a frame to broadcasting it to the browser window is about 0.9 seconds, which is amazing considering the inference happens far away – I'm still amazed by it.
The text recognition part may not be the best, but it at least proves the point: it could be made more accurate by increasing the video resolution, narrowing the camera's field of view, or fine-tuning.
As for the large number of GPUs required, this can be addressed through optimization, for example by using mixed precision or full half precision (FP16/BF16) in the models. Generally, mixed precision has minimal impact on a model's accuracy, so it is not much of a trade-off.
In summary, with all optimizations in place, reducing the number of GPUs from 20 down to one is actually feasible. Properly optimized, even a single GPU's resources might not be fully utilized.
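For illustration only (the author does not say he applied this), enabling mixed precision for a Keras model in TensorFlow 2.4+ is a one-liner:

from tensorflow.keras import mixed_precision

# Compute in float16 while keeping variables in float32; on GPUs with
# Tensor Cores this can substantially increase inference throughput.
mixed_precision.set_global_policy("mixed_float16")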
Original text address: https://towardsdatascience.com/i-built-a-diy-license-plate-reader-with-a-raspberry-pi-and-machine-learning-7e428d3c7401