Source: Machine Heart (translation)
Original article: https://towardsdatascience.com/i-built-a-diy-license-plate-reader-with-a-raspberry-pi-and-machine-learning-7e428d3c7401
Author: Robert Lucian Chiriac
With some free time on his hands, author Robert Lucian Chiriac equipped his car with a Raspberry Pi, added a camera, designed a client, and built a real-time license plate detection and recognition system.
How do you build an intelligent car system without modifying the vehicle? For some time, Chiriac had been pondering how to give his car the ability to detect and recognize objects. The idea is intriguing: we have all seen what Tesla can do, and while he couldn't buy a Tesla right away (he admits the Model 3 looks more and more attractive), he had a plan to work toward that dream.

The problem breaks down into two steps:

- Detect the license plate.
- Recognize the text within each license plate bounding box.
To do that, two things are needed:

- A machine learning model that detects license plates, taking unlabeled images as input;
- Some hardware. Simply put, a small computer system connected to one or more cameras that can call the model.
Three models cover the pipeline:

- YOLOv3: one of the fastest models available today, with a mAP comparable to other SOTA models. It is used to detect objects;
- CRAFT text detector: used to detect the text in images;
- CRNN: in short, a recurrent convolutional neural network. The data must be treated as a sequence so that the detected characters are arranged into words in the correct order;
The three models work together like this:

- First, the YOLOv3 model receives frames from the camera and finds the bounding box of the license plate in each frame. Very tight predicted bounding boxes are not recommended: a box slightly larger than the detected object works better, because too tight a box can hurt the performance of the downstream steps;
- The text detector receives the license plates cropped out by YOLOv3. If a bounding box is too small, part of the plate text is likely cropped away too, and the prediction suffers. With the box slightly enlarged, the CRAFT model can locate each letter's position very precisely;
- Finally, each word bounding box from CRAFT is passed to the CRNN model to predict the actual characters.
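Put together, the three stages above can be sketched roughly as follows. The functions here are hypothetical stand-ins for YOLOv3, CRAFT, and CRNN (the real project wires in the actual models), so only the control flow is meaningful:

```python
# Sketch of the three-stage pipeline described above.
# All model functions are illustrative stand-ins, not the author's code.

def detect_plates(frame):
    # YOLOv3 stand-in: return license plate bounding boxes as (x, y, w, h)
    return [(100, 200, 160, 40)]

def enlarge(box, margin=0.15):
    # Pad the box so a tight crop doesn't cut off plate characters
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    return (x - dx, y - dy, w + 2 * dx, h + 2 * dy)

def detect_text_regions(crop):
    # CRAFT stand-in: per-word boxes found inside the plate crop
    return [crop]

def recognize(region):
    # CRNN stand-in: decode the character sequence in a text region
    return "AB 123 CD"

def read_plates(frame):
    # Chain the three stages: detect -> pad -> find text -> recognize
    plates = []
    for box in detect_plates(frame):
        crop = enlarge(box)
        for region in detect_text_regions(crop):
            plates.append(recognize(region))
    return plates
```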

The hardware mounts on the rearview mirror:

- On one side of the rearview mirror, the Raspberry Pi + GPS module + 4G module are kept. See my article about the EC25-E module for the GPS and 4G antennas I used;
- On the other side, a ball-joint arm supports the Pi Camera.
For the YOLOv3 model:

- Keras implementation: https://github.com/experiencor/keras-yolo3
- Pull request submitted: https://github.com/experiencor/keras-yolo3/pull/244


There are two options for running inference:

- Perform all inference locally;
- Perform inference in the cloud.

Deploying with cortex comes down to:

- Defining a cortex.yaml file, which is the API configuration file. Each API handles one type of task: I assigned the task of detecting license plate bounding boxes on a given frame to the yolov3 API, while the crnn API predicts license plate numbers with the help of the CRAFT text detector and the CRNN;
- Defining a predictor for each API. Basically, all you do is define a predict method on a specific class in cortex that receives a payload (the serving side is handled by the platform), uses the payload to predict a result, and returns the prediction. It's that simple!
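A cortex.yaml for these two APIs might look roughly like the sketch below. The API names come from the text; the file paths and compute settings are illustrative assumptions, not the author's actual configuration:

```yaml
# Illustrative sketch of a cortex.yaml; paths and compute are assumptions
- name: yolov3
  predictor:
    type: python
    path: yolov3/predictor.py
  compute:
    gpu: 1

- name: crnn
  predictor:
    type: python
    path: crnn/predictor.py
  compute:
    gpu: 1
```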
```python
# predictor.py
import boto3
import pickle

labels = ["setosa", "versicolor", "virginica"]

class PythonPredictor:
    def __init__(self, config):
        s3 = boto3.client("s3")
        s3.download_file(config["bucket"], config["key"], "model.pkl")
        self.model = pickle.load(open("model.pkl", "rb"))

    def predict(self, payload):
        measurements = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]
        label_id = self.model.predict([measurements])[0]
        return labels[label_id]
```
```shell
curl http://***.amazonaws.com/iris-classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'
```
The client works as follows:

- Collect frames from the Pi Camera at an acceptable resolution (800×450 or 480×270) at 30 FPS and push each frame into a shared queue;
- In a separate process, take frames from the queue and distribute them to multiple workers on different threads;
- Each worker thread (or, as I call it, inference thread) makes API requests to my cortex APIs. First a request goes to the yolov3 API and then, if any license plates are detected, another request with a batch of cropped plates goes to the crnn API. The predicted plate numbers come back in text format;
- Each detected plate (with or without recognized text) is pushed to another queue, which eventually broadcasts it to the browser page. At the same time, the predicted plate numbers are pushed to yet another queue, to be saved to disk later in CSV format;
- The broadcast queue receives a set of unordered frames. Its consumer first places them in a very small buffer (a few frames in size) to reorder them, then broadcasts each new frame to the client. The consumer runs in a separate process and also tries to keep the queue size fixed at a specified value so frames are displayed at a consistent frame rate. If the queue size drops, the frame rate drops proportionally, and vice versa;
- Meanwhile, in the main process, another thread fetches predictions and GPS data from another queue. When the client receives a termination signal, the predictions, GPS data, and timestamps are stored in a CSV file.
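The producer/worker core of this design can be sketched with Python's standard queue and threading modules. The frame source and the "API call" are placeholders here, not the author's client code:

```python
# Minimal sketch of the client's producer/worker layout described above.
# The frames and the API call are placeholders for the camera and cortex APIs.
import queue
import threading

frame_queue = queue.Queue(maxsize=64)   # frames captured from the camera
result_queue = queue.Queue()            # predicted plate numbers

def capture(frames):
    # Producer: push each captured frame into the shared queue
    for frame in frames:
        frame_queue.put(frame)

def inference_worker():
    # Worker thread: pull a frame and call the detection/recognition APIs
    while True:
        frame = frame_queue.get()
        if frame is None:               # sentinel value: shut the worker down
            break
        # A real worker would POST the frame to the yolov3/crnn APIs here
        result_queue.put(f"plate-for-{frame}")

workers = [threading.Thread(target=inference_worker) for _ in range(4)]
for w in workers:
    w.start()

capture(range(8))                       # stand-in for 8 camera frames
for _ in workers:
    frame_queue.put(None)               # one sentinel per worker
for w in workers:
    w.join()

results = []
while not result_queue.empty():
    results.append(result_queue.get())
```

The bounded `maxsize` on the frame queue plays the role described above: it applies backpressure so the capture side cannot outrun the inference threads indefinitely.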
Each frame is pre-processed before being sent:

- Reduce the width to 416 pixels, the input size required by the YOLOv3 model, keeping the aspect ratio intact;
- Convert the image to grayscale;
- Remove the top 45% of the image. The idea is that license plates won't appear in the top part of the frame, because cars don't fly, right? As far as I can tell, removing 45% of the image doesn't hurt the predictor's performance;
- Convert the image back to JPEG, but at a much lower quality this time.

