Source: Machine Heart
In his spare time, the author fitted his car with a Raspberry Pi and a camera, built a client, and put together a real-time license plate detection and recognition system.
How do you make a car smarter without modifying the car itself? For some time, author Robert Lucian Chiriac had been thinking about giving his car the ability to detect and recognize objects. The idea is appealing because we have all seen what Tesla can do, and although he could not buy a Tesla right away (the Model 3, he admits, keeps looking more and more attractive), he had a plan to work toward that dream.
So he did it with a Raspberry Pi: mounted in the car, it detects license plates in real time.
In the following sections we walk through each step of the project and give the GitHub address. Note that the repository contains only the client tool; the datasets and pre-trained models are linked at the end of the original blog post.

Project address: https://github.com/RobertLucian/cortex-license-plate-reader-client

Now let's see how Robert Lucian Chiriac built a working vehicle detection and recognition system, step by step.
Here is a picture of the finished product.

Step 1: Define the project scope

Before starting, the first question that came to my mind was what this system should be able to do. If there is one thing I have learned in life, it is that taking things step by step is always the best strategy. So, beyond the basic visual task, all I needed was to clearly recognize license plates while driving. This recognition process has two steps:
Detect the license plate.
Recognize the text within each license plate bounding box.
I figured that if I could get these tasks done, other similar tasks (estimating collision risk, distance, and so on) would come much more easily. I might even be able to build a vector space representation of the surrounding environment, which sounds cool just thinking about it. Before pinning down those details, I knew I would need:
A machine learning model to detect license plates from unlabeled images;
Some kind of hardware. Simply put, I need a computer system connected to one or more cameras to run my model.
So let's start with the first thing: building the object detection model.

Step 2: Select the right models

After careful research, I decided on the following machine learning models:
YOLOv3 – This is one of the fastest models available and has a comparable mAP to other SOTA models. We use this model to detect objects;
CRAFT text detector – We use it to detect text in images;
CRNN – Simply put, a convolutional recurrent neural network. The detected characters have to be assembled into words in the correct order, so the model needs to handle sequential data;
How do these three models work together? Here is the operational flow (a rough code sketch follows the list):
First, the YOLOv3 model receives frames from the camera and finds the license plate bounding box in each frame. Very tight predicted boxes are not advisable; a box slightly larger than the detected object works better, because a box that is too tight can hurt the downstream steps;
The text detector receives the cropped license plate from YOLOv3. If the bounding box is too small, part of the plate's text is likely to be cropped out and the prediction will suffer. With a slightly enlarged box, the CRAFT model can locate each letter very precisely;
Finally, we can pass the bounding box of each word from CRAFT to the CRNN model to predict the actual words.
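To make the flow concrete, here is a minimal sketch (not the author's code) of how the three stages could be glued together in Python. The detect_plates callable is a hypothetical stand-in for the YOLOv3 detector; the text stages use keras-ocr, the library the author adopts later, which wraps CRAFT and CRNN behind a single Pipeline object.

```python
import keras_ocr

# Pretrained CRAFT (text detection) + CRNN (text recognition) behind one object.
pipeline = keras_ocr.pipeline.Pipeline()

def read_plates(frame, detect_plates):
    """frame: an RGB numpy array; detect_plates: any callable returning
    license-plate boxes as integer (x1, y1, x2, y2) tuples (stand-in for YOLOv3)."""
    results = []
    for x1, y1, x2, y2 in detect_plates(frame):
        # Pad the box a little so no characters get clipped (see the note above).
        pad = 5
        crop = frame[max(y1 - pad, 0):y2 + pad, max(x1 - pad, 0):x2 + pad]
        # recognize() returns one list of (word, box) pairs per input image.
        words = pipeline.recognize([crop])[0]
        results.append(" ".join(word for word, _ in words))
    return results
```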
With a rough sketch of the model architecture in hand, I could move on to the hardware.

Step 3: Design the hardware

When I realized I needed low-power hardware, my old love came to mind: the Raspberry Pi. It has a dedicated camera, the Pi Camera, and enough computing power to preprocess frames at a decent frame rate. The Pi Camera is the physical camera for the Raspberry Pi, and it comes with a mature, well-documented library.

For internet access I could use the 4G connection of an EC25-E module; I had also used its GPS module in a previous project, described here:

Blog address: https://www.robertlucian.com/2018/08/29/mobile-network-access-rpi/

Then I started designing the enclosure. Hanging it on the car's rearview mirror should work, so I ultimately designed a two-part support structure:
On the rearview mirror side sits the Raspberry Pi + GPS module + 4G module. You can check my article about the GPS and 4G antennas I used;
On the other side, I used an arm that utilizes a ball joint to support the Pi Camera.
I printed these parts on my reliable Prusa i3 MK3S 3D printer; the 3D printing parameters are also provided at the end of the original post.
Figure 1: Shape of the Raspberry Pi + 4G/GPS case
Figure 2: Using a ball-joint arm to support the Pi Camera

Figures 1 and 2 show the parts as rendered. Note that the C-shaped bracket is detachable, so the Raspberry Pi enclosure and the Pi Camera support are not printed together with the bracket; they share a socket into which the bracket plugs. This is very useful for anyone who wants to replicate the project, since they only need to adapt the bracket to their own rearview mirror. Currently, this mount works great on my car (a Land Rover Freelander).
Figure 3: Side view of the Pi Camera support structure
Figure 4: Front view of the Pi Camera support structure and RPi base
Figure 5: Expected camera field of view
Figure 6: Close-up of the embedded system with the 4G/GPS module and Pi Camera

Clearly, modeling these parts takes some time, and I had to iterate a few times to get a sturdy structure. I printed them in PETG at a layer height of 200 microns. PETG holds up well at 80-90 degrees Celsius and resists ultraviolet radiation well; not as well as ASA, but still strong enough.

Everything was designed in SolidWorks, so all my SLDPRT/SLDASM files, as well as the STLs and gcode, are available at the end of the original post. You can use them to print your own version.

Step 4: Train the models

Now that the hardware was sorted out, it was time to train the models. As everyone knows, it is best to stand on the shoulders of giants. That is the core of transfer learning: first learn from a very large dataset, then leverage that knowledge.

YOLOv3

I found many pre-trained license plate models online, though not as many as I had expected. I did find one trained on about 3,600 images of license plates. The training set is not large, but it is better than nothing, and it was fine-tuned from a Darknet pre-trained model, so I could use it directly.

Model address: https://github.com/ThorPham/License-plate-detection

Since I already had hardware that could record, I decided to drive around town for a few hours collecting new video frames to fine-tune this model. I used VOTT to annotate the frames containing license plates, which resulted in a small dataset of 534 images, each with a labeled bounding box for the plate.

Dataset address: https://github.com/RobertLucian/license-plate-dataset

I then found a Keras implementation of YOLOv3, used it to train on my dataset, and submitted my model back to that repo so others could use it too. I ended up with an mAP of 90% on the test set, which is great considering how small my dataset is.
Keras Implementation: https://github.com/experiencor/keras-yolo3
CRAFT & CRNN

To find a suitable network for text recognition, I went through countless attempts. Eventually I stumbled upon keras-ocr, which packages CRAFT and CRNN together, is very flexible, and ships with pre-trained models, which is fantastic. I decided not to fine-tune the models and to keep them as they are.

keras-ocr address: https://github.com/faustomorales/keras-ocr

Best of all, predicting text with keras-ocr is very simple; it takes just a few lines of code. Check out the project homepage to see how it is done.

Step 5: Deploy the license plate detection models

There are basically two ways to deploy the models:
Run all inference locally;
Run inference in the cloud.
Both approaches have their challenges. The first means having a central "brain" computer system, which is complex and expensive. The second brings challenges around latency and infrastructure, especially when GPUs are needed for inference.

While researching this, I stumbled upon an open-source project called Cortex. It is a newcomer in the AI field, but as the next step in the evolution of AI development tools, it undoubtedly makes sense.

Cortex project address: https://github.com/cortexlabs/cortex

Basically, Cortex is a platform for deploying machine learning models as production web services. It means I can focus on my application and leave the rest to Cortex, which does all the provisioning work on AWS; all I have to do is write a predictor following its template. Even better, each model only needs a few dozen lines of code.

Below is the terminal output of Cortex at work, taken from its GitHub repo. If this isn't elegant and concise, I don't know what word would describe it:
Since this computer vision system is not designed for autonomous driving, latency matters less to me, and Cortex can handle the deployment. If it were part of an autonomous driving system, using cloud services would not be a good idea, at least not right now.

Deploying ML models with Cortex comes down to:
Define the cortex.yaml file, the configuration file for our APIs (a hedged sketch follows this list). Each API handles one type of task: the yolov3 API detects the license plate bounding boxes in a given frame, while the crnn API predicts the license plate number with the help of the CRAFT text detector and CRNN;
Define the predictor for each API. Basically, you define a class with a predict method that receives a payload (all the serving plumbing is already handled by the platform), runs the prediction, and returns the result. It's that simple!
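For orientation only: around the time this article was written, a cortex.yaml roughly took the shape sketched below. The file names and some field names here are assumptions (they vary between Cortex versions), so treat this as an illustration rather than the project's actual configuration.

```yaml
# Illustrative only -- exact fields depend on the Cortex version in use.
- name: yolov3                    # API that returns license-plate bounding boxes
  predictor:
    type: python
    path: predictor_yolov3.py     # hypothetical file name
  compute:
    gpu: 1

- name: crnn                      # API that runs CRAFT + CRNN on the cropped plates
  predictor:
    type: python
    path: predictor_crnn.py       # hypothetical file name
  compute:
    gpu: 1
```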
To give a sense of what a predictor looks like, consider the classic iris dataset example (a hedged sketch is below). You can find how the two APIs for this project are used via the project link; the rest of the project's resources are listed at the end of this article.
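As a rough sketch (not the author's actual code), an iris predictor might look like the following, assuming the class-based interface Cortex exposed at the time: a constructor that loads the model and a predict method that receives the request payload. The model path and feature names here are purely illustrative.

```python
import pickle

labels = ["setosa", "versicolor", "virginica"]

class PythonPredictor:
    def __init__(self, config):
        # Load a previously trained scikit-learn classifier; the path is illustrative.
        with open("/mnt/model/iris.pkl", "rb") as f:
            self.model = pickle.load(f)

    def predict(self, payload):
        # payload is the parsed JSON body of the request, e.g.
        # {"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}
        features = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]
        label_id = int(self.model.predict([features])[0])
        return labels[label_id]  # e.g. "setosa"
```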
The API then returns a prediction such as "setosa". Very simple!

Step 6: Develop the client

With Cortex handling deployment, I could start designing the client, which is the trickiest part. I settled on the following architecture (a minimal code sketch follows the list):
Collect frames from the Pi Camera at an acceptable resolution (800×450 or 480×270) and 30 FPS, and push each frame into a shared queue;
In a separate process, pull frames from the queue and distribute them to a pool of workers running on different threads;
Each worker thread (or what I call inference thread) will make API requests to my Cortex API. First, a request goes to my yolov3 API, and then if any license plates are detected, another request will be sent with a batch of cropped license plates to my crnn API. The predicted license plate numbers will be returned in text format;
Each detected license plate (with or without recognized text) will be pushed to another queue, which will ultimately broadcast to the browser page. Meanwhile, the predicted license plate numbers will be pushed to another queue to be saved to disk in CSV format later;
The broadcast queue receives frames in no particular order. Its consumer reorders them in a very small buffer (a few frames long) and broadcasts a new frame to the client each time. The consumer runs separately in another process, and it also tries to keep the queue at a fixed size so frames can be displayed at a consistent frame rate. Obviously, if the queue size drops, the frame rate drops proportionally, and vice versa;
Meanwhile, another thread will run in the main process, fetching predictions and GPS data from another queue. When the client receives a termination signal, predictions, GPS data, and timestamps will be stored in a CSV file.
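Below is a much-simplified sketch of this producer/worker layout, meant to illustrate the queueing scheme rather than reproduce the client's actual code. It assumes frames arrive as JPEG bytes and that the two Cortex endpoints live at hypothetical URLs with a hypothetical JSON payload shape.

```python
import base64
import threading
from queue import Queue

import requests

YOLO_URL = "https://example.com/yolov3"   # hypothetical endpoint URLs
CRNN_URL = "https://example.com/crnn"

frame_queue = Queue(maxsize=100)   # frames captured from the Pi Camera
output_queue = Queue()             # detections headed for the broadcaster / CSV writer

def capture_frames(camera):
    """Producer: push each captured JPEG frame into the shared queue."""
    for jpeg_bytes in camera:      # `camera` is any iterable of JPEG frames
        frame_queue.put(jpeg_bytes)

def inference_worker():
    """Worker (inference) thread: call the yolov3 API first, then the crnn API if plates were found."""
    while True:
        jpeg_bytes = frame_queue.get()
        img_b64 = base64.b64encode(jpeg_bytes).decode()
        boxes = requests.post(YOLO_URL, json={"img": img_b64}).json()
        if boxes:
            plates = requests.post(CRNN_URL, json={"img": img_b64, "boxes": boxes}).json()
            output_queue.put((jpeg_bytes, boxes, plates))
        else:
            output_queue.put((jpeg_bytes, [], []))

# Spin up a pool of inference threads feeding off the same queue.
for _ in range(4):
    threading.Thread(target=inference_worker, daemon=True).start()
```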
The following diagram shows the flowchart of the client and the cloud API provided by Cortex.
Figure 7: Flowchart of the client and the cloud APIs served by Cortex

In our case, the client is the Raspberry Pi, and the inference requests go to the cloud APIs served by Cortex on AWS.

The client source code is on GitHub: https://github.com/robertlucian/cortex-license-plate-reader-client

One challenge I had to overcome was 4G bandwidth. It is best to keep the bandwidth this application needs as low as possible, to minimize hang-ups and avoid burning through the data plan. I decided to run the Pi Camera at a very low resolution, 480×270 (a small resolution is fine here because the Pi Camera's field of view is very narrow, so license plates are still easy to recognize). However, even at this resolution, each JPEG frame is about 100 KB (0.8 Mbit). Multiplied by 30 frames per second, that is about 3,000 KB/s, or 24 Mbit/s, and that is before HTTP overhead, which is a lot.

So I used a few tricks (a small preprocessing sketch follows the list):
Reduce the width to 416 pixels, the size the YOLOv3 model expects, keeping the aspect ratio intact;
Convert the image to grayscale;
Remove the top 45% of the image. The idea here is that license plates will not appear at the top of the frame since cars don’t fly, right? As far as I know, removing 45% of the image does not affect the performance of the predictor;
Convert the image back to JPEG, but this time the quality is much lower.
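A minimal version of these tricks with OpenCV might look like this; the exact resize/crop order and JPEG quality value are assumptions, not the author's exact settings.

```python
import cv2

def compress_frame(frame_bgr, jpeg_quality=60):
    """Shrink a frame before sending it to the inference API."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)    # drop colour information
    h, w = gray.shape
    gray = gray[int(0.45 * h):, :]                        # discard the top 45% of the frame
    new_h = int(gray.shape[0] * 416 / w)
    gray = cv2.resize(gray, (416, new_h))                 # 416 px wide, as YOLOv3 expects
    ok, jpeg = cv2.imencode(".jpg", gray, [cv2.IMWRITE_JPEG_QUALITY, jpeg_quality])
    return jpeg.tobytes()                                 # typically just a few KB
```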
The final frame is about 7-10 KB, which is excellent. That works out to roughly 2.8 Mbit/s, or about 3.5 Mbit/s once all overheads such as the responses are included. For the crnn API, the cropped license plates do not need much space; even without compression they are only about 2-3 KB each.

In total, running at 30 FPS requires about 6 Mbit/s of bandwidth for the inference APIs, which is a number I can live with.

Results

Success! The example above is real-time inference running through Cortex. I need about 20 GPU-equipped instances for it to run smoothly; depending on the latency of that group of GPUs, you may need more or fewer instances. The average latency from capturing a frame to broadcasting it to the browser window is about 0.9 seconds, which is amazing considering the inference happens far away. I am still amazed by it.

The text recognition part may not be the best, but it at least proves the point: it could be made more accurate by increasing the video resolution, narrowing the camera's field of view, or fine-tuning the models.

As for the high GPU demand, that can be addressed through optimization, for example by running the models in mixed precision or full half precision (FP16/BF16). Generally speaking, mixed precision has little impact on accuracy, so there is not much of a trade-off. All in all, with every optimization in place, reducing the number of GPUs from about 20 down to one is actually feasible; properly optimized, even a single GPU might not be fully utilized.

Original address: https://towardsdatascience.com/i-built-a-diy-license-plate-reader-with-a-raspberry-pi-and-machine-learning-7e428d3c7401