- Detect the license plate.
- Recognize the text within each license plate bounding box.
- A machine learning model that detects license plates, taking raw (unlabeled) images as input;
- Some hardware. Simply put, I need a computer system connected to one or more cameras that can call my model.
- YOLOv3 – This is one of the fastest models currently available, with an mAP comparable to other SOTA models. We use this model to detect objects;
- CRAFT text detector – We use it to detect text in images;
- CRNN – Simply put, this is a convolutional recurrent neural network. It processes sequential data so that detected characters are arranged in the correct order;
- First, the YOLOv3 model receives frames from the camera and finds the bounding boxes of the license plates in each frame. Very tight predicted bounding boxes are not recommended – boxes slightly larger than the detected object work better. If they are too tight, the performance of the subsequent steps may suffer;
- The text detector receives the cropped license plates from YOLOv3. If the bounding boxes are too small, part of the plate text is likely to be cropped out, leading to poor predictions. With slightly enlarged boxes, the CRAFT model can detect the position of each letter very precisely;
- Finally, we pass the bounding box of each word from CRAFT to the CRNN model to predict the actual text.
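The three-stage cascade above can be sketched as follows. `detect_plates`, `detect_words`, and `recognize_word` are hypothetical stand-ins for the YOLOv3, CRAFT, and CRNN models, not their real APIs; the 15% margin is an illustrative choice:

```python
def expand_box(box, margin=0.15):
    """Grow an (x, y, w, h) box by a relative margin; overly tight
    YOLO boxes can clip plate characters and hurt the text detector."""
    x, y, w, h = box
    return (x - w * margin, y - h * margin,
            w * (1 + 2 * margin), h * (1 + 2 * margin))

def read_plates(frame, detect_plates, detect_words, recognize_word):
    """YOLOv3 -> CRAFT -> CRNN cascade on a single frame.
    The three callables are placeholders for the real models."""
    results = []
    for plate_box in detect_plates(frame):            # YOLOv3: plate boxes
        roi = expand_box(plate_box)                   # slightly enlarged crop
        words = [recognize_word(frame, roi, wb)       # CRNN: text per word
                 for wb in detect_words(frame, roi)]  # CRAFT: word boxes
        results.append(" ".join(words))
    return results
```

Plugging in trivial stubs for the three callables is enough to see the data flow from plate box to recognized string.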
- On the side facing the rearview mirror, the Raspberry Pi, GPS module, and 4G module are mounted. See my article on the EC25-E module for details about the GPS and 4G antennas I used;
- On the other side, an arm with a ball joint supports the Pi Camera.
- Keras implementation: https://github.com/experiencor/keras-yolo3
- Submitted pull request: https://github.com/experiencor/keras-yolo3/pull/244
- Perform all inference locally;
- Perform inference in the cloud.
- Defining a cortex.yaml file, the configuration file for our APIs. Each API handles one type of task. I assigned the yolov3 API the task of detecting license plate bounding boxes in a given frame, while the crnn API predicts the plate number with the help of the CRAFT text detector and CRNN;
- Defining the predictor for each API. Basically, all you need to do is define a predict method on a specific class in cortex that receives a payload (all the server plumbing is already handled by the platform), uses it to predict results, and returns the predictions. It’s that simple!
```python
# predictor.py
import boto3
import pickle

labels = ["setosa", "versicolor", "virginica"]

class PythonPredictor:
    def __init__(self, config):
        s3 = boto3.client("s3")
        s3.download_file(config["bucket"], config["key"], "model.pkl")
        self.model = pickle.load(open("model.pkl", "rb"))

    def predict(self, payload):
        measurements = [
            payload["sepal_length"],
            payload["sepal_width"],
            payload["petal_length"],
            payload["petal_width"],
        ]
        label_id = self.model.predict([measurements])[0]
        return labels[label_id]
```
```shell
curl http://***.amazonaws.com/iris-classifier \
    -X POST -H "Content-Type: application/json" \
    -d '{"sepal_length": 5.2, "sepal_width": 3.6, "petal_length": 1.4, "petal_width": 0.3}'
```
- Collect frames from the Pi Camera at an acceptable resolution (800×450 or 480×270) at 30 FPS, and push each frame into a shared queue;
- In a separate process, take frames from the queue and distribute them to multiple workers on different threads;
- Each worker thread (what I call an inference thread) makes API requests to my cortex APIs: first a request to the yolov3 API, then, if any license plates are detected, another request to the crnn API with a batch of cropped plates. The predicted plate numbers are returned as text;
- Push each detected license plate (with or without recognized text) into another queue, which ultimately broadcasts it to the browser page. At the same time, the predicted plate numbers are pushed into yet another queue, to be saved to disk in CSV format later;
- The broadcast queue receives an unordered set of frames. The consumer’s job is to first place them in a very small buffer (a few frames deep) and, each time, emit the next frame to the client in order. This consumer runs in a separate process and must also try to keep the queue at a fixed size so that frames are displayed at a consistent frame rate. Obviously, if the queue size decreases, the frame rate drops proportionally, and vice versa;
- Meanwhile, another thread in the main process fetches predictions and GPS data from yet another queue. When the client receives a termination signal, predictions, GPS data, and timestamps are stored in a CSV file.
- Reduce the width to 416 pixels, the input size required by the YOLOv3 model, keeping the aspect ratio intact;
- Convert the image to grayscale;
- Remove the top 45% of the image. The idea is that license plates will never appear at the top of the frame – cars do not fly, right? As far as I can tell, removing 45% of the image does not hurt the predictor’s performance;
- Re-encode the image as JPEG, but this time at a much lower quality.
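The four steps above can be sketched with Pillow, assuming the frames arrive as `PIL.Image` objects; the `quality=30` value is my own illustrative choice for "much lower quality":

```python
from io import BytesIO
from PIL import Image

def preprocess(frame: Image.Image, quality: int = 30) -> bytes:
    """Shrink a camera frame before shipping it to the inference API."""
    # 1. Resize to a width of 416 px (YOLOv3 input), keeping the aspect ratio.
    w, h = frame.size
    frame = frame.resize((416, round(h * 416 / w)))
    # 2. Convert to grayscale.
    frame = frame.convert("L")
    # 3. Drop the top 45% of the frame, where plates never appear.
    frame = frame.crop((0, int(frame.height * 0.45), frame.width, frame.height))
    # 4. Re-encode as a low-quality JPEG to cut the payload size further.
    buf = BytesIO()
    frame.save(buf, format="JPEG", quality=quality)
    return buf.getvalue()
```

For an 800×450 input this yields a 416×129 grayscale JPEG, a payload small enough to keep per-request latency low over a 4G link.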