Google’s MediaPipe is a cross-platform, customizable framework for running machine learning solutions on real-time video streams.
There are many solutions that can be implemented, and you can choose according to your needs.
These include face detection, face mesh, iris detection, hand and gesture detection, pose detection, holistic full-body detection (including facial information), hair segmentation, object detection, box tracking, instant motion tracking, KNIFT (template-based feature matching), Objectron 3D object bounding-box detection, and more. It supports implementations on different platforms and in different languages, as shown in the figure below:
Here we can see that the solutions available in Python include face detection, face mesh, hand and gesture recognition (Hands), pose detection, holistic full-body detection, selfie segmentation, and Objectron real-time 3D object detection.
Here, we will take Hands gesture recognition as an interesting application.
About Gestures
The ability to perceive the shape and movement of the hand can be an important component in improving the user experience across various technological fields and platforms. For example, specific operations can be triggered by hand movement, by the number of extended fingers, or by how the fingers change over time. On a Raspberry Pi, those operations can in turn drive the GPIO pins, enabling gesture-based interaction, which is a very cool application.
MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame.
MediaPipe Hands uses a pipeline of ML models that work together: a palm detection model that operates on the full image and returns an oriented hand bounding box, and a hand landmark model that operates on the cropped region defined by the palm detector and returns high-fidelity 3D hand keypoints. This lets us draw the keypoints with OpenCV’s circle method and determine the position and posture of each landmark in the image, and we can then bind operations to that information. For example, when I hold two fingers up in front of the camera, every frame of the video stream is processed by MediaPipe’s Hands solution, which returns the 21 hand landmarks; by watching how those landmarks change, we can control GPIO operations. Let’s look at the hand landmark model illustration below:
The 21 points represent the key points of the hand. For example, to find the X and Y coordinates of the index fingertip on the screen, I need the position of landmark 8. Using OpenCV’s circle method, I can then draw a circle at the fingertip’s location, and that location can be used to trigger events. For instance, when the fingertip slides into the coordinate range (10, 20) to (20, 40), we can overlay an image or some text on the screen. This can make for a fun application; everyone can brainstorm ideas, say for a haunted-house escape game. Haha!
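To make the region-trigger idea concrete, here is a minimal sketch (not from the original article); fingertip_in_region and TRIGGER_REGION are hypothetical names, and the rectangle values simply reuse the example range from the paragraph above:

import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
TRIGGER_REGION = (10, 20, 20, 40)  # hypothetical x1, y1, x2, y2, taken from the example range above

def fingertip_in_region(hand_landmarks, image_width, image_height, region=TRIGGER_REGION):
    # Landmark coordinates are normalized to [0, 1]; scale them to pixel coordinates.
    tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    fx, fy = int(tip.x * image_width), int(tip.y * image_height)
    x1, y1, x2, y2 = region
    return x1 <= fx <= x2 and y1 <= fy <= y2

# Example use inside a detection loop (frame and results come from the code later in this article):
# for hand_landmarks in results.multi_hand_landmarks:
#     if fingertip_in_region(hand_landmarks, frame.shape[1], frame.shape[0]):
#         cv2.putText(frame, "Boo!", (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)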
Doesn’t the rendered hand image seem clear and straightforward?
How to Install on Raspberry Pi?
1. Download the system and connect the camera
It is recommended to use the Etcher tool and the official image file to flash the SD card.
Flashing tool: https://etcher.io/
Official image: https://www.raspberrypi.com/software/operating-systems/#raspberry-pi-os-legacy
It is recommended to use the 32-bit version, as MMAL (legacy camera stack) support on the 64-bit version is poor, which makes the Raspberry Pi camera difficult to use.
Installing the camera is very simple; refer to the image below. Pull up the locking clip on both sides of the connector, insert the FPC ribbon cable, then press the clip back down, making sure the cable is level and facing the correct direction.
2. Install the virtual environment
Execute the command:
sudo apt update
sudo apt -y install vim virtualenv
3. Configure the virtual environment and install the OpenCV library
Execute:
virtualenv -p python3 venv
Enter the virtual environment and activate:
cd venv
source bin/activate
python3 -m pip install opencv-python
pip3 install opencv-contrib-python
Wait patiently for the installation to complete. If it fails (often due to network issues), re-run the command until it succeeds before continuing.
When you see "Successfully installed", you are good to go.
4. Install the mediapipe library
Execute:
pip3 install mediapipe-rpi4
Wait for the installation to complete, and the basic environment will be set up.
Similarly, wait until you see "Successfully installed".
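Optionally (this check is not part of the original steps), you can verify that both libraries import correctly inside the virtual environment:
python3 -c "import cv2, mediapipe; print(cv2.__version__)"
If the command prints an OpenCV version number without errors, the environment is ready.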
5. Enable the Raspberry Pi camera
Execute the command:
sudo raspi-config
Then select 3 Interface Options,
then select Legacy Camera and enable legacy camera support.
Then check if the camera can be detected:
vcgencmd get_camera
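If the camera is wired and enabled correctly, the command typically reports something like the following (exact values depend on your setup):
supported=1 detected=1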
6. Write code for testing
Next, you can try writing code for testing. You can refer to the official Python code examples for learning:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5) as hands:
  for idx, file in enumerate(IMAGE_FILES):
    # Read an image, flip it around y-axis for correct handedness output (see
    # above).
    image = cv2.flip(cv2.imread(file), 1)
    # Convert the BGR image to RGB before processing.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Print handedness and draw hand landmarks on the image.
    print('Handedness:', results.multi_handedness)
    if not results.multi_hand_landmarks:
      continue
    image_height, image_width, _ = image.shape
    annotated_image = image.copy()
    for hand_landmarks in results.multi_hand_landmarks:
      print('hand_landmarks:', hand_landmarks)
      print(
          f'Index finger tip coordinates: (',
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
      )
      mp_drawing.draw_landmarks(
          annotated_image,
          hand_landmarks,
          mp_hands.HAND_CONNECTIONS,
          mp_drawing_styles.get_default_hand_landmarks_style(),
          mp_drawing_styles.get_default_hand_connections_style())
    cv2.imwrite(
        '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
    # Draw hand world landmarks.
    if not results.multi_hand_world_landmarks:
      continue
    for hand_world_landmarks in results.multi_hand_world_landmarks:
      mp_drawing.plot_landmarks(
          hand_world_landmarks, mp_hands.HAND_CONNECTIONS, azimuth=5)
Let’s do a simple analysis:
import cv2
import mediapipe as mp
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands
These lines import the OpenCV library and the MediaPipe library, and bring in the drawing helpers drawing_utils and drawing_styles as well as the hands solution module, which is assigned to mp_hands.
The initialization part is crucial: it instantiates a Hands object and tunes the supported configuration options. For example, static_image_mode=True enables static-image mode, max_num_hands=2 allows at most two hands to be detected, and min_detection_confidence=0.5 sets the minimum confidence, meaning a detection is accepted once its confidence reaches 0.5. This may allow some misjudgments but keeps detection slightly faster; if you want higher accuracy, raise this value.
with mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5) as hands:
  for idx, file in enumerate(IMAGE_FILES):
    # Read an image, flip it around y-axis for correct handedness output (see
    # above).
    image = cv2.flip(cv2.imread(file), 1)
    # Convert the BGR image to RGB before processing.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
This part flips the entire image along the Y-axis so that, on screen, your left and right hands appear mirrored, as if you were looking into a mirror; otherwise, when you raise your left hand it would appear on the right side.
Another crucial step is that the hands.process() function requires an RGB image as input, while images captured with OpenCV are in BGR format by default; they must be converted first, otherwise detection will not work properly.
The last part of the content analysis:
Three variables are defined for the image height, width, and channel count; the channel count is not used, so it is assigned to "_". These values come from the image object's shape property, and a copy is made with copy() as the image we will annotate. The crucial step is to iterate over multi_hand_landmarks in the results object and obtain the on-screen position of each of the 21 landmarks: each landmark's X and Y coordinates are normalized, so multiplying them by the image width and height gives the actual pixel position in the image.
image_height, image_width, _ = image.shape
annotated_image = image.copy()
for hand_landmarks in results.multi_hand_landmarks:
  print('hand_landmarks:', hand_landmarks)
  print(
      f'Index finger tip coordinates: (',
      f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
      f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
  )
This part calls the draw_landmarks() method to draw the hand's 21 points on the image and connects them with line segments to form a schematic hand skeleton.
mp_drawing.draw_landmarks(
    annotated_image,
    hand_landmarks,
    mp_hands.HAND_CONNECTIONS,
    mp_drawing_styles.get_default_hand_landmarks_style(),
    mp_drawing_styles.get_default_hand_connections_style())
The following part writes the result to an image file.
cv2.imwrite(
    '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
# Draw hand world landmarks.
if not results.multi_hand_world_landmarks:
  continue
for hand_world_landmarks in results.multi_hand_world_landmarks:
  mp_drawing.plot_landmarks(
      hand_world_landmarks, mp_hands.HAND_CONNECTIONS, azimuth=5)
The output path combines a temporary file path with the file index, and the image passed in is flipped (mirrored along the Y-axis) once more.
Finally, we check whether world landmarks were detected; if so, the hand's landmark points and their connections are plotted, otherwise we move on to the next image.
Next, we will operate on the video stream.
# For webcam input:
Initialize a capture object cap with camera index 0, which will be used to read each frame of the video.
cap = cv2.VideoCapture(0)
Also instantiate a Hands class, generating an object called hands.
with mp_hands.Hands(
    model_complexity=0,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as hands:
Continuously check if the camera is on; if it is, use cap.read() to read a frame of the image.
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
Mark the image as read-only:
    image.flags.writeable = False
Convert the image format from BGR to RGB:
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
Pass the image to the process method for processing:
    results = hands.process(image)
    # Draw the hand annotations on the image.
Make the image writable again and convert it back to BGR for display:
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
      for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())
    # Flip the image horizontally for a selfie-view display.
Flip the image and display it.
    cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))
Check every 5 milliseconds whether the ESC key has been pressed; if so, exit the loop, release the camera, and close the window.
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()
cv2.destroyAllWindows()
Actual testing effect:
Next, we detect all the hand landmarks and draw a dot at the one we care about.
We only need the index finger, so we only draw when the landmark id equals 8: we take the x and y coordinates of landmark 8, multiply them by the width and height of the image to get the actual pixel position, and then draw a filled circle there.
The code content is as follows:
import cv2
import time
import mediapipe as mp

cap = cv2.VideoCapture(0)
mp_draw = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
pTime = 0
detector = mp_hands.Hands(min_detection_confidence=0.5)
while True:
    ret, frame = cap.read()
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    frame_RGB = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = detector.process(frame_RGB)
    # print(result.multi_hand_landmarks)
    if result.multi_hand_landmarks is not None:
        print(result.multi_hand_landmarks[0].landmark)
        for id, landmark in enumerate(result.multi_hand_landmarks[0].landmark):
            print(id, landmark)
            height, width, channel = frame.shape
            if id == 8:
                fx = int(landmark.x * width)
                fy = int(landmark.y * height)
                cv2.circle(frame, (fx, fy), 5, (0, 0, 255), cv2.FILLED)
    cv2.putText(frame, "FPS:{}".format(int(fps)), (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
    cv2.imshow("image", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press ESC key
        break
cap.release()
cv2.destroyAllWindows()
Finally, we just need the RPi.GPIO library to light an LED when the index fingertip is detected. Connect an LED in series with a 1 kΩ resistor to physical pin 12 (the code below uses BOARD pin numbering) and connect the LED's negative terminal to a GND pin.
If not installed, please execute in the terminal:
pip3 install RPi.GPIO
Add to the code:
import RPi.GPIO as GPIO
import cv2
import time
import mediapipe as mp

GPIO.setmode(GPIO.BOARD)  # use physical pin numbering
GPIO.setup(12, GPIO.OUT)
cap = cv2.VideoCapture(0)
mp_draw = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
pTime = 0
detector = mp_hands.Hands(min_detection_confidence=0.5)
while True:
    ret, frame = cap.read()
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    frame_RGB = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = detector.process(frame_RGB)
    # print(result.multi_hand_landmarks)
    if result.multi_hand_landmarks is not None:
        print(result.multi_hand_landmarks[0].landmark)
        for id, landmark in enumerate(result.multi_hand_landmarks[0].landmark):
            print(id, landmark)
            height, width, channel = frame.shape
            if id == 8:
                fx = int(landmark.x * width)
                fy = int(landmark.y * height)
                cv2.circle(frame, (fx, fy), 5, (0, 0, 255), cv2.FILLED)
                GPIO.output(12, GPIO.HIGH)  # index fingertip detected: LED on
    else:
        GPIO.output(12, GPIO.LOW)  # no hand detected: LED off
    cv2.putText(frame, "FPS:{}".format(int(fps)), (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
    cv2.imshow("image", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press ESC key
        break
cap.release()
cv2.destroyAllWindows()
GPIO.cleanup()  # release the GPIO pin on exit
Then save and execute this program; my program name is test_camera.py.
python test_camera.py
Then wave your finger in front of the camera; as long as the tip of your index finger is detected, the LED will light up. This is the simplest way to control an LED with your finger. Please brainstorm and try other operations: for example, by measuring the distance between two fingertips you could implement image zooming or control the audio volume.
By detecting the number of fingers, you can perform voice broadcasts, or play animations based on the finger position. The rest is up to you to create!
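As a starting point for the fingertip-distance idea mentioned above, here is a minimal sketch (not from the original article); fingertip_distance is a hypothetical helper, and landmarks 4 and 8 are the thumb tip and index fingertip:

import math

def fingertip_distance(hand_landmarks, width, height):
    # Landmarks 4 (thumb tip) and 8 (index fingertip), scaled from normalized to pixel coordinates.
    thumb = hand_landmarks.landmark[4]
    index = hand_landmarks.landmark[8]
    dx = (thumb.x - index.x) * width
    dy = (thumb.y - index.y) * height
    return math.hypot(dx, dy)

# Example use inside the detection loop above:
# dist = fingertip_distance(result.multi_hand_landmarks[0], width, height)
# Map dist to whatever you want to control, e.g. a zoom factor or a volume level.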
Highly Recommended:
“Raspberry Pi 4 and Practical AI Projects”
Deep dive into the world of Raspberry Pi and AI
Tsinghua University Press
ISBN: 9787302603252
Publication Date: 2022-07-01
Price: 79 RMB
Content Summary
This book mainly introduces the characteristics of different types of Raspberry Pi and the basic knowledge required for getting started; it covers various operation methods of Raspberry Pi GPIO, as well as hardware-related content such as Raspberry Pi I2C bus, SPI bus, UART serial port, PWM pulse width modulation, etc.; at the same time, it also prepares some common service types for readers to build and configure on Raspberry Pi, including methods for building Raspberry Pi streaming servers, installation and configuration of common databases MariaDB, PostgreSQL, MQTT server setup and configuration, DHCP server setup and configuration, etc.
In addition, it includes some interesting experiments, such as using TensorFlow for object detection, creating a Raspberry Pi scanner with OpenCV, or conducting a nose-swapping experiment with OpenCV, leading readers to understand some applications that can be implemented with Raspberry Pi using cameras.
This book provides a good entry point for beginners to comprehensively understand Raspberry Pi, allowing readers to learn more about the usage and operational tips of Raspberry Pi. At the same time, common programming languages such as C, Python, and Shell script are used throughout the programming process, making it more user-friendly for users with experience in these languages. I hope readers can find experiments they like here and smoothly get started with Raspberry Pi!
Author Introduction