Gesture Detection on Raspberry Pi 4B Using Google Mediapipe

Google’s MediaPipe project is a cross-platform, customizable machine-learning framework for processing real-time video streams.

It offers many ready-made solutions, and you can choose among them according to your needs.

These include face detection, face mesh, iris detection, hand/gesture detection, pose detection, holistic full-body detection (including facial information), hair segmentation, object detection, box tracking, instant motion tracking, KNIFT (template-based feature matching), 3D object bounding-box detection, and more. It supports implementations on different platforms and in different languages, as shown in the figure below:

[Figure: MediaPipe solutions and the platforms/languages they support]

Here we can see that the solutions available for Python include: Face Detection, Face Mesh, Hands (gesture recognition), Pose, Holistic (full-body detection), Selfie Segmentation, and Objectron (real-time 3D object detection).

Here we will focus on the Hands solution, an interesting application for gesture recognition.

About Gestures

The ability to perceive the shape and movement of the hand can be an important component in enhancing the user experience across many technological fields and platforms. For example, specific operations can be bound to hand movements, the number of fingers extended, or how the fingers change over time. On the Raspberry Pi, GPIO pins can then be driven in response to these operations, enabling gesture interaction, which is a very cool application.

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame.

MediaPipe Hands uses a pipeline of ML models that work together: a palm detection model operates on the full image and returns an oriented hand bounding box, and a hand landmark model operates on the cropped region defined by the palm detector and returns high-fidelity 3D hand keypoints. We can then draw these keypoints with OpenCV’s circle method and determine the position and posture of each landmark in the image, and bind operations to that information. For example, when I extend two fingers in front of the camera, every frame of the video stream is run through MediaPipe’s Hands solution, which returns the 21 landmark points of the hand; by watching how these landmarks change, we can control GPIO operations. The illustration of the hand landmark model is shown below:

[Figure: the 21 hand landmark points]

The 21 points cover every key point of the hand. For example, if I want the X and Y coordinates of the index fingertip on the screen, I need to read the position of landmark 8 whenever it appears, and then I can use OpenCV’s circle method to draw a circle marking where the finger is; that position can in turn trigger events. For instance, when the fingertip slides into the coordinate range (10, 20) to (20, 40), we can trigger an operation that overlays an image or text on the screen. This can make for a fun application; everyone can brainstorm something that could be used in a haunted-house escape room. Haha! (A minimal sketch of this idea follows after the figure below.)

[Figure: hand landmarks rendered on an image]

Doesn’t the rendered hand image seem clear and straightforward?
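To make that idea concrete, here is a minimal, hypothetical sketch (not from the original article): a helper that converts the normalized coordinates of landmark 8 into pixels and checks whether the fingertip lies inside a rectangular region. The name fingertip_in_region and the default region values are illustrative placeholders.

import mediapipe as mp

mp_hands = mp.solutions.hands

def fingertip_in_region(hand_landmarks, frame_width, frame_height,
                        region=(10, 20, 20, 40)):
    """Return True if the index fingertip lies inside region.

    region is (x_min, y_min, x_max, y_max) in pixels; landmark coordinates are
    normalized to [0, 1], so they are scaled by the frame size first.
    """
    tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]  # landmark 8
    fx = int(tip.x * frame_width)   # fingertip X in pixels
    fy = int(tip.y * frame_height)  # fingertip Y in pixels
    x_min, y_min, x_max, y_max = region
    return x_min <= fx <= x_max and y_min <= fy <= y_max

Inside a detection loop you would call this for each entry of results.multi_hand_landmarks and, whenever it returns True, overlay your image or text with OpenCV.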

How to Install on Raspberry Pi?

1. Download the system and connect the camera

It is recommended to use the Etcher tool and the official image file to flash the system.

Flashing software: https://etcher.io/

Official image: https://www.raspberrypi.com/software/operating-systems/#raspberry-pi-os-legacy

It is recommended to use the 32-bit image: the 64-bit version has poor MMAL support, which makes the legacy camera interface used in this article difficult to use.

Installing the camera is very simple; refer to the image below. Gently pull up the locking tabs on both sides of the connector, insert the FPC ribbon cable, then press the lock back down, making sure the cable is seated level and facing the correct direction.

[Figure: installing the camera ribbon cable]

2. Install the virtual environment

Execute the command:

sudo apt update

sudo apt -y install vim virtualenv

3. Configure the virtual environment and install the OpenCV library

Execute:

virtualenv -p python3 venv

Enter the virtual environment and activate:

cd venv

source bin/activate

python3 -m pip install opencv-python

pip3 install opencv-contrib-python

Wait patiently for the installation to complete. If it fails, run the command again a few times until it succeeds before continuing.

Seeing "Successfully installed" means it worked.

4. Install the mediapipe library

Execute:

pip3 install mediapipe-rpi4

Wait for the installation to complete, and the basic environment will be set up.

Similarly, wait until you see "Successfully installed".

5. Enable the Raspberry Pi camera

Execute the command:

sudo raspi-config

Then select "3 Interface Options".

Then select "Legacy Camera" and enable it.

Then check if the camera can be detected:

vcgencmd get_camera
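If the camera is connected and enabled correctly, this command typically reports something like the line below (the exact text can vary with firmware); detected=0 usually means the ribbon cable or the interface setting needs another look:

supported=1 detected=1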


6. Write code for testing

Next, you can try writing code for testing. You can refer to the official Python code examples for learning:

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

# For static images:
IMAGE_FILES = []
with mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5) as hands:
  for idx, file in enumerate(IMAGE_FILES):
    # Read an image, flip it around y-axis for correct handedness output (see
    # above).
    image = cv2.flip(cv2.imread(file), 1)
    # Convert the BGR image to RGB before processing.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # Print handedness and draw hand landmarks on the image.
    print('Handedness:', results.multi_handedness)
    if not results.multi_hand_landmarks:
      continue
    image_height, image_width, _ = image.shape
    annotated_image = image.copy()
    for hand_landmarks in results.multi_hand_landmarks:
      print('hand_landmarks:', hand_landmarks)
      print(
          f'Index finger tip coordinates: (',
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
          f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
      )
      mp_drawing.draw_landmarks(
          annotated_image,
          hand_landmarks,
          mp_hands.HAND_CONNECTIONS,
          mp_drawing_styles.get_default_hand_landmarks_style(),
          mp_drawing_styles.get_default_hand_connections_style())
    cv2.imwrite(
        '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
    # Draw hand world landmarks.
    if not results.multi_hand_world_landmarks:
      continue
    for hand_world_landmarks in results.multi_hand_world_landmarks:
      mp_drawing.plot_landmarks(
        hand_world_landmarks, mp_hands.HAND_CONNECTIONS, azimuth=5)

Let’s do a simple analysis:

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_hands = mp.solutions.hands

These lines import the OpenCV and MediaPipe libraries and create shortcuts to the drawing utilities drawing_utils and drawing_styles, as well as to the hands solution module, referenced here as mp_hands.

The initialization part is crucial: it instantiates the Hands object and tunes the supported configuration options. static_image_mode=True enables static-image mode, max_num_hands=2 detects at most two hands, and min_detection_confidence=0.5 means a detection is accepted once its confidence score reaches 0.5. Although this may allow occasional misdetections, detection responds more readily; if you want higher accuracy, raise this value.

with mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=2,
    min_detection_confidence=0.5) as hands:
  for idx, file in enumerate(IMAGE_FILES):
    # Read an image, flip it around y-axis for correct handedness output (see
    # above).
    image = cv2.flip(cv2.imread(file), 1)
    # Convert the BGR image to RGB before processing.
    results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

This part flips the entire image along the Y-axis so that the view is mirrored, as if you were looking into a mirror; otherwise, when you extend your left hand it would appear on the right side of the frame (and the handedness output would be reversed).

Another crucial point is that the hands.process() function expects its argument to be an RGB image, but images read with OpenCV are in BGR format by default, so they must be converted first; otherwise detection will not work correctly.

The last part of the content analysis:

Three variables are defined for the image height, width, and channel count; the channel count is not used, so it is assigned to "_". Their values come from the shape property of the image object, and a copy of the image is made with copy() as the object to draw on. A crucial step here is iterating over multi_hand_landmarks in the results object to get the positions of the 21 landmarks. Each landmark's X and Y coordinates are normalized, so multiplying them by the image width and height gives the actual pixel position in the image.

image_height, image_width, _ = image.shape
annotated_image = image.copy()
for hand_landmarks in results.multi_hand_landmarks:
  print('hand_landmarks:', hand_landmarks)
  print(
      f'Index finger tip coordinates: (',
      f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].x * image_width}, '
      f'{hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP].y * image_height})'
  )

This part calls the draw_landmarks() method to draw the 21 hand points on the image and connect them with line segments, forming a schematic of the hand skeleton.

mp_drawing.draw_landmarks(
    annotated_image,
    hand_landmarks,
    mp_hands.HAND_CONNECTIONS,
    mp_drawing_styles.get_default_hand_landmarks_style(),
    mp_drawing_styles.get_default_hand_connections_style())

The following part writes the result out to an image file.

cv2.imwrite(
    '/tmp/annotated_image' + str(idx) + '.png', cv2.flip(annotated_image, 1))
# Draw hand world landmarks.
if not results.multi_hand_world_landmarks:
  continue
for hand_world_landmarks in results.multi_hand_world_landmarks:
  mp_drawing.plot_landmarks(
    hand_world_landmarks, mp_hands.HAND_CONNECTIONS, azimuth=5)

The output path is built from a temporary directory and the file index, and the image passed in is flipped back along the Y-axis.

Finally, we check whether world landmarks were detected; if so, the hand's landmark points and their connections are plotted, otherwise we continue with the next image.

Next, we will operate on the video stream.

Initialize a capture object cap with camera index 0 so that each frame of the video can be read:

# For webcam input:
cap = cv2.VideoCapture(0)

Also instantiate the Hands class, producing an object called hands:

with mp_hands.Hands(
    model_complexity=0,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as hands:

Loop while the camera is open; on each iteration, use cap.read() to grab a frame:

  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.

Mark the image as read-only, convert it from BGR to RGB, and pass it to the process() method:

    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = hands.process(image)

    # Draw the hand annotations on the image.

Make the image writable again, convert it back to BGR, and draw the detected landmarks:

    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    if results.multi_hand_landmarks:
      for hand_landmarks in results.multi_hand_landmarks:
        mp_drawing.draw_landmarks(
            image,
            hand_landmarks,
            mp_hands.HAND_CONNECTIONS,
            mp_drawing_styles.get_default_hand_landmarks_style(),
            mp_drawing_styles.get_default_hand_connections_style())

Flip the image horizontally for a selfie-view display and show it:

    # Flip the image horizontally for a selfie-view display.
    cv2.imshow('MediaPipe Hands', cv2.flip(image, 1))

Check every 5 milliseconds whether the ESC key was pressed; if so, exit the loop, release the camera, and close the window:

    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()
cv2.destroyAllWindows()

Actual testing effect:

[Figure: hand detection running on the Raspberry Pi]

Then we detect all the landmarks of the hand and draw them as circular points:

[Figure: all 21 landmarks drawn as circles]

We only need the index finger, so we draw only when id equals 8. We take the x and y coordinates of landmark 8, multiply them by the width and height of the image to obtain the actual pixel position, and then draw a circular point there.

[Figure: only the index fingertip (landmark 8) highlighted]

The code content is as follows:

import cv2
import time
import mediapipe as mp

cap = cv2.VideoCapture(0)
mp_draw = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
pTime = 0
detector = mp_hands.Hands(min_detection_confidence=0.5)

while True:
    ret, frame = cap.read()
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    frame_RGB = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = detector.process(frame_RGB)
    # print(result.multi_hand_landmarks)
    if result.multi_hand_landmarks is not None:
        print(result.multi_hand_landmarks[0].landmark)
        for id, landmark in enumerate(result.multi_hand_landmarks[0].landmark):
            print(id, landmark)
            height, width, channel = frame.shape
            if id == 8:
                fx = int(landmark.x * width)
                fy = int(landmark.y * height)
                cv2.circle(frame, (fx, fy), 5, (0, 0, 255), cv2.FILLED)
    cv2.putText(frame, "FPS:{}".format(int(fps)), (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
    cv2.imshow("image", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press ESC key
        break

cap.release()
cv2.destroyAllWindows()

Finally, we just need the RPi.GPIO library to light an LED whenever the index fingertip is detected by the camera. Connect an LED in series with a 1 kΩ resistor to physical pin 12, and connect its negative leg to a GND pin.

[Figure: LED wiring to pin 12 and GND]

If not installed, please execute in the terminal:

pip3 install RPi.GPIO

Add to the code:

import RPi.GPIO as GPIO
import cv2
import time
import mediapipe as mp

GPIO.setmode(GPIO.BOARD)
GPIO.setup(12, GPIO.OUT)

cap = cv2.VideoCapture(0)
mp_draw = mp.solutions.drawing_utils
mp_hands = mp.solutions.hands
pTime = 0
detector = mp_hands.Hands(min_detection_confidence=0.5)

while True:
    ret, frame = cap.read()
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    frame_RGB = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = detector.process(frame_RGB)
    # print(result.multi_hand_landmarks)
    if result.multi_hand_landmarks is not None:
        print(result.multi_hand_landmarks[0].landmark)
        for id, landmark in enumerate(result.multi_hand_landmarks[0].landmark):
            print(id, landmark)
            height, width, channel = frame.shape
            if id == 8:
                fx = int(landmark.x * width)
                fy = int(landmark.y * height)
                cv2.circle(frame, (fx, fy), 5, (0, 0, 255), cv2.FILLED)
                # Index fingertip detected: light the LED.
                GPIO.output(12, GPIO.HIGH)
    else:
        # No hand detected: turn the LED off.
        GPIO.output(12, GPIO.LOW)
    cv2.putText(frame, "FPS:{}".format(int(fps)), (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 0, 0), 3)
    cv2.imshow("image", frame)
    if cv2.waitKey(1) & 0xFF == 27:  # press ESC key
        break

cap.release()
cv2.destroyAllWindows()

Then save and execute this program; my program name is test_camera.py.

python test_camera.py

Then wave your finger in front of the camera; as long as the tip of your index finger is detected, the LED will light up. This is the simplest way to control an LED with a finger; feel free to brainstorm and try other operations. For example, by measuring the distance between two fingertips you could implement image zooming or control the audio volume.

By counting the number of extended fingers you could trigger voice announcements, or play animations based on the finger position. The rest is up to you to create!
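As a starting point, here is a minimal, hypothetical sketch of the pinch-distance idea (not part of the original code). The helper name pinch_distance is an assumption for illustration; landmarks 4 and 8 are the thumb tip and index fingertip in the hand landmark model.

import math

def pinch_distance(result, frame_width, frame_height):
    """Return the thumb-tip to index-tip distance in pixels, or None if no hand."""
    if not result.multi_hand_landmarks:
        return None
    landmarks = result.multi_hand_landmarks[0].landmark
    thumb, index_tip = landmarks[4], landmarks[8]   # thumb tip, index fingertip
    dx = (index_tip.x - thumb.x) * frame_width      # normalized -> pixel offset
    dy = (index_tip.y - thumb.y) * frame_height
    return math.hypot(dx, dy)

Inside the while loop above, you could call pinch_distance(result, width, height) right after detector.process() and map the returned distance to a zoom factor or a volume level once it crosses a threshold of your choosing.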

Highly Recommended:

“Raspberry Pi 4 and Practical AI Projects”

Deep dive into the world of Raspberry Pi and AI

Tsinghua University Press

ISBN: 9787302603252

Publication Date: 2022-07-01

Price: 79 RMB

Content Summary

This book introduces the characteristics of the different Raspberry Pi models and the basics needed to get started. It covers the various ways of working with the Raspberry Pi GPIO, as well as hardware-related topics such as the I2C bus, SPI bus, UART serial port, and PWM (pulse-width modulation). It also walks the reader through building and configuring common services on the Raspberry Pi, including a streaming server, the MariaDB and PostgreSQL databases, an MQTT server, and a DHCP server.

In addition, it includes some fun experiments, such as object detection with TensorFlow, building a Raspberry Pi scanner with OpenCV, and a nose-swapping experiment with OpenCV, showing readers some of the camera applications that can be built on the Raspberry Pi.

The book provides a good entry point for beginners to get a comprehensive understanding of the Raspberry Pi and to pick up practical usage and operational tips. Common languages such as C, Python, and shell script are used throughout, so readers with experience in any of them will feel at home. I hope readers can find experiments they like here and get started with the Raspberry Pi smoothly!

Author Introduction

Li Weibin (Drifting Bacteria) is currently Chief Linux Architect at Wu Ai Information Technology (Shanghai) Co., Ltd. His main research area is the application and development of embedded Linux operating systems. In his spare time he has won first prize at the China-US Maker Marathon (Shanghai Station), second prize in the Intel IoT competition, the Brainstorm King Award in the first Brainstorming competition of the Shanghai International Maker Competition, and the Out of Box Award at the Junction 2021 Global Hackathon. Friends regard him as a super enthusiast and evangelist of the Raspberry Pi, and he is a co-founder of the Drifting Donkey project.
