Gesture recognition enables hands-free control of robots, smart home devices, and interactive displays. With MediaPipe Hands and a Raspberry Pi camera, you can detect 21 landmarks per hand in real time. This tutorial covers installation, hand landmark detection, gesture classification, and practical applications for Indian maker projects.
Table of Contents
- MediaPipe Hands Overview
- Hardware Setup
- Installation on Raspberry Pi
- Detecting Hand Landmarks
- Gesture Classification Logic
- Practical Applications
- Performance Optimisation
- FAQ
MediaPipe Hands Overview
MediaPipe Hands detects 21 3D landmarks on each hand. Index 0 is the wrist; 1-4 are the thumb (base to tip), 5-8 the index finger, 9-12 the middle finger, 13-16 the ring finger, and 17-20 the pinky. By analysing relative landmark positions, you can classify gestures such as thumbs up, peace sign, fist, open palm, and pointing, as well as custom gestures.
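The index layout above can be written down as a small lookup table. This is only a convenience sketch; the names FINGER_LANDMARKS, fingertip, and second_joint are hypothetical helpers and not part of MediaPipe.

```python
# Landmark indices for one hand, following the layout described above.
# These names are illustrative helpers, not MediaPipe identifiers.
WRIST = 0
FINGER_LANDMARKS = {
    "thumb": [1, 2, 3, 4],
    "index": [5, 6, 7, 8],
    "middle": [9, 10, 11, 12],
    "ring": [13, 14, 15, 16],
    "pinky": [17, 18, 19, 20],
}


def fingertip(finger):
    """Return the landmark index of the given finger's tip."""
    return FINGER_LANDMARKS[finger][-1]


def second_joint(finger):
    """Return the second landmark of the finger, counted from its base."""
    return FINGER_LANDMARKS[finger][1]
```

For the four non-thumb fingers, the (fingertip, second_joint) pairs are (8, 6), (12, 10), (16, 14), and (20, 18), which are the pairs used by the classifier later in this tutorial.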
Hardware Setup
Arducam IMX219 8MP Camera Module
8MP Sony IMX219 sensor, Raspberry Pi CSI compatible. Sharp image quality for accurate hand landmark detection. Wide angle model available for close-range gesture recognition.
Waveshare IMX219-77 Camera Module
IMX219 sensor with 77-degree FOV, ideal for gesture recognition at arm’s length. Compatible with Raspberry Pi 4/5 and Jetson Nano.
Installation on Raspberry Pi
sudo apt update
sudo apt install -y python3-opencv python3-pip python3-picamera2
pip3 install mediapipe
python3 -c "import mediapipe as mp; print(mp.__version__)"
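MediaPipe 0.10+ supports ARM64 natively. On 32-bit Raspberry Pi OS, use the mediapipe-rpi4 package instead. On Raspberry Pi OS Bookworm and later, pip refuses system-wide installs, so run the pip command inside a virtual environment created with python3 -m venv --system-site-packages; that flag keeps the apt-installed picamera2 and OpenCV importable.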
Detecting Hand Landmarks
import cv2
import mediapipe as mp
from picamera2 import Picamera2

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils
hands = mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.7, min_tracking_confidence=0.5)

# Picamera2's 'RGB888' format delivers pixels in BGR order, which is what OpenCV expects
picam2 = Picamera2()
picam2.configure(picam2.create_preview_configuration(main={'size': (640, 480), 'format': 'RGB888'}))
picam2.start()

# classify_gesture() comes from the Gesture Classification Logic section; define it before this loop
while True:
    frame = picam2.capture_array()
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB input
    results = hands.process(rgb)
    if results.multi_hand_landmarks:
        h, w = frame.shape[:2]
        for i, (hl, hd) in enumerate(zip(results.multi_hand_landmarks, results.multi_handedness)):
            mp_draw.draw_landmarks(frame, hl, mp_hands.HAND_CONNECTIONS)
            # Convert normalised landmark coordinates to pixel coordinates
            lm = [[p.x * w, p.y * h] for p in hl.landmark]
            gesture = classify_gesture(lm)
            hand_type = hd.classification[0].label
            # Offset each hand's label vertically so two hands do not overlap
            cv2.putText(frame, f'{hand_type}: {gesture}', (10, 50 + 40 * i),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Gesture Recognition', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

hands.close()
picam2.stop()
cv2.destroyAllWindows()
Gesture Classification Logic
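def finger_up(lm, tip, pip_joint):
    # Image y grows downward, so a raised fingertip has a smaller y than its PIP joint
    return lm[tip][1] < lm[pip_joint][1]

def classify_gesture(lm):
    # Thumb test compares the x of the tip (4) and the joint below it (3). The sign
    # depends on which hand is shown and whether the image is mirrored; flip it if needed.
    thumb = lm[4][0] < lm[3][0]
    idx = finger_up(lm, 8, 6)
    mid = finger_up(lm, 12, 10)
    rng = finger_up(lm, 16, 14)
    pnk = finger_up(lm, 20, 18)
    all4 = [idx, mid, rng, pnk]
    n = sum(all4)  # number of raised non-thumb fingers
    if not any([thumb] + all4):
        return 'Fist'
    if thumb and not any(all4):
        return 'Thumbs Up'
    if idx and not mid and not rng and not pnk:
        return 'Point'
    if idx and mid and not rng and not pnk:
        return 'Peace'
    if all(all4) and thumb:
        return 'Open Palm'
    return f'{n} Fingers'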
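The y-comparison in finger_up assumes the hand is held upright. One possible alternative, not part of the original classifier, is to call a finger extended when its tip is farther from the wrist than its PIP joint, which tolerates a tilted or sideways hand. The names finger_extended and count_extended are hypothetical; treat this as a sketch to experiment with.

```python
import math

# (tip, PIP) landmark pairs for index, middle, ring and pinky
FINGERS = [(8, 6), (12, 10), (16, 14), (20, 18)]


def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def finger_extended(lm, tip, pip_joint, wrist=0):
    """True if the fingertip is farther from the wrist than the PIP joint."""
    return dist(lm[tip], lm[wrist]) > dist(lm[pip_joint], lm[wrist])


def count_extended(lm):
    """Count extended non-thumb fingers regardless of hand rotation."""
    return sum(finger_extended(lm, tip, pip) for tip, pip in FINGERS)
```

Because only distances are compared, the result does not change when the hand rotates in the image plane. A finger curled only slightly may still count as extended, so thresholds may need tuning for your gestures.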
Practical Applications
Smart home control: Fist = lights off, open palm = lights on, point up = fan speed increase. Connect GPIO outputs to relay modules to switch the 220V appliances common in Indian homes.
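Per-frame labels flicker, so a relay driven directly by them would chatter. A minimal sketch of one way to handle this is to act only after the same gesture has been seen for several consecutive frames. The class name GestureDebouncer, the hold_frames value, and the action names in ACTIONS are all assumptions for illustration.

```python
class GestureDebouncer:
    """Report a gesture once, after it has been stable for hold_frames frames."""

    def __init__(self, hold_frames=5):
        self.hold_frames = hold_frames
        self._last = None    # label seen on the previous frame
        self._count = 0      # consecutive frames with that label
        self._fired = None   # last label that was reported

    def update(self, gesture):
        """Feed one per-frame label; return it once when stable, else None."""
        if gesture == self._last:
            self._count += 1
        else:
            self._last = gesture
            self._count = 1
        if self._count >= self.hold_frames and gesture != self._fired:
            self._fired = gesture
            return gesture
        return None


# Hypothetical mapping from stable gestures to action names
ACTIONS = {'Fist': 'lights_off', 'Open Palm': 'lights_on'}
```

In the capture loop you could call update(gesture) on every frame and look up ACTIONS only when it returns a label. On the Pi, the action could then toggle a relay pin, for example through the gpiozero library.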
Robot arm control: Peace sign = gripper close, open palm = gripper open. Run gesture recognition on Pi and send commands to Arduino via serial.
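The Pi-to-Arduino link can be sketched as a lookup from gesture to a short command written to a serial port. The one-letter protocol below is made up for illustration and your Arduino sketch would need to parse it. The function accepts any object with a write() method, so it can be tried without hardware.

```python
import io

# Hypothetical one-letter protocol; the Arduino sketch must match it
GESTURE_COMMANDS = {
    'Peace': b'C\n',      # close gripper
    'Open Palm': b'O\n',  # open gripper
}


def send_gesture(port, gesture):
    """Write the command for a gesture to port; return False if none is mapped."""
    cmd = GESTURE_COMMANDS.get(gesture)
    if cmd is None:
        return False
    port.write(cmd)
    return True


# Dry run against an in-memory buffer instead of a real serial port
buf = io.BytesIO()
send_gesture(buf, 'Peace')
send_gesture(buf, 'Open Palm')
```

With pyserial installed, port could be serial.Serial('/dev/ttyACM0', 9600, timeout=1); the device path and baud rate are assumptions and depend on your board.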
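Presentation remote: Point right = next slide, point left = previous. The classifier above reports only 'Point', so derive the direction by comparing the x-coordinates of the index fingertip (landmark 8) and its base (landmark 5). Send the key press with xdotool key Right (or Left) on the Raspberry Pi desktop to control LibreOffice Impress. xdotool needs an X11 session and may not work under Wayland.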
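One way to get a pointing direction is to look at the vector from the base of the index finger (landmark 5) to its tip (landmark 8). The sketch below is an assumption about how this could be done; point_direction, min_length, and xdotool_command are hypothetical names, and directions are in image coordinates, so left and right swap if your camera view is not mirrored.

```python
import math

# xdotool key names for the two slide directions
KEY_FOR_DIRECTION = {'right': 'Right', 'left': 'Left'}


def point_direction(lm, min_length=30.0):
    """Return 'left', 'right', 'up', 'down', or None if the finger is too short.

    lm holds pixel coordinates; min_length (pixels) filters out a curled finger.
    """
    dx = lm[8][0] - lm[5][0]
    dy = lm[8][1] - lm[5][1]
    if math.hypot(dx, dy) < min_length:
        return None
    if abs(dx) > abs(dy):
        return 'right' if dx > 0 else 'left'
    return 'down' if dy > 0 else 'up'  # image y grows downward


def xdotool_command(direction):
    """Build the xdotool argument list for a direction, or None if unmapped."""
    key = KEY_FOR_DIRECTION.get(direction)
    return ['xdotool', 'key', key] if key else None
```

The returned list can be passed to subprocess.run when it is not None. Combine this with a debounce step so that one pointing gesture advances only one slide.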
Arducam OV5642 Auto-Focus Camera
5MP OV5642 with motorised auto-focus. Ideal for gesture recognition at variable distances – keeps hand details sharp whether 20cm or 100cm from camera.
Performance Optimisation
MediaPipe on Raspberry Pi 4 runs at 8-15 FPS. To improve performance:
- Reduce resolution to 320×240 for close-range use – achieves 25+ FPS
- Set max_num_hands=1 – can roughly double speed when a second hand would otherwise be tracked
- Process only every 2nd frame – reuse the previous landmarks for the skipped frames
- Pi 5 – ARM Cortex-A76 gives ~3x throughput vs Pi 4
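The frame-skipping idea from the list above can be sketched as a small wrapper that runs the expensive detector on every Nth frame and returns the cached result otherwise. FrameSkipper is a hypothetical helper, not a MediaPipe feature.

```python
class FrameSkipper:
    """Call process() on every Nth frame and reuse its result in between."""

    def __init__(self, process, every=2):
        self.process = process  # e.g. hands.process
        self.every = every
        self._n = 0             # frames seen so far
        self._last = None       # most recent detection result

    def __call__(self, frame):
        if self._n % self.every == 0:
            self._last = self.process(frame)
        self._n += 1
        return self._last
```

In the capture loop this would look like detect = FrameSkipper(hands.process, every=2) before the loop and results = detect(rgb) inside it. The overlay then lags by up to one frame on skipped frames, which is usually acceptable for slow gestures.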
FAQ
Does MediaPipe work in Indian indoor lighting conditions?
Yes, but performance degrades under 50 lux. Add a small LED fill light. Flickering tube lights cause detection jitter – switch to LED lighting for consistent results.
Can I use a USB webcam instead of Pi Camera?
Yes. Replace Picamera2 with cv2.VideoCapture(0). CSI cameras are preferred for lower latency.
How do I train a custom gesture classifier?
Save landmark arrays to CSV with numpy.savetxt, one row per sample plus a gesture label. Train scikit-learn’s RandomForestClassifier or an SVM on your custom gestures. Inference is cheap enough to run at full frame rate because each hand is just 42 numbers (21 landmarks × x, y).
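A classifier trained on raw pixel coordinates would depend on where the hand sits in the frame, so it helps to normalise first. The sketch below is one possible pipeline under stated assumptions: it makes landmarks wrist-relative and scale-normalised, and it uses Python's csv module in place of numpy.savetxt so that the helpers run without extra packages. All function names are hypothetical.

```python
import csv
import math


def landmarks_to_features(lm):
    """Flatten 21 (x, y) landmarks into 42 wrist-relative, scale-normalised numbers."""
    wx, wy = lm[0]
    rel = [(x - wx, y - wy) for x, y in lm]
    # Divide by the largest wrist distance so hand size and camera distance cancel out
    scale = max(math.hypot(x, y) for x, y in rel) or 1.0
    return [v / scale for x, y in rel for v in (x, y)]


def append_sample(path, label, lm):
    """Append one labelled sample as a CSV row: label, then 42 features."""
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([label] + landmarks_to_features(lm))


def load_samples(path):
    """Read the CSV back into a feature matrix X and a label list y."""
    X, y = [], []
    with open(path, newline='') as f:
        for row in csv.reader(f):
            y.append(row[0])
            X.append([float(v) for v in row[1:]])
    return X, y


def train(path):
    """Fit a random forest on the saved samples (requires scikit-learn)."""
    # Imported here so the helpers above work without scikit-learn installed
    from sklearn.ensemble import RandomForestClassifier
    X, y = load_samples(path)
    return RandomForestClassifier(n_estimators=100).fit(X, y)
```

At run time you would call model.predict([landmarks_to_features(lm)]) for each detected hand. Collect a few hundred samples per gesture under different lighting and distances before judging accuracy.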
What is the maximum detection range?
MediaPipe reliably detects hands at 20-150cm. Beyond 150cm, landmark accuracy degrades. Crop and upscale the hand region for longer range detection.
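Cropping needs a region of interest, and one plausible source is the landmarks from the most recent successful detection. The sketch below computes a padded, frame-clamped bounding box from pixel landmarks. hand_bbox and the margin value are assumptions, not part of MediaPipe.

```python
def hand_bbox(lm, frame_w, frame_h, margin=0.25):
    """Return (left, top, right, bottom) around the landmarks, padded and clamped.

    lm holds pixel (x, y) pairs; margin is the padding as a fraction of box size.
    right and bottom are exclusive, so they can be used directly in array slices.
    """
    xs = [p[0] for p in lm]
    ys = [p[1] for p in lm]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    pad_x = (x1 - x0) * margin
    pad_y = (y1 - y0) * margin
    left = max(0, int(x0 - pad_x))
    top = max(0, int(y0 - pad_y))
    right = min(frame_w, int(x1 + pad_x) + 1)
    bottom = min(frame_h, int(y1 + pad_y) + 1)
    return left, top, right, bottom
```

The crop is then frame[top:bottom, left:right], which can be enlarged with cv2.resize before being passed to hands.process. Landmarks returned for the crop are relative to the crop, so map them back to full-frame coordinates before drawing.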