Building a stereo camera system for depth estimation with OpenCV is the most accessible way to add 3D vision capabilities to your robotics or automation project. By comparing the slight difference in perspective between two cameras (like human binocular vision), stereo vision computes the depth to every point in the scene. This guide covers stereo camera setup, calibration, disparity map generation, and depth estimation in Python.
Table of Contents
- Stereo Vision Principle
- Hardware Setup
- Stereo Camera Calibration
- Stereo Rectification
- Disparity Map Computation
- Converting Disparity to Depth
- Applications in Indian Robotics
- Frequently Asked Questions
Stereo Vision Principle
Stereo vision works by triangulation: the same 3D point appears at different horizontal positions in the left and right images (disparity). The depth (Z) is inversely proportional to disparity:
Z = (f × B) / d
Where:
Z = depth to the point (metres)
f = focal length in pixels
B = baseline (distance between cameras, metres)
d = disparity (pixel difference between left and right images)
Example: f=700px, B=0.12m, d=50px -> Z = (700 × 0.12) / 50 = 1.68m
Hardware Setup
- Camera pair: Two identical cameras (same model, same firmware). IMX219 (OV5647 also works) on Raspberry Pi via dual CSI or USB.
- Baseline: Distance between cameras. 60–150mm for desktop use. Larger baseline = better accuracy at longer distances.
- Mounting: Cameras must be rigidly fixed parallel to each other, aligned horizontally. Even slight rotation dramatically degrades results.
- Sync: For Raspberry Pi 5 (which has two CSI ports): both cameras capture simultaneously. For USB cameras: synchronisation is approximate.
Stereo Camera Calibration
import cv2
import numpy as np
import glob
# Checkerboard parameters (print an 8x6 checkerboard, ~30mm squares)
CHECKERBOARD = (7, 5) # Interior corners (width-1, height-1)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# Prepare object points (0,0,0), (1,0,0), (2,0,0) ... scaled to square size
objp = np.zeros((CHECKERBOARD[0]*CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)
objp *= 30 # 30mm square size in mm
objpoints, imgpoints_l, imgpoints_r = [], [], []
for img_l, img_r in zip(glob.glob('calib/left*.jpg'),
glob.glob('calib/right*.jpg')):
l, r = cv2.imread(img_l, 0), cv2.imread(img_r, 0)
ret_l, corners_l = cv2.findChessboardCorners(l, CHECKERBOARD)
ret_r, corners_r = cv2.findChessboardCorners(r, CHECKERBOARD)
if ret_l and ret_r:
objpoints.append(objp)
imgpoints_l.append(cv2.cornerSubPix(l, corners_l, (11,11), (-1,-1), criteria))
imgpoints_r.append(cv2.cornerSubPix(r, corners_r, (11,11), (-1,-1), criteria))
h, w = l.shape
# Calibrate each camera individually first
ret, K1, D1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, (w,h), None, None)
ret, K2, D2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, (w,h), None, None)
# Stereo calibration
ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, (w,h),
flags=cv2.CALIB_FIX_INTRINSIC
)
print(f"Stereo calibration RMS error: {ret:.3f} pixels")
np.savez('stereo_params.npz', K1=K1, D1=D1, K2=K2, D2=D2, R=R, T=T)
Disparity Map Computation
import cv2
import numpy as np
# Load calibration parameters
params = np.load('stereo_params.npz')
K1, D1, K2, D2 = params['K1'], params['D1'], params['K2'], params['D2']
R, T = params['R'], params['T']
h, w = 480, 640
# Compute rectification transforms
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
K1, D1, K2, D2, (w, h), R, T, alpha=0
)
# Compute rectification maps (pre-compute for speed)
map1_l, map2_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w,h), cv2.CV_16SC2)
map1_r, map2_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w,h), cv2.CV_16SC2)
# StereoSGBM - better quality than StereoBM
stereo = cv2.StereoSGBM_create(
minDisparity=0,
numDisparities=128, # Must be divisible by 16
blockSize=11,
P1=8 * 3 * 11**2,
P2=32 * 3 * 11**2,
disp12MaxDiff=1,
uniquenessRatio=10,
speckleWindowSize=100,
speckleRange=32
)
def compute_depth(left_img, right_img):
# Rectify both images
left_rect = cv2.remap(left_img, map1_l, map2_l, cv2.INTER_LINEAR)
right_rect = cv2.remap(right_img, map1_r, map2_r, cv2.INTER_LINEAR)
# Convert to grayscale for disparity
left_gray = cv2.cvtColor(left_rect, cv2.COLOR_BGR2GRAY)
right_gray = cv2.cvtColor(right_rect, cv2.COLOR_BGR2GRAY)
# Compute disparity map
disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
# Convert disparity to depth using Q matrix
points_3D = cv2.reprojectImageTo3D(disparity, Q)
depth_map = points_3D[:,:,2] # Z component is depth
return disparity, depth_map
Applications in Indian Robotics
- Obstacle avoidance robots: Stereo vision provides real-time 3D environment mapping for navigation in Indian homes and offices
- Agricultural robot: Measure fruit/vegetable size and position for robotic harvesting — relevant for India’s agricultural automation push
- Industrial quality control: Measure dimensions of manufactured parts in 3D — useful for Indian SME manufacturing
- Bin picking robots: Identify and locate objects in bins for automated sorting and picking
Frequently Asked Questions
What baseline distance should I use for my stereo camera?
The optimal baseline depends on the distance range you want to measure accurately. A good rule: baseline = depth_range × 0.05 to 0.1. For 0.5–3m range: 75–150mm baseline. For 1–10m range: 200–500mm baseline. Larger baselines give better accuracy at longer distances but create more matching difficulty for close objects.
Can I use two USB cameras instead of two CSI cameras for stereo vision on Pi?
Yes, but with limitations. Two USB cameras typically share bandwidth on Pi 3/4 (single USB controller), limiting each to 480p or requiring reduced frame rates. Pi 5 has better USB throughput. The main challenge is temporal synchronisation — USB cameras don’t trigger simultaneously, causing moving object ghosting. For static scenes or slow-moving objects, two USB cameras work fine for stereo depth estimation.
Add comment