Stereo Camera Setup for Depth Estimation: OpenCV Guide

Building a stereo camera system for depth estimation with OpenCV is the most accessible way to add 3D vision capabilities to your robotics or automation project. By comparing the slight difference in perspective between two cameras (like human binocular vision), stereo vision computes the depth to every point in the scene. This guide covers stereo camera setup, calibration, disparity map generation, and depth estimation in Python.

Stereo Vision Principle
Hardware Setup
Stereo Camera Calibration
Stereo Rectification
Disparity Map Computation
Converting Disparity to Depth
Applications in Indian Robotics
Frequently Asked Questions

Stereo Vision Principle

Stereo vision works by triangulation: the same 3D point appears at different horizontal positions in the left and right images (disparity). The depth (Z) is inversely proportional to disparity:

Z = (f × B) / d
Where:
  Z = depth to the point (metres)
  f = focal length in pixels
  B = baseline (distance between cameras, metres)
  d = disparity (pixel difference between left and right images)

Example: f=700px, B=0.12m, d=50px -> Z = (700 × 0.12) / 50 = 1.68m

Recommended: Arducam 8MP IMX219 Camera for Raspberry Pi — Use two of these cameras mounted on a rigid baseline (120mm recommended) for an 8MP stereo camera system on Raspberry Pi 5.

Hardware Setup

Camera pair: Two identical cameras (same model, same firmware). IMX219 (OV5647 also works) on Raspberry Pi via dual CSI or USB.
Baseline: Distance between cameras. 60–150mm for desktop use. Larger baseline = better accuracy at longer distances.
Mounting: Cameras must be rigidly fixed parallel to each other, aligned horizontally. Even slight rotation dramatically degrades results.
Sync: For Raspberry Pi 5 (which has two CSI ports): both cameras capture simultaneously. For USB cameras: synchronisation is approximate.

Stereo Camera Calibration

import cv2
import numpy as np
import glob

# Checkerboard parameters (print an 8x6 checkerboard, ~30mm squares)
CHECKERBOARD = (7, 5)  # Interior corners (width-1, height-1)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# Prepare object points (0,0,0), (1,0,0), (2,0,0) ... scaled to square size
objp = np.zeros((CHECKERBOARD[0]*CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)
objp *= 30  # 30mm square size in mm

objpoints, imgpoints_l, imgpoints_r = [], [], []

for img_l, img_r in zip(glob.glob('calib/left*.jpg'), 
                          glob.glob('calib/right*.jpg')):
    l, r = cv2.imread(img_l, 0), cv2.imread(img_r, 0)
    ret_l, corners_l = cv2.findChessboardCorners(l, CHECKERBOARD)
    ret_r, corners_r = cv2.findChessboardCorners(r, CHECKERBOARD)
    
    if ret_l and ret_r:
        objpoints.append(objp)
        imgpoints_l.append(cv2.cornerSubPix(l, corners_l, (11,11), (-1,-1), criteria))
        imgpoints_r.append(cv2.cornerSubPix(r, corners_r, (11,11), (-1,-1), criteria))

h, w = l.shape
# Calibrate each camera individually first
ret, K1, D1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, (w,h), None, None)
ret, K2, D2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, (w,h), None, None)

# Stereo calibration
ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, (w,h),
    flags=cv2.CALIB_FIX_INTRINSIC
)
print(f"Stereo calibration RMS error: {ret:.3f} pixels")
np.savez('stereo_params.npz', K1=K1, D1=D1, K2=K2, D2=D2, R=R, T=T)

Disparity Map Computation

import cv2
import numpy as np

# Load calibration parameters
params = np.load('stereo_params.npz')
K1, D1, K2, D2 = params['K1'], params['D1'], params['K2'], params['D2']
R, T = params['R'], params['T']

h, w = 480, 640

# Compute rectification transforms
R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(
    K1, D1, K2, D2, (w, h), R, T, alpha=0
)

# Compute rectification maps (pre-compute for speed)
map1_l, map2_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w,h), cv2.CV_16SC2)
map1_r, map2_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w,h), cv2.CV_16SC2)

# StereoSGBM - better quality than StereoBM
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,  # Must be divisible by 16
    blockSize=11,
    P1=8 * 3 * 11**2,
    P2=32 * 3 * 11**2,
    disp12MaxDiff=1,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=32
)

def compute_depth(left_img, right_img):
    # Rectify both images
    left_rect = cv2.remap(left_img, map1_l, map2_l, cv2.INTER_LINEAR)
    right_rect = cv2.remap(right_img, map1_r, map2_r, cv2.INTER_LINEAR)
    
    # Convert to grayscale for disparity
    left_gray = cv2.cvtColor(left_rect, cv2.COLOR_BGR2GRAY)
    right_gray = cv2.cvtColor(right_rect, cv2.COLOR_BGR2GRAY)
    
    # Compute disparity map
    disparity = stereo.compute(left_gray, right_gray).astype(np.float32) / 16.0
    
    # Convert disparity to depth using Q matrix
    points_3D = cv2.reprojectImageTo3D(disparity, Q)
    depth_map = points_3D[:,:,2]  # Z component is depth
    
    return disparity, depth_map

Applications in Indian Robotics

Obstacle avoidance robots: Stereo vision provides real-time 3D environment mapping for navigation in Indian homes and offices
Agricultural robot: Measure fruit/vegetable size and position for robotic harvesting — relevant for India’s agricultural automation push
Industrial quality control: Measure dimensions of manufactured parts in 3D — useful for Indian SME manufacturing
Bin picking robots: Identify and locate objects in bins for automated sorting and picking

Frequently Asked Questions

What baseline distance should I use for my stereo camera?

The optimal baseline depends on the distance range you want to measure accurately. A good rule: baseline = depth_range × 0.05 to 0.1. For 0.5–3m range: 75–150mm baseline. For 1–10m range: 200–500mm baseline. Larger baselines give better accuracy at longer distances but create more matching difficulty for close objects.

Can I use two USB cameras instead of two CSI cameras for stereo vision on Pi?

Yes, but with limitations. Two USB cameras typically share bandwidth on Pi 3/4 (single USB controller), limiting each to 480p or requiring reduced frame rates. Pi 5 has better USB throughput. The main challenge is temporal synchronisation — USB cameras don’t trigger simultaneously, causing moving object ghosting. For static scenes or slow-moving objects, two USB cameras work fine for stereo depth estimation.

Shop Camera Modules at Zbotic →

Stereo Camera Setup for Depth Estimation: OpenCV Guide