Building an ESP32 camera stream with AI detection on a web browser combines computer vision and IoT in a single, affordable package. Both the ESP32-CAM module (with its OV2640 camera) and ESP32-S3 boards (with built-in AI acceleration) can stream live video and run basic object detection, all accessible from any browser on your local network. This guide covers everything from hardware setup to running AI face detection on the ESP32 camera stream.
Table of Contents
- ESP32 Camera Hardware Options
- ESP32-CAM Setup and Flashing
- Basic Video Streaming Server
- AI Face Detection Integration
- ESP32-S3 AI Camera Alternative
- Building the Web Interface
- Streaming Optimisation
- Frequently Asked Questions
ESP32 Camera Hardware Options
Two primary platforms for ESP32 camera + AI projects:
ESP32-CAM (AI Thinker)
- Based on ESP32 (dual-core 240 MHz) + OV2640 camera (2MP)
- Available in India for ₹200–₹400 — extremely cost-effective
- No onboard USB — needs external USB-to-TTL programmer for flashing
- 4MB PSRAM, 4MB flash, microSD slot
- Camera connector: 24-pin FPC, OV2640 or OV5640 compatible
ESP32-S3-CAM (ESP32-S3 based)
- ESP32-S3 with dedicated AI instructions + camera interface
- Better AI performance for face recognition and object detection
- Higher resolution support (OV5640 5MP)
- Available with USB-C programming port — no external programmer needed
- Price in India: ₹600–₹1200
ESP32-CAM Setup and Flashing
Flashing ESP32-CAM requires a USB-to-TTL adapter (CH340G, CP2102, or FTDI):
- Connect USB-TTL adapter to ESP32-CAM:
  - 5V → 5V (use 5V, not 3.3V; the module needs 5V for stability)
  - GND → GND
  - TX (TTL adapter) → RX (ESP32-CAM U0R)
  - RX (TTL adapter) → TX (ESP32-CAM U0T)
  - IO0 → GND on the ESP32-CAM (jumper that puts the chip in programming mode)
- Install Arduino ESP32 support: Boards Manager → search “ESP32” → install by Espressif
- Select Board: AI Thinker ESP32-CAM
- Select correct COM port
- Upload your sketch with IO0 connected to GND
- After upload: disconnect IO0 from GND and press RST button
Basic Video Streaming Server
Once ESP32 board support is installed, the Arduino IDE includes a complete camera streaming example:
// File → Examples → ESP32 → Camera → CameraWebServer
// This is the built-in streaming example — just modify WiFi credentials
#include "esp_camera.h"
#include <WiFi.h>
const char* ssid = "YourWiFiSSID";
const char* password = "YourWiFiPassword";
// Camera pin definitions for AI Thinker ESP32-CAM
#define PWDN_GPIO_NUM 32
#define RESET_GPIO_NUM -1
#define XCLK_GPIO_NUM 0
#define SIOD_GPIO_NUM 26
#define SIOC_GPIO_NUM 27
#define Y9_GPIO_NUM 35
#define Y8_GPIO_NUM 34
#define Y7_GPIO_NUM 39
#define Y6_GPIO_NUM 36
#define Y5_GPIO_NUM 21
#define Y4_GPIO_NUM 19
#define Y3_GPIO_NUM 18
#define Y2_GPIO_NUM 5
#define VSYNC_GPIO_NUM 25
#define HREF_GPIO_NUM 23
#define PCLK_GPIO_NUM 22
void startCameraServer();
void setup() {
Serial.begin(115200);
camera_config_t config;
config.ledc_channel = LEDC_CHANNEL_0;
config.ledc_timer = LEDC_TIMER_0;
config.pin_d0 = Y2_GPIO_NUM;
// ... (all pin definitions)
config.xclk_freq_hz = 20000000;
config.pixel_format = PIXFORMAT_JPEG;
config.frame_size = FRAMESIZE_SVGA; // 800x600
config.jpeg_quality = 12;
config.fb_count = 2;
esp_err_t err = esp_camera_init(&config);
if (err != ESP_OK) {
Serial.printf("Camera init failed: 0x%x\n", err);
return;
}
WiFi.begin(ssid, password);
while (WiFi.status() != WL_CONNECTED) delay(500);
startCameraServer();
Serial.printf("Camera ready! Connect to: http://%s\n",
WiFi.localIP().toString().c_str());
}
void loop() { delay(10000); }
Open the IP address shown in Serial Monitor in your browser — you’ll see a live camera feed with controls for resolution, brightness, and face detection toggle.
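The live feed in the browser is an MJPEG stream: a single HTTP response of type multipart/x-mixed-replace in which every part is one JPEG frame. The sketch below shows only the framing strings, as plain C++ that also compiles off-device. The boundary token `frame` and the helper names `streamContentType` and `partHeader` are assumptions made for illustration, not names taken from the CameraWebServer example.

```cpp
#include <cstdio>
#include <string>

// Hypothetical boundary token; any string that does not occur in the headers works.
static const char* kBoundary = "frame";

// Value for the Content-Type header of the streaming response.
std::string streamContentType() {
    return std::string("multipart/x-mixed-replace;boundary=") + kBoundary;
}

// Header block sent before each JPEG frame in the multipart body.
std::string partHeader(std::size_t jpegLen) {
    char buf[128];
    std::snprintf(buf, sizeof(buf),
                  "\r\n--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %zu\r\n\r\n",
                  kBoundary, jpegLen);
    return std::string(buf);
}
```

On the device, a stream handler would send `partHeader(fb->len)` followed by the `fb->len` bytes of `fb->buf` for every captured frame, then return the frame buffer to the driver.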
AI Face Detection Integration
// ESP32 face detection uses the esp-face library (MTMN model)
// The CameraWebServer example already includes this
// Note: fd_forward.h and fr_forward.h ship with the 1.0.x ESP32 Arduino core;
// later core versions use a different face detection API
// To enable face detection in your custom code:
#include "esp_camera.h"
#include "img_converters.h" // fmt2rgb888()
#include "fd_forward.h"
#include "fr_forward.h"
// After initialising camera (inside setup()), fill in the detector config:
static mtmn_config_t mtmn_config = {0};
mtmn_config.min_face = 80;
mtmn_config.pyramid = 0.707;
mtmn_config.pyramid_times = 4;
mtmn_config.p_threshold.score = 0.6;
mtmn_config.p_threshold.nms = 0.7;
mtmn_config.r_threshold.score = 0.7;
mtmn_config.r_threshold.nms = 0.7;
mtmn_config.o_threshold.score = 0.7;
mtmn_config.o_threshold.nms = 0.7;
// In capture loop:
camera_fb_t *fb = esp_camera_fb_get();
if (fb) {
    // The frame buffer holds JPEG data, so decode it to RGB888 for face detection
    dl_matrix3du_t *image_matrix = dl_matrix3du_alloc(1, fb->width, fb->height, 3);
    if (image_matrix && fmt2rgb888(fb->buf, fb->len, fb->format, image_matrix->item)) {
        box_array_t *net_boxes = face_detect(image_matrix, &mtmn_config);
        if (net_boxes != NULL) {
            Serial.printf("Detected %d face(s)!\n", net_boxes->len);
            // Free every array owned by the result, then the result itself
            dl_lib_free(net_boxes->score);
            dl_lib_free(net_boxes->box);
            dl_lib_free(net_boxes->landmark);
            dl_lib_free(net_boxes);
        }
    }
    if (image_matrix) dl_matrix3du_free(image_matrix);
    esp_camera_fb_return(fb);
}
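Face detection runs on decoded RGB pixels rather than on the JPEG bytes, so every analysed frame needs an uncompressed buffer of 3 bytes per pixel. The arithmetic below is a rough sizing aid only; `rgb888Bytes` is a hypothetical helper name, not part of any ESP32 library.

```cpp
#include <cstdint>

// Bytes needed for one uncompressed RGB888 frame (3 bytes per pixel).
constexpr std::uint32_t rgb888Bytes(std::uint32_t width, std::uint32_t height) {
    return width * height * 3u;
}

// QVGA (320x240)   -> 230,400 bytes (225 KiB)
// SVGA (800x600)   -> 1,440,000 bytes
// UXGA (1600x1200) -> 5,760,000 bytes
```

A QVGA buffer is already a large share of the ESP32's internal RAM, and an SVGA buffer only fits in PSRAM, which is one reason to run detection at a low resolution even when the stream itself could go higher.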
ESP32-S3 AI Camera Alternative
For better AI performance, ESP32-S3 with vector instructions runs AI models significantly faster:
// ESP32-S3 with PSRAM supports higher resolution + faster AI
// Use FRAMESIZE_UXGA (1600x1200) with 8MB PSRAM
config.frame_size = FRAMESIZE_UXGA;
config.jpeg_quality = 10;
config.fb_count = 2;
config.fb_location = CAMERA_FB_IN_PSRAM; // Use PSRAM for frame buffers
config.grab_mode = CAMERA_GRAB_LATEST; // Always get freshest frame
// ESP32-S3 AI instructions provide ~2-3x speedup for:
// - Face detection (50ms vs 150ms per frame)
// - Image classification
// - Simple object detection
Building the Web Interface
// Custom web interface with MJPEG stream in HTML
const char* htmlPage = R"rawliteral(
<!DOCTYPE html>
<html>
<body style="background:#111;color:#fff;text-align:center;font-family:Arial">
<h2>ESP32 AI Camera</h2>
<img src="/stream" style="max-width:640px;border-radius:8px">
<p id="detection">Detection: Waiting...</p>
<script>
setInterval(async () => {
const res = await fetch('/detection');
const data = await res.json();
document.getElementById('detection').textContent =
'Faces detected: ' + data.faces;
}, 1000);
</script>
</body>
</html>
)rawliteral";
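The script in this page polls /detection once per second and reads a numeric `faces` field from the JSON reply, so the sketch has to serve that endpoint itself. A minimal sketch of building the response body, assuming the face count is kept in a variable updated by the capture loop; `detectionJson` is a hypothetical helper name.

```cpp
#include <cstdio>
#include <string>

// Build the JSON body that the page's fetch('/detection') call expects.
// Negative counts are clamped to zero so the page never shows a nonsense value.
std::string detectionJson(int faceCount) {
    if (faceCount < 0) faceCount = 0;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "{\"faces\":%d}", faceCount);
    return std::string(buf);
}
```

On the device, the handler registered for /detection would send this string with Content-Type application/json. With the ESP-IDF HTTP server that CameraWebServer uses, that would typically mean httpd_resp_set_type() followed by httpd_resp_send().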
Streaming Optimisation
- Frame rate vs quality tradeoff: FRAMESIZE_QVGA (320×240) at JPEG quality 15 gives ~15fps; SVGA (800×600) at quality 10 gives ~5fps on ESP32-CAM
- India WiFi tip: The ESP32 only supports 2.4GHz WiFi, which also has better range through walls than 5GHz. Keep the router's 2.4GHz band enabled and place the router close to the camera for smooth streaming.
- Reduce latency: Set config.grab_mode = CAMERA_GRAB_LATEST on ESP32-S3 to always serve the newest frame
- Multiple viewers: MJPEG stream only supports one viewer at a time efficiently. For multiple viewers, consider converting to HLS or using a local RTSP server
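- Night vision: ESP32-CAM has a white LED (GPIO 4) for illumination. Use PWM to control brightness. With ESP32 Arduino core 3.x, where the LEDC calls take the pin number:
ledcAttach(4, 5000, 8); // GPIO 4, 5 kHz PWM, 8-bit duty range
ledcWrite(4, 64); // 64/255, about 25% brightness
On core 2.x, call ledcSetup() and ledcAttachPin() with a channel other than 0 (the camera clock already uses LEDC channel 0) and pass that channel to ledcWrite().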
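The frame rate versus quality tradeoff above comes down to how many JPEG bytes per second the WiFi link has to carry. A back-of-the-envelope sketch follows. The frame sizes in the comments are illustrative assumptions, not measurements, since actual JPEG size depends on the scene and the quality setting; `streamKbps` is a hypothetical helper name.

```cpp
// Approximate stream bitrate in kilobits per second for a given
// average JPEG frame size (bytes) and frame rate (frames per second).
constexpr double streamKbps(double avgFrameBytes, double fps) {
    return avgFrameBytes * 8.0 * fps / 1000.0;
}

// Assumed example figures:
//   10,000-byte frames at 15 fps -> 1200 kbit/s
//   40,000-byte frames at  5 fps -> 1600 kbit/s
```

If the estimate approaches what the link sustains at the camera's location, lowering the frame size or raising the jpeg_quality number (more compression) is usually the first thing to try.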
Frequently Asked Questions
What is the maximum streaming resolution on ESP32-CAM?
ESP32-CAM with OV2640 supports up to UXGA (1600×1200) for still capture, but practical streaming resolution is SVGA (800×600) or VGA (640×480) at 5–10fps. Higher resolutions consume too much bandwidth and processing time for real-time streaming.
Can ESP32-CAM stream outside the local network to the internet?
Not directly — ESP32-CAM can only be accessed on the local WiFi network. For remote access, you need port forwarding on your router or a cloud relay service (like ngrok for testing, or a VPN for permanent access). DDNS services can help with dynamic IPs common in India.
How accurate is ESP32’s built-in face detection?
ESP32’s built-in MTMN face detection model is a lightweight model optimised for the limited MCU resources. It works well in good lighting with faces at typical distances (0.5–3m) from the camera. It detects presence/absence reliably but is not suitable for face recognition (identification) without the esp-face recognition library.
Does ESP32-CAM work with Home Assistant for security cameras?
Yes — ESP32-CAM MJPEG streams can be added to Home Assistant as Generic Camera entities. Enter the stream URL (http://ESP32_IP/stream) in the configuration. Home Assistant’s Frigate integration (if running on a more powerful server) can process the ESP32 stream for AI object detection.
What power supply is best for ESP32-CAM in India?
ESP32-CAM requires 5V/1A minimum due to camera current draw. The 3.3V pin cannot supply enough current for the camera. Use a dedicated 5V USB supply (not another microcontroller's 3.3V regulator), and choose a good-quality branded charger, since cheap chargers often fail to hold a stable voltage under load.