Running TinyML person detection on ESP32-CAM offline brings AI vision capabilities to a ₹500 device without any cloud dependency. TensorFlow Lite Micro runs a quantised MobileNet model directly on the ESP32’s CPU, enabling real-time person detection at the edge. This tutorial covers model deployment, inference code, and practical applications for Indian makers interested in edge AI.
Table of Contents
- What is TinyML?
- Pre-trained Person Detection Model
- Arduino Setup for TFLite Micro
- Inference Code
- Interpreting Output and Triggering Actions
- Optimisation Tips
- Frequently Asked Questions
What is TinyML?
TinyML (Tiny Machine Learning) refers to machine learning models optimised to run on microcontrollers and edge devices with limited memory and compute. Key characteristics:
- Models are quantised to INT8 or INT4 to reduce size (from MBs to KBs)
- No internet connection required — inference runs locally
- Low power consumption (milliwatts vs watts for server-based AI)
- Latency is predictable — no network variability
- Privacy: no images sent to cloud
The ESP32-CAM with 4MB PSRAM can run models up to ~500KB in size — enough for simple object detection and classification tasks.
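To make the size reduction from quantisation concrete, here is a minimal sketch of affine INT8 quantisation and the resulting storage arithmetic. The function names and the scale/zero-point values are illustrative assumptions, not taken from any specific model or from the TFLite Micro API.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Affine INT8 quantisation: real_value is approximated by scale * (q - zero_point).
// scale and zero_point here are illustrative parameters chosen by the caller.
int8_t quantize(float real_value, float scale, int zero_point) {
  int q = static_cast<int>(std::lround(real_value / scale)) + zero_point;
  q = std::max(-128, std::min(127, q));  // clamp to the INT8 range
  return static_cast<int8_t>(q);
}

// Inverse mapping from the stored INT8 value back to an approximate real value.
float dequantize(int8_t q, float scale, int zero_point) {
  return scale * static_cast<float>(static_cast<int>(q) - zero_point);
}

// Weight storage for the same number of parameters in each format.
constexpr std::size_t float32_bytes(std::size_t num_weights) { return num_weights * 4; }
constexpr std::size_t int8_bytes(std::size_t num_weights) { return num_weights; }
```

With these definitions, 300,000 weights take about 1.2MB as float32 but about 300KB as INT8 (ignoring model metadata), which is the 4x reduction that makes microcontroller deployment practical.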
Pre-trained Person Detection Model
Google’s open-source person detection model for microcontrollers is the starting point. It’s included in the TensorFlow Lite for Microcontrollers examples repository:
- Input: 96×96 greyscale image
- Output: Two scores — person / no person
- Model size: ~300KB (INT8 quantised)
- Accuracy: ~89% on test set (real-world lower in variable conditions)
- Inference time on ESP32: ~200–400ms per frame
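The figures above translate directly into an input buffer size and an achievable frame rate. The constants and the helper name below are assumptions for illustration; the arithmetic simply follows from a 96×96 single-channel input and the quoted inference times.

```cpp
#include <cstddef>

// Input geometry quoted for the person detection model.
constexpr int kImageWidth = 96;
constexpr int kImageHeight = 96;
constexpr int kChannels = 1;  // greyscale

// One INT8 value per pixel: 96 * 96 * 1 = 9216 bytes per frame.
constexpr std::size_t kInputBytes =
    static_cast<std::size_t>(kImageWidth) * kImageHeight * kChannels;

// Upper bound on frame rate if inference time dominates each loop iteration.
constexpr float frames_per_second(float inference_ms) {
  return 1000.0f / inference_ms;
}
```

At 200–400ms per inference this works out to roughly 2.5 to 5 frames per second, before accounting for camera capture time.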
Arduino Setup for TFLite Micro
// Install these libraries in Arduino IDE Library Manager:
// 1. "TensorFlow Lite for Microcontrollers" by EloquentTinyML
// (or the Harvard Edge Impulse version)
// 2. ESP32 board support (Espressif)
// Board settings for ESP32-CAM:
// Board: AI Thinker ESP32-CAM
// CPU Freq: 240 MHz (max for inference speed)
// Upload Speed: 115200
// Memory requirements:
// Model: ~300KB Flash
// Tensor arena: ~100KB RAM (internal SRAM, or PSRAM if allocated there)
// The model fits easily in the ESP32-CAM's 4MB Flash; the tensor arena lives in RAM, not Flash
Inference Code
#include "esp_camera.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "person_detect_model_data.h" // From TFLite examples
// AI Thinker pin definitions (same as before)
#define PWDN_GPIO_NUM 32
// ... (same camera config as RTSP example)
constexpr int kTensorArenaSize = 100 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
tflite::AllOpsResolver resolver;
const tflite::Model* model;
tflite::MicroInterpreter* interpreter;
TfLiteTensor* input;
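void setup() {
  Serial.begin(115200);
  // LED_GPIO_NUM must be defined alongside the other pin definitions
  pinMode(LED_GPIO_NUM, OUTPUT);

  // Camera init (same config, but GRAYSCALE, 96x96)
  camera_config_t config;
  // ... fill in pin definitions ...
  config.pixel_format = PIXFORMAT_GRAYSCALE;
  config.frame_size = FRAMESIZE_96X96;
  config.fb_count = 1;
  if (esp_camera_init(&config) != ESP_OK) {
    Serial.println("Camera init failed");
    while (true) { delay(1000); }  // Halt: nothing to run inference on
  }

  // Load TFLite model
  model = tflite::GetModel(g_person_detect_model_data);
  interpreter = new tflite::MicroInterpreter(
      model, resolver, tensor_arena, kTensorArenaSize);
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors failed (tensor arena too small?)");
    while (true) { delay(1000); }  // Halt: interpreter is unusable
  }
  input = interpreter->input(0);
  Serial.println("TinyML Person Detection Ready");
}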
void loop() {
  camera_fb_t *fb = esp_camera_fb_get();
  if (!fb) { Serial.println("Camera capture failed"); return; }

  // Copy 96x96 grayscale frame to model input, shifting unsigned
  // pixels (0..255) into the signed INT8 range (-128..127)
  for (int i = 0; i < 96 * 96; i++) {
    input->data.int8[i] = (int8_t)(fb->buf[i] - 128);
  }
  esp_camera_fb_return(fb);

  // Run inference
  TfLiteStatus status = interpreter->Invoke();
  if (status != kTfLiteOk) {
    Serial.println("Inference failed");
    return;
  }

  // Get output scores
  TfLiteTensor* output = interpreter->output(0);
  int8_t no_person_score = output->data.int8[0];
  int8_t person_score = output->data.int8[1];
  Serial.printf("Person: %d, No Person: %d\n",
                person_score, no_person_score);

  if (person_score > no_person_score) {
    Serial.println("PERSON DETECTED!");
    // Trigger GPIO, LED, buzzer, relay, etc.
    digitalWrite(LED_GPIO_NUM, HIGH);
  } else {
    digitalWrite(LED_GPIO_NUM, LOW);
  }
  delay(100);
}
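The frame-copy step in loop() maps unsigned camera pixels (0 to 255) onto the signed INT8 range the model expects (-128 to 127). The helper below is a hypothetical sketch of that step with a size check added, so a frame of unexpected length is rejected instead of overrunning the input tensor. The function name is an assumption and is not part of esp32-camera or TFLite Micro.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: copies an unsigned 8-bit greyscale frame into a
// signed INT8 buffer, subtracting 128 from each pixel.
// Returns false (and copies nothing) if a pointer is null or the sizes differ.
bool copy_frame_to_int8(const uint8_t* src, std::size_t src_len,
                        int8_t* dst, std::size_t dst_len) {
  if (src == nullptr || dst == nullptr || src_len != dst_len) {
    return false;
  }
  for (std::size_t i = 0; i < src_len; i++) {
    dst[i] = static_cast<int8_t>(static_cast<int>(src[i]) - 128);
  }
  return true;
}
```

In the sketch this could be called as copy_frame_to_int8(fb->buf, fb->len, input->data.int8, input->bytes), skipping inference for that frame if it returns false.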
Interpreting Output and Triggering Actions
The model outputs two INT8 scores, each in the range -128 to 127. A person score above 100 indicates high confidence. Typical thresholds:
- person_score > 100: High confidence person detected → trigger security alert, unlock door, turn on light
- person_score > 50: Moderate confidence → log detection, increment counter
- person_score ≤ 0: No person
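The thresholds above can be expressed as a small classification function. This is a sketch only: the enum and function names are hypothetical, and the "low" band for scores from 1 to 50 is an assumption, since the list above does not say how to treat that range.

```cpp
#include <cstdint>

// Hypothetical confidence bands for the raw INT8 person score.
enum class Confidence { kNoPerson, kLow, kModerate, kHigh };

Confidence classify_score(int8_t person_score) {
  if (person_score > 100) return Confidence::kHigh;      // alert, unlock, light
  if (person_score > 50) return Confidence::kModerate;   // log, count
  if (person_score > 0) return Confidence::kLow;         // assumption: unspecified band
  return Confidence::kNoPerson;                          // score <= 0
}
```

A switch on the returned value keeps the action logic (alert, log, ignore) separate from the numeric thresholds, which makes them easier to tune for a particular installation.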
Optimisation Tips
- Set the CPU to 240 MHz (maximum): setCpuFrequencyMhz(240)
- Use PSRAM for the tensor arena: uint8_t* tensor_arena = (uint8_t*)heap_caps_malloc(kTensorArenaSize, MALLOC_CAP_SPIRAM)
- Run inference on every 3rd frame, since person positions change slowly
- Consider Edge Impulse (edgeimpulse.com) for training custom models and deploying to ESP32 — they have an excellent free tier and Arduino library export
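Running inference on every 3rd frame can be sketched with a small counter. The class below is a hypothetical helper, not part of any library; it returns true on the first call and then on every Nth call after that.

```cpp
#include <cstdint>

// Hypothetical helper: should_run() returns true once per `interval` calls.
class FrameSkipper {
 public:
  // An interval of 0 is treated as 1 (run on every frame).
  explicit FrameSkipper(uint32_t interval)
      : interval_(interval == 0 ? 1 : interval) {}

  bool should_run() {
    bool run = (count_ == 0);
    count_ = (count_ + 1) % interval_;  // wraps back to 0 every `interval` calls
    return run;
  }

 private:
  uint32_t interval_;
  uint32_t count_ = 0;
};
```

In loop(), a global FrameSkipper skipper(3) could gate the Invoke() call. Frames that are skipped still need to be handed back with esp_camera_fb_return(fb) so the camera driver can reuse the buffer.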
Frequently Asked Questions
Can I train my own custom TinyML model for ESP32-CAM?
Yes — Edge Impulse is the best platform for this. Collect images, annotate them in the web interface, train a model, and export an Arduino library. The entire process from data collection to deployment takes a few hours. Edge Impulse’s free tier supports models that fit on ESP32. Popular custom models: face detection, specific product detection, gesture recognition, defect detection for Indian manufacturing QC.
How accurate is the person detection model on ESP32-CAM?
In good lighting with a frontal view of a person, accuracy is typically 80–90%. In challenging conditions (backlit, partial view, unusual angles, Indian sarees/traditional clothing), accuracy drops to 60–75%. For security applications, combine person detection as a first filter with a higher-quality secondary check (Telegram alert + human review) rather than using it as a sole access control mechanism.
What other TinyML models can run on ESP32-CAM?
Models that fit in roughly 300KB when INT8 quantised, typically taking 96×96 image input: gesture recognition (hand gestures), digit recognition, and simple fruit/object classification. Keyword spotting also fits, but it needs a microphone rather than the camera. Models that require higher resolution (96×96 is very low) or more processing power are better suited to a Raspberry Pi with TensorFlow Lite or a Jetson Nano with CUDA acceleration.