Pairing the INMP441 microphone with an ESP32 is one of the most popular voice recognition setups in Indian IoT maker projects, and for good reason. The INMP441 is a high-quality omnidirectional MEMS microphone with an I2S digital output. It connects directly to the ESP32’s I2S peripheral and delivers clean digital audio for voice processing applications. Whether you are building a smart home voice controller, an attendance system, or a language learning device, this guide walks you through a complete working setup with code.
Table of Contents
- INMP441 Specifications and I2S Interface
- Wiring INMP441 to ESP32
- Audio Capture and Level Detection Code
- Voice Recognition with Edge Impulse
- Simple Wake Word Detection
- Smart Home Voice Control Example
- Frequently Asked Questions
INMP441 Specifications and I2S Interface
The InvenSense (now TDK) INMP441 is a high-performance omnidirectional MEMS microphone in a bottom-ported LGA package. Breakout modules mount it on a small PCB with pin headers, which makes it breadboard-compatible. Key specifications:
- Frequency response: 60 Hz – 15 kHz (±3 dB), adequate for voice frequency range (100 Hz – 8 kHz)
- Signal-to-Noise Ratio (SNR): 61 dB(A) — better than most electret microphone modules
- Sensitivity: -26 dBFS at 94 dB SPL — captures speech at normal conversational distance (0.5–1 metre)
- Power consumption: 1.4mA active, 25μA power-down mode — suitable for battery-powered devices
- Supply voltage: 1.8–3.3V — directly compatible with ESP32 (3.3V logic)
- Digital output: I2S, 24-bit two’s-complement samples in a 32-bit slot; supported sample rates of roughly 7.8–50 kHz, with 16 kHz typical for speech
- Direction: Omnidirectional — picks up sound equally from all directions
The INMP441 module has six pins: VCC (3.3V), GND, SCK (BCLK), WS (LRCLK), SD (data output), and L/R (channel select: GND for left, 3.3V for right).
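The sample rate, word-select (WS) clock and bit clock (SCK) are tied together: WS toggles once per sample, and SCK runs at the sample rate multiplied by the number of bit-clock cycles in each frame. The helper below is a minimal sketch of that arithmetic. It assumes the INMP441 expects 64 SCK cycles per WS period (two 32-bit slots) and an SCK between 0.5 and 3.2 MHz, based on our reading of the datasheet, and that the ESP32 clocks two 32-bit slots per frame. `bitClockHz()` and `sampleRateSupported()` are illustrative names, not part of any ESP32 API.

```cpp
#include <cstdint>

// Assumptions (verify against your datasheet revision): 64 SCK cycles per
// WS period, i.e. two 32-bit slots, and an SCK range of 0.5-3.2 MHz.
constexpr uint32_t kSckPerFrame = 64;
constexpr uint32_t kSckMinHz = 500000;
constexpr uint32_t kSckMaxHz = 3200000;

// Bit clock (SCK/BCLK) the ESP32 must generate; WS itself runs at sampleRateHz.
constexpr uint32_t bitClockHz(uint32_t sampleRateHz) {
    return sampleRateHz * kSckPerFrame;
}

// True if this sample rate keeps SCK inside the assumed INMP441 range.
constexpr bool sampleRateSupported(uint32_t sampleRateHz) {
    return bitClockHz(sampleRateHz) >= kSckMinHz &&
           bitClockHz(sampleRateHz) <= kSckMaxHz;
}
```

At 16 kHz the ESP32 would generate a 1.024 MHz SCK, comfortably inside that window, while 192 kHz would need about 12.3 MHz, far outside it.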
Wiring INMP441 to ESP32
// INMP441 → ESP32 Wiring
// INMP441 VCC → ESP32 3.3V
// INMP441 GND → ESP32 GND
// INMP441 SCK → ESP32 GPIO 14 (I2S Bit Clock)
// INMP441 WS → ESP32 GPIO 15 (I2S Word Select)
// INMP441 SD → ESP32 GPIO 32 (I2S Data Input)
// INMP441 L/R → GND (microphone outputs on the left channel)
//            → 3.3V instead (right channel), for the second mic in a stereo pair
//
// For stereo (two INMP441 modules):
// Mic 1 L/R → GND (left channel)
// Mic 2 L/R → 3.3V (right channel)
// Both share SCK, WS, and SD lines
// Use I2S_CHANNEL_FMT_RIGHT_LEFT in config
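When two microphones share the bus, each `i2s_read()` returns both channels interleaved, one 32-bit slot per microphone. A minimal sketch for splitting them is shown below. Which slot comes first is an assumption here, so confirm it by tapping one microphone and watching which channel responds. `deinterleaveStereo()` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Split an interleaved stereo I2S buffer (two 32-bit slots per frame) into
// separate 16-bit left and right channels. Slot order is an assumption:
// pass leftFirst = false if the left mic shows up in the second slot.
void deinterleaveStereo(const int32_t* raw, std::size_t count, bool leftFirst,
                        std::vector<int16_t>& left, std::vector<int16_t>& right) {
    left.clear();
    right.clear();
    for (std::size_t i = 0; i + 1 < count; i += 2) {
        // Keep the top 16 bits of each 24-bit sample, as in the mono sketch
        int16_t a = static_cast<int16_t>(raw[i] >> 16);
        int16_t b = static_cast<int16_t>(raw[i + 1] >> 16);
        left.push_back(leftFirst ? a : b);
        right.push_back(leftFirst ? b : a);
    }
}
```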
Audio Capture and Level Detection Code
#include <driver/i2s.h>
#include <math.h>
#define I2S_PORT I2S_NUM_0
#define I2S_SCK 14
#define I2S_WS 15
#define I2S_SD_PIN 32
#define SAMPLE_RATE 16000
#define BUFFER_SIZE 512
int32_t rawBuffer[BUFFER_SIZE];
void setupI2S() {
i2s_config_t config = {
.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
.sample_rate = SAMPLE_RATE,
.bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
.channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
.communication_format = I2S_COMM_FORMAT_STAND_I2S,
.intr_alloc_flags = 0,
.dma_buf_count = 8,
.dma_buf_len = BUFFER_SIZE,
.use_apll = false,
.tx_desc_auto_clear = false,
.fixed_mclk = 0
};
i2s_pin_config_t pins = {
.bck_io_num = I2S_SCK,
.ws_io_num = I2S_WS,
.data_out_num = I2S_PIN_NO_CHANGE,
.data_in_num = I2S_SD_PIN
};
i2s_driver_install(I2S_PORT, &config, 0, NULL);
i2s_set_pin(I2S_PORT, &pins);
i2s_zero_dma_buffer(I2S_PORT);
}
float calculateRMS() {
size_t bytesRead = 0;
i2s_read(I2S_PORT, rawBuffer, sizeof(rawBuffer), &bytesRead, portMAX_DELAY);
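int samples = bytesRead / 4; // 4 bytes per 32-bit I2S sample
if (samples == 0) return 0.0f; // nothing was read: avoid dividing by zero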
long long sumSquares = 0;
for (int i = 0; i < samples; i++) {
// INMP441 data is in MSBs of 32-bit word — shift to get useful range
int16_t sample = (int16_t)(rawBuffer[i] >> 16);
sumSquares += (long long)sample * sample;
}
return sqrt((float)sumSquares / samples);
}
void setup() {
Serial.begin(115200);
setupI2S();
Serial.println("INMP441 ready.");
}
void loop() {
float rms = calculateRMS();
// Roughly convert to dB (calibrate for your environment)
float dB = 20.0 * log10(rms + 1) - 30;
Serial.printf("RMS: %.1f | ~dB: %.1f\n", rms, dB);
if (rms > 500) { // Threshold for speech detection
Serial.println(">>> Voice activity detected! <<<");
}
delay(50);
}
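The `-30` offset in `loop()` is only a placeholder. A more principled estimate uses the datasheet sensitivity: because a 94 dB SPL tone reads as -26 dBFS, SPL ≈ dBFS + 120. The sketch below assumes 0 dBFS means the RMS of a full-scale sine (the usual convention for digital MEMS microphone datasheets) and works on the 16-bit samples that `calculateRMS()` extracts. It is plain C++ so you can test it off-device; the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumptions: samples are the 16-bit values from (raw >> 16), 0 dBFS is the
// RMS of a full-scale sine, and sensitivity is the datasheet typical
// -26 dBFS at 94 dB SPL (individual parts vary).
constexpr double kFullScale = 32767.0;
constexpr double kSensitivityDbfs = -26.0;
constexpr double kRefSplDb = 94.0;

// Root-mean-square of a block of samples; 0 for an empty block.
double rmsOf(const int16_t* samples, std::size_t n) {
    if (n == 0) return 0.0;
    double sumSquares = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sumSquares += static_cast<double>(samples[i]) * samples[i];
    }
    return std::sqrt(sumSquares / static_cast<double>(n));
}

// Level relative to a full-scale sine, whose RMS is kFullScale / sqrt(2).
double rmsToDbfs(double rms) {
    if (rms <= 0.0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(rms / (kFullScale / std::sqrt(2.0)));
}

// 94 dB SPL reads as -26 dBFS, so SPL = dBFS + 120 for this microphone.
double dbfsToSpl(double dbfs) {
    return dbfs - kSensitivityDbfs + kRefSplDb;
}
```

On the ESP32, `dbfsToSpl(rmsToDbfs(rms))` would replace the rough formula. Sensitivity varies somewhat between parts, so check the result once against a sound level meter.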
Voice Recognition with Edge Impulse
Edge Impulse (edgeimpulse.com) is a machine learning platform with a free tier that lets you train keyword detection models and deploy them on the ESP32 without deep ML expertise:
- Create a project on Edge Impulse. Select "Keywords" as the project type.
- Collect training data: Use your INMP441 + ESP32 with the Edge Impulse data forwarder (or use the browser microphone). Record 50–100 samples each of: your target keywords ("lights on", "fan off"), background noise, and unknown words.
- Design and train: Edge Impulse extracts MFCC (Mel-Frequency Cepstral Coefficients) features from the audio and trains a neural network classifier on them. Training accuracy for simple two-keyword models typically reaches 95–99%; validation accuracy is the better guide to real-world performance.
- Export and deploy: Export as Arduino library. Include the library in your sketch. The exported model runs entirely on the ESP32 at the edge — no internet connection needed for inference.
- Inference time: Typically 50–200ms per 1-second audio window on ESP32 — fast enough for real-time keyword detection.
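Edge Impulse classifies fixed-length windows (the 1-second windows mentioned above), but the microphone delivers a continuous stream. A common way to bridge the two is an overlapping sliding window. The class below is a minimal sketch of that idea, not the Edge Impulse SDK's own continuous-inference API; `SlidingWindow`, the 50% overlap and the float scaling to [-1, 1) are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: gathers 16-bit samples into fixed-length windows and
// emits one every `hop` samples, scaled to floats in [-1, 1).
class SlidingWindow {
public:
    SlidingWindow(std::size_t windowLen, std::size_t hop)
        : len_(windowLen),
          hop_(hop == 0 ? 1 : (hop > windowLen ? windowLen : hop)) {
        buf_.reserve(windowLen);
    }

    // Adds one sample; returns true (and fills `out`) when a window is ready.
    bool push(int16_t sample, std::vector<float>& out) {
        buf_.push_back(sample);
        if (buf_.size() < len_) return false;
        out.resize(len_);
        for (std::size_t i = 0; i < len_; i++) {
            out[i] = static_cast<float>(buf_[i]) / 32768.0f;
        }
        // Drop the oldest `hop` samples so consecutive windows overlap
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(hop_));
        return true;
    }

private:
    std::size_t len_;
    std::size_t hop_;
    std::vector<int16_t> buf_;
};
```

With a 16000-sample window and an 8000-sample hop, a new window arrives every 0.5 s at 16 kHz, so inference has to finish within 0.5 s to keep up, which the 50–200 ms figure above allows. Pass `out` to whatever input wrapper the exported library's example sketch uses.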
Simple Wake Word Detection
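Without a full ML model, you can still implement a simple threshold-based voice activity detector (VAD). It is not true wake-word detection, because it reacts to any sufficiently loud, sustained sound rather than a specific word, but it can wake an IoT device and hand off to another system over WiFi (here via MQTT) to process the command: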
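#include <driver/i2s.h>
#include <math.h>
#include <WiFi.h>
#include <PubSubClient.h>
// ... paste the #defines, rawBuffer, setupI2S() and calculateRMS() from the sketch above ...
const char* SSID = "YourWiFi";
const char* PASSWORD = "YourPassword";
const char* MQTT_SERVER = "192.168.1.100";
WiFiClient espClient;
PubSubClient mqttClient(espClient);
bool wakeWordDetected = false;
bool loud = false; // true while RMS stays above the threshold
unsigned long loudSince = 0; // millis() when the current loud stretch began
unsigned long lastDetectionTime = 0;
void connectMqtt() {
while (!mqttClient.connected()) {
// "esp32-inmp441" is just an example client ID; any unique name works
if (!mqttClient.connect("esp32-inmp441")) delay(1000);
}
}
void setup() {
Serial.begin(115200);
WiFi.begin(SSID, PASSWORD);
while (WiFi.status() != WL_CONNECTED) delay(500);
mqttClient.setServer(MQTT_SERVER, 1883); // default unencrypted MQTT port
setupI2S();
}
void loop() {
connectMqtt();
mqttClient.loop();
// calculateRMS() blocks until one 512-sample buffer (~32 ms) is ready,
// so the loop is paced by the audio stream and needs no delay()
float rms = calculateRMS();
unsigned long now = millis();
// Voice Activity Detection: sound must stay above the threshold for 200 ms
if (rms > 800) {
if (!loud) {
loud = true;
loudSince = now;
}
if (!wakeWordDetected && (now - loudSince >= 200) && (now - lastDetectionTime > 2000)) {
wakeWordDetected = true;
lastDetectionTime = now;
Serial.println("Wake word detected, sending trigger!");
// Publish to MQTT so a server (e.g. Home Assistant) can process the command
mqttClient.publish("home/voice/trigger", "1");
}
} else {
loud = false;
if (now - lastDetectionTime > 500) {
wakeWordDetected = false;
}
}
}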
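A fixed threshold such as `rms > 800` breaks down when the background level changes, for example when a ceiling fan is switched on. One alternative, a different technique from the fixed threshold above, is an energy VAD with an adaptive noise floor: track the quiet-time level with an exponential moving average and only flag speech when the RMS rises a set factor above it for several consecutive frames. This is a sketch under stated assumptions: the factor of 3, smoothing of 0.05 and 6-frame minimum (about 200 ms at 32 ms per 512-sample frame) are illustrative, and `AdaptiveVad` is a hypothetical class.

```cpp
#include <algorithm>

// Hypothetical energy VAD with an adaptive noise floor. The defaults are
// illustrative, not tuned: 3x above the floor, 0.05 smoothing, and 6
// consecutive loud frames (about 200 ms of 32 ms frames) before it fires.
class AdaptiveVad {
public:
    explicit AdaptiveVad(float factor = 3.0f, float alpha = 0.05f, int minFrames = 6)
        : factor_(factor), alpha_(alpha), minFrames_(minFrames) {}

    // Feed one frame's RMS; returns true while sustained speech is detected.
    bool update(float rms) {
        if (noiseFloor_ < 0.0f) noiseFloor_ = rms;  // first frame seeds the floor
        // Treat the floor as at least 1 so digital silence doesn't make everything "loud"
        bool loud = rms > factor_ * std::max(noiseFloor_, 1.0f);
        if (loud) {
            if (loudFrames_ < minFrames_) loudFrames_++;  // capped, so it cannot overflow
        } else {
            loudFrames_ = 0;
            // Adapt only on quiet frames so speech does not drag the floor up
            noiseFloor_ += alpha_ * (rms - noiseFloor_);
        }
        return loudFrames_ >= minFrames_;
    }

    float noiseFloor() const { return noiseFloor_; }

private:
    float factor_;
    float alpha_;
    int minFrames_;
    int loudFrames_ = 0;
    float noiseFloor_ = -1.0f;  // negative means "not yet initialised"
};
```

In `loop()` you would call `vad.update(calculateRMS())` in place of the fixed comparison. Because the floor only adapts during quiet frames, a sustained jump in background noise (a fan moved to a higher speed) will read as continuous speech until you add a slow upward adaptation or a timeout.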
Smart Home Voice Control Example
A practical voice-controlled home automation setup for Indian homes:
- Hardware: ESP32 + INMP441 (this guide) + relay module for appliances
- Commands for the Indian context: Hindi phrases such as “batti on”/“batti off” for lights, plus fan speed control and a geyser timer
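- Cloud option: send the recorded audio over WiFi to a cloud speech and natural language understanding (NLU) service such as Dialogflow (Google) or LUIS (Microsoft, now being replaced by conversational language understanding in Azure AI Language), and receive a structured command back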
- Privacy option: run the keyword model locally on the ESP32 (an Edge Impulse export, typically built on TensorFlow Lite for Microcontrollers) so no audio leaves the device, which suits privacy-conscious households
- Integration: Use Home Assistant (Raspberry Pi) with MQTT integration — the ESP32 publishes recognised commands as MQTT messages, Home Assistant executes them through smart plugs, Zigbee lights, or directly via relay modules
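On the ESP32 side of the Home Assistant integration, the recognised label has to become an MQTT topic and payload. The sketch below shows one way to do that with a lookup table and a confidence threshold, so noise and unknown words are ignored. The labels (such as `batti_on` and `fan_fast`), the topics and the 0.8 threshold are hypothetical; use the labels from your own Edge Impulse project and the topics your Home Assistant MQTT entities listen on.

```cpp
#include <cstring>

// Hypothetical mapping from classifier labels to MQTT topic/payload pairs.
struct CommandRoute {
    const char* label;
    const char* topic;
    const char* payload;
};

const CommandRoute kRoutes[] = {
    {"batti_on",  "home/living/light/set", "ON"},
    {"batti_off", "home/living/light/set", "OFF"},
    {"fan_fast",  "home/living/fan/speed", "5"},
    {"fan_off",   "home/living/fan/speed", "0"},
};

// Returns the route for the top label if its confidence clears the threshold,
// otherwise nullptr (noise, unknown words or low-confidence guesses).
const CommandRoute* routeFor(const char* label, float confidence,
                             float minConfidence = 0.8f) {
    if (confidence < minConfidence) return nullptr;
    for (const CommandRoute& r : kRoutes) {
        if (std::strcmp(r.label, label) == 0) return &r;
    }
    return nullptr;
}
```

A match would then be sent with PubSubClient, e.g. `mqttClient.publish(r->topic, r->payload);`.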
Frequently Asked Questions
How far away can the INMP441 detect voice?
In a quiet room, the INMP441 reliably detects conversational speech (roughly 60–70 dB SPL at 1 metre) at distances of up to 2–3 metres. For smart speaker applications with the device on a table, 1–2 metres is a realistic working range. Background noise from a TV, fans or an AC reduces the effective range significantly: in a typical Indian home with a ceiling fan running continuously, reliable detection drops to about 0.5–1 metre.
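These figures follow from how quickly speech level falls with distance. As a rough worked example, assume free-field spreading (level drops 6 dB per doubling of distance; reflections make real rooms somewhat more forgiving) and conversational speech of 60 dB SPL at 1 m. At 3 m, and using the datasheet sensitivity of -26 dBFS at 94 dB SPL (so dBFS = SPL - 120):

```latex
\begin{aligned}
L(d) &= L(d_0) - 20\log_{10}\!\left(\frac{d}{d_0}\right) \\
L(3\,\mathrm{m}) &= 60 - 20\log_{10} 3 \approx 60 - 9.5 = 50.5\ \mathrm{dB\ SPL} \\
50.5\ \mathrm{dB\ SPL} - 120 &\approx -69.5\ \mathrm{dBFS}
\end{aligned}
```

The microphone resolves a -69.5 dBFS signal easily (its 61 dB SNR puts the self-noise floor around 94 - 61 - 120 = -87 dBFS). What limits range in practice is the ratio of speech to room noise, which is why a running fan shrinks the usable distance.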
Can I use two INMP441 microphones for voice direction finding?
Yes. Connect two INMP441 modules to the same I2S bus so they share the SCK, WS and SD lines. Set one module’s L/R pin to GND (left channel) and the other’s to 3.3V (right channel); each microphone then drives SD only during its own half of the WS frame. Configure the ESP32 I2S driver for stereo input (I2S_CHANNEL_FMT_RIGHT_LEFT). By measuring the time difference of arrival (TDOA) between the two microphones, you can estimate the approximate direction of a sound source.
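The TDOA step can be sketched with brute-force cross-correlation: slide one channel against the other, keep the lag with the highest correlation, then convert that delay to an angle with sin θ = c·τ / d, where c is the speed of sound, τ the delay and d the microphone spacing. This is a minimal sketch, assuming the two channels have already been split into separate 16-bit arrays and c = 343 m/s (air at about 20°C). `estimateLag()` and `lagToAngleDeg()` are hypothetical names, and more robust systems typically use a weighted variant such as GCC-PHAT.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <vector>

// Lag (in samples) of `right` relative to `left`, found by brute-force
// cross-correlation over -maxLag..+maxLag. Positive: sound hit the left mic first.
int estimateLag(const std::vector<int16_t>& left,
                const std::vector<int16_t>& right, int maxLag) {
    const int n = static_cast<int>(std::min(left.size(), right.size()));
    int bestLag = 0;
    double bestScore = std::numeric_limits<double>::lowest();
    for (int lag = -maxLag; lag <= maxLag; lag++) {
        double score = 0.0;
        for (int i = 0; i < n; i++) {
            const int j = i + lag;  // pair left[i] with right[i + lag]
            if (j < 0 || j >= n) continue;
            score += static_cast<double>(left[i]) * right[j];
        }
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }
    return bestLag;
}

// Bearing in degrees from sin(theta) = c * tau / d: 0 = straight ahead
// (broadside), positive angles toward the left mic. Assumes c = 343 m/s.
double lagToAngleDeg(int lag, double sampleRateHz, double spacingM) {
    const double kSpeedOfSound = 343.0;
    const double kPi = 3.14159265358979323846;
    double s = kSpeedOfSound * (lag / sampleRateHz) / spacingM;
    s = std::max(-1.0, std::min(1.0, s));  // clamp: noise can push |s| past 1
    return std::asin(s) * 180.0 / kPi;
}
```

Resolution is the catch: with microphones 5 cm apart, the largest possible delay is 0.05 / 343 ≈ 146 µs, only about 2.3 samples at 16 kHz, so the estimate snaps to a handful of angles. Wider spacing or a higher sample rate gives finer steps.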
Will the INMP441 work in India’s humid climate?
Yes, with some precautions. The INMP441 is rated for 0–70°C operating temperature and the humidity ranges typical of Indian homes. In high-humidity deployments (coastal areas, humid industrial zones), protect the breakout module’s PCB with conformal coating, but mask the microphone’s sound port so the coating does not block it. PCB traces and connectors are the parts most vulnerable to condensation in extreme humidity.
Is Edge Impulse free for Indian students and hobbyists?
Yes. Edge Impulse has a free Developer tier that allows unlimited public projects with up to 4 million inferences per month per device, and it includes model training, deployment, and OTA updates. For students and hobbyists, this is more than sufficient, though plan limits change over time, so confirm them on the Edge Impulse pricing page. Commercial use requires the Professional plan, but educational and hobby projects remain free.