Pairing the I2S audio protocol with the ESP32 enables high-quality digital audio in maker projects, far superior to the noisy PWM audio that beginners often start with. I2S (Inter-IC Sound) is a standard serial bus designed specifically to transfer digital audio data between ICs. The ESP32 has two dedicated hardware I2S peripherals that handle all the timing automatically, so your application code can focus on the audio content instead of bit-banging. This guide explains I2S from signal-level fundamentals to working ESP32 code.
Table of Contents
- I2S Signal Lines Explained
- I2S Timing and Data Format
- Master vs Slave Configuration
- ESP32 I2S Hardware Overview
- I2S Audio Output: ESP32 to DAC
- I2S Audio Input: Microphone to ESP32
- Frequently Asked Questions
I2S Signal Lines Explained
I2S uses three signal lines (plus power and ground) to transfer stereo audio data between chips:
- BCLK (Bit Clock / Serial Clock / SCK): This clock runs at sample_rate × bits_per_sample × channels. For CD-quality audio (44.1 kHz, 16-bit, stereo): BCLK = 44,100 × 16 × 2 = 1.4112 MHz. Each BCLK cycle transfers one data bit; in standard I2S the receiver latches the data line on the rising edge.
- WS (Word Select / LRCLK / Frame Select / LRC): This signal completes one LOW/HIGH cycle per sample period, so its frequency equals the sample rate (44.1 kHz in the example above), and its level indicates which channel is being transmitted. WS = LOW during the left channel frame; WS = HIGH during the right channel frame (in standard I2S format).
- SD/DATA (Serial Data): The actual audio samples as binary data, MSB (most significant bit) first. Left channel samples appear when WS = LOW, right channel when WS = HIGH.
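Some I2S implementations add a separate MCLK (Master Clock), an additional clock typically running at 256× or 512× the sample rate. Codecs such as the WM8960 need MCLK for their internal clocking, while DACs such as the PCM5102 can derive their system clock from BCLK with an internal PLL and work without it. The ESP32 can output MCLK, but only on specific pins (GPIO0, GPIO1, or GPIO3 on the original ESP32).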
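The clock relationships described above reduce to simple multiplication. The following is a minimal sketch in plain C++ that runs on a desktop compiler; the helper names `bclk_hz`, `ws_hz`, and `mclk_hz` are illustrative assumptions, not part of any ESP32 API.

```cpp
#include <cstdint>

// Bit clock: one BCLK cycle per bit, per channel, per sample.
// Hypothetical helper, not an ESP32 driver function.
constexpr uint32_t bclk_hz(uint32_t sample_rate, uint32_t bits_per_sample, uint32_t channels) {
    return sample_rate * bits_per_sample * channels;
}

// WS (LRCLK) completes one LOW/HIGH cycle per stereo frame,
// so its frequency equals the sample rate.
constexpr uint32_t ws_hz(uint32_t sample_rate) {
    return sample_rate;
}

// Master clock expressed as a multiple of the sample rate (commonly 256 or 512).
constexpr uint32_t mclk_hz(uint32_t sample_rate, uint32_t multiple) {
    return sample_rate * multiple;
}

// Compile-time checks against the CD-quality figures used in the text.
static_assert(bclk_hz(44100, 16, 2) == 1411200, "CD-quality BCLK is 1.4112 MHz");
static_assert(mclk_hz(44100, 256) == 11289600, "256 x 44.1 kHz is 11.2896 MHz");
```

Calling `bclk_hz(48000, 32, 2)` gives the bit clock for 48 kHz stereo with 32-bit slots, which is a quick way to check whether a given DAC or microphone supports the clock rate you are about to configure.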
I2S Timing and Data Format
The standard I2S frame format (Philips I2S standard) works as follows:
- Left channel data is transmitted during the WS LOW phase, right channel data during the WS HIGH phase
- The MSB appears one BCLK cycle after the WS transition (1 BCLK delay)
- If the slot is wider than the sample (for example, a 16-bit sample in a 32-bit slot), the remaining bit positions are zero-padded
For a 16-bit sample at 44.1kHz:
// I2S Frame Timing (16-bit, 44.1kHz)
// BCLK frequency = 44100 Hz × 32 = 1,411,200 Hz (1.41 MHz)
// (32 BCLK cycles per WS period: 16 bits left + 16 bits right)
//
// WS:   ‾‾‾\________________/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\___
//           Left channel     Right channel
//
// DATA:     MSB → LSB        MSB → LSB
//           (16 bits)        (16 bits)
//
// With 32-bit slots (64 BCLK cycles per WS period), each 16-bit
// sample is followed by 16 zero bits.
// Common I2S format variations:
// Standard I2S (Philips): 1 BCLK delay after WS change
// Left Justified: MSB immediately after WS change
// Right Justified (Sony): LSB on last BCLK before WS change
// DSP Mode / PCM Mode: Used for TDM multi-channel
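The first three format variations differ only in where the sample sits inside its channel slot. The sketch below models one slot as a list of data-line levels, one per BCLK cycle after the WS transition. It is a simplified desktop model under stated assumptions: 16-bit samples, a slot wider than the sample, and hypothetical names (`Format`, `serialize_slot`) that do not belong to any driver.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class Format { Philips, LeftJustified, RightJustified };

// Returns the SD line level for each BCLK cycle of one channel slot.
// Index 0 is the first BCLK cycle after the WS transition.
std::vector<int> serialize_slot(int16_t sample, int slot_len, Format fmt) {
    const int sample_len = 16;
    assert(slot_len > sample_len);  // leaves room for the Philips 1-bit delay

    int offset = 0;                                  // Left Justified: MSB immediately
    if (fmt == Format::Philips) {
        offset = 1;                                  // MSB delayed by one BCLK cycle
    } else if (fmt == Format::RightJustified) {
        offset = slot_len - sample_len;              // LSB lands on the last cycle
    }

    std::vector<int> bits(slot_len, 0);              // unused positions stay zero
    const uint16_t raw = static_cast<uint16_t>(sample);
    for (int i = 0; i < sample_len; ++i) {
        bits[offset + i] = (raw >> (sample_len - 1 - i)) & 1;  // MSB first
    }
    return bits;
}
```

For example, `serialize_slot(-32767, 32, Format::Philips)` yields a leading 0, then the 16 sample bits (0x8001, MSB first), then zero padding. A DAC configured for a different format would read those bits shifted, which typically shows up as distorted or very quiet audio.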
Master vs Slave Configuration
In an I2S connection, one side generates the clocks (BCLK and WS) — this is the master. The other side (the slave) receives the clocks and synchronises its data transmission/reception to them.
- ESP32 as master (most common): The ESP32 generates BCLK and WS and drives them to the DAC/ADC chip. The DAC (MAX98357A, PCM5102) receives the clocks and data and outputs analog audio. Use this for audio playback applications.
- ESP32 as slave: An external clock source (a codec or audio DSP) drives BCLK and WS to the ESP32. The ESP32 receives audio data synchronised to the external clock. Use this when integrating the ESP32 into an existing audio system.
- Microphone (INMP441) as slave: The INMP441 always operates as an I2S slave. The ESP32 provides BCLK and WS, and the INMP441 sends audio data, so with this microphone the ESP32 must be the master.
ESP32 I2S Hardware Overview
The ESP32 has two hardware I2S controllers (I2S0 and I2S1), each capable of:
- Simultaneous transmit (to speaker DAC) and receive (from microphone)
- Sample rates from 1 kHz to 96 kHz
- 8, 16, 24, or 32 bits per sample
- DMA (Direct Memory Access) transfer — audio data moves directly between RAM and I2S peripheral without CPU intervention, enabling real-time audio processing
- Configurable pin mapping — any GPIO can be assigned to BCLK, WS, or DATA (some restrictions apply)
The ESP32-S3 (newer variant) has an enhanced I2S controller with support for TDM (Time Division Multiplex) for multi-microphone arrays, and PDM (Pulse Density Modulation) for direct connection to PDM microphones.
I2S Audio Output: ESP32 to DAC
// ESP32 I2S Output Example: Generate a 1 kHz sine wave
// Uses the legacy I2S driver (driver/i2s.h); ESP-IDF 5.x deprecates it
// in favour of driver/i2s_std.h.
#include <driver/i2s.h>
#include <math.h>

#define I2S_NUM      I2S_NUM_0
#define SAMPLE_RATE  44100
#define SAMPLE_BITS  16
#define I2S_BCLK_PIN 26
#define I2S_WS_PIN   25
#define I2S_DOUT_PIN 22

// 441 frames hold exactly 10 cycles of 1 kHz at 44100 Hz, so the table
// loops without a phase jump (a 44-frame table would play about 1002 Hz).
#define SINE_FRAMES  441
#define SINE_CYCLES  10
static int16_t sine_wave[SINE_FRAMES * 2];  // Stereo: L + R pairs

void i2s_init() {
  i2s_config_t config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX),
    .sample_rate = SAMPLE_RATE,
    .bits_per_sample = (i2s_bits_per_sample_t)SAMPLE_BITS,
    .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
    .dma_buf_count = 8,
    .dma_buf_len = 64,
    .use_apll = true,            // Use Audio PLL for accurate clock
    .tx_desc_auto_clear = true   // Output silence if the DMA underruns
  };
  i2s_pin_config_t pins = {
    .bck_io_num = I2S_BCLK_PIN,
    .ws_io_num = I2S_WS_PIN,
    .data_out_num = I2S_DOUT_PIN,
    .data_in_num = I2S_PIN_NO_CHANGE
  };
  i2s_driver_install(I2S_NUM, &config, 0, NULL);
  i2s_set_pin(I2S_NUM, &pins);
}

void setup() {
  i2s_init();
  // Build the table once instead of recomputing it on every loop() pass
  for (int i = 0; i < SINE_FRAMES; i++) {
    int16_t sample = (int16_t)(32767.0 * sin(2.0 * M_PI * SINE_CYCLES * i / SINE_FRAMES));
    sine_wave[i * 2] = sample;      // Left
    sine_wave[i * 2 + 1] = sample;  // Right
  }
}

void loop() {
  size_t bytesWritten;
  // Blocks until the whole table has been queued in the DMA buffers
  i2s_write(I2S_NUM, sine_wave, sizeof(sine_wave), &bytesWritten, portMAX_DELAY);
}
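A looped sine table only plays a clean tone if it holds a whole number of cycles; otherwise every wrap-around introduces a phase jump. The following desktop C++ sketch shows one way to pick a table length for an arbitrary tone. It assumes integer sample rates and tone frequencies, and the names `gcd_u32`, `seamless_table_frames`, and `make_stereo_sine` are hypothetical helpers, not ESP32 APIs.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Euclid's algorithm, written out so the sketch does not depend on C++17.
uint32_t gcd_u32(uint32_t a, uint32_t b) {
    while (b != 0) {
        const uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Smallest number of frames that contains a whole number of tone cycles.
uint32_t seamless_table_frames(uint32_t sample_rate, uint32_t tone_hz) {
    assert(sample_rate > 0 && tone_hz > 0);
    return sample_rate / gcd_u32(sample_rate, tone_hz);
}

// Interleaved stereo table (L, R, L, R, ...); amplitude is a fraction of full scale.
std::vector<int16_t> make_stereo_sine(uint32_t sample_rate, uint32_t tone_hz, double amplitude) {
    assert(amplitude >= 0.0 && amplitude <= 1.0);
    const double kTwoPi = 6.283185307179586;
    const uint32_t frames = seamless_table_frames(sample_rate, tone_hz);
    std::vector<int16_t> table(frames * 2);
    for (uint32_t i = 0; i < frames; ++i) {
        const double phase = kTwoPi * tone_hz * i / sample_rate;
        const int16_t s = static_cast<int16_t>(std::lround(amplitude * 32767.0 * std::sin(phase)));
        table[2 * i] = s;      // Left
        table[2 * i + 1] = s;  // Right
    }
    return table;
}
```

At 44100 Hz a 1 kHz tone needs 441 frames (10 cycles), while at 48000 Hz it needs only 48 frames (1 cycle). On the ESP32 the resulting table could be passed to `i2s_write()` repeatedly; an amplitude below 1.0 is a sensible default when driving a speaker amplifier directly.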
I2S Audio Input: Microphone to ESP32
// ESP32 I2S Input: INMP441 Microphone
#include <driver/i2s.h>
#include <math.h>

#define I2S_NUM I2S_NUM_0
#define I2S_SCK 14  // Bit clock → INMP441 SCK
#define I2S_WS  15  // Word select → INMP441 WS
#define I2S_SD  32  // Serial data ← INMP441 SD

void i2s_mic_init() {
  i2s_config_t config = {
    .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
    .sample_rate = 16000,  // 16kHz for voice applications
    .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT,
    .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,
    .communication_format = I2S_COMM_FORMAT_STAND_I2S,
    .intr_alloc_flags = 0,
    .dma_buf_count = 8,
    .dma_buf_len = 256
  };
  i2s_pin_config_t pins = {
    .bck_io_num = I2S_SCK,
    .ws_io_num = I2S_WS,
    .data_out_num = I2S_PIN_NO_CHANGE,
    .data_in_num = I2S_SD
  };
  i2s_driver_install(I2S_NUM, &config, 0, NULL);
  i2s_set_pin(I2S_NUM, &pins);
}

void setup() {
  Serial.begin(115200);
  i2s_mic_init();
  Serial.println("INMP441 ready");
}

// Note: INMP441 with 32-bit config outputs 18-bit data in MSBs
// Shift right by 14 to get 18-bit signed value, or by 16 for 16-bit
int32_t raw_samples[256];

void loop() {
  size_t bytesRead = 0;
  i2s_read(I2S_NUM, raw_samples, sizeof(raw_samples), &bytesRead, portMAX_DELAY);
  int samplesRead = bytesRead / sizeof(int32_t);
  if (samplesRead == 0) return;  // Nothing captured; avoid dividing by zero

  // Accumulate squares in 64 bits: an 18-bit sample squared can exceed 32 bits
  uint64_t sumSquares = 0;
  for (int i = 0; i < samplesRead; i++) {
    int64_t s = raw_samples[i] >> 14;
    sumSquares += (uint64_t)(s * s);
  }
  float rms = sqrtf((float)sumSquares / samplesRead);  // True root-mean-square level
  Serial.println(rms);  // Higher values = louder sound
}
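The sketch above reduces each buffer to a single loudness figure. For logging or thresholding it is often handier to work with 16-bit samples and a level in dB relative to full scale (dBFS). The following desktop C++ sketch illustrates that conversion; `slot_to_int16`, `rms_int16`, and `dbfs` are hypothetical helper names, and the code assumes the sample occupies the most significant bits of each 32-bit slot, as the note above describes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Keep the top 16 bits of a 32-bit slot. Relies on arithmetic right shift
// for negative values, which mainstream compilers (including GCC) provide.
inline int16_t slot_to_int16(int32_t raw) {
    return static_cast<int16_t>(raw >> 16);
}

// Root-mean-square level of a buffer of raw 32-bit slots, in 16-bit units.
double rms_int16(const int32_t* raw, size_t count) {
    assert(raw != nullptr || count == 0);
    if (count == 0) return 0.0;
    double sum_squares = 0.0;
    for (size_t i = 0; i < count; ++i) {
        const double s = slot_to_int16(raw[i]);
        sum_squares += s * s;
    }
    return std::sqrt(sum_squares / static_cast<double>(count));
}

// Level relative to 16-bit full scale; silence maps to negative infinity.
double dbfs(double rms) {
    if (rms <= 0.0) return -std::numeric_limits<double>::infinity();
    return 20.0 * std::log10(rms / 32768.0);
}
```

A buffer whose samples sit at half of full scale comes out at roughly -6 dBFS. On the ESP32, the `raw_samples` array filled by `i2s_read()` could be passed to `rms_int16()` directly.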
Frequently Asked Questions
What is the difference between I2S and I2C?
I2C (Inter-Integrated Circuit) is a low-speed, addressed multi-device bus (typically 100 kHz to 1 MHz) for control registers and sensor data: sensors, displays, IMUs. I2S (Inter-IC Sound) is designed specifically for continuous audio streaming, with bit clocks that typically range from a few hundred kHz to several MHz depending on sample rate and bit depth. The names are similar, but the two buses serve completely different purposes. An audio project might use I2C to configure the codec's registers (volume, equaliser settings) and I2S to stream the actual audio data.
Can I use I2S audio and WiFi simultaneously on ESP32?
Yes, with care. The WiFi stack does not occupy either I2S peripheral, so both I2S0 and I2S1 remain available for audio; the contention is for CPU time and interrupt latency rather than for the peripheral itself. In practice, WiFi and I2S audio coexist well at low WiFi data rates, but audio dropouts can occur during high-throughput WiFi operations. Use DMA-based I2S with sufficient DMA buffer depth (8–16 buffers of 256 samples) to absorb WiFi interrupt latency without dropouts.
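How much interruption a given buffer depth can absorb follows directly from the buffer settings. The sketch below is a rough desktop C++ estimate, assuming that `dma_buf_len` counts frames (one sample per channel), as in the legacy driver; `DmaBudget` and `dma_budget` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct DmaBudget {
    double   buffered_ms;  // audio time held across all DMA buffers
    uint32_t total_bytes;  // RAM taken by those buffers
};

// Estimate how long the I2S DMA can keep playing without new data,
// and how much memory that headroom costs.
DmaBudget dma_budget(uint32_t buf_count, uint32_t frames_per_buf,
                     uint32_t sample_rate, uint32_t channels,
                     uint32_t bytes_per_sample) {
    assert(sample_rate > 0);
    const uint32_t frames = buf_count * frames_per_buf;
    DmaBudget budget;
    budget.buffered_ms = 1000.0 * frames / sample_rate;
    budget.total_bytes = frames * channels * bytes_per_sample;
    return budget;
}
```

With 16-bit stereo at 44.1 kHz, 8 buffers of 64 frames hold about 11.6 ms of audio in 2 KB, while 8 buffers of 256 frames hold about 46 ms in 8 KB. The larger setting trades RAM and added latency for more tolerance of WiFi activity.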
Why does my I2S audio have pops and clicks?
The usual cause is a DMA buffer underrun: your code is not feeding the I2S DMA fast enough, so the output is interrupted. With tx_desc_auto_clear enabled the driver plays silence during the gap; otherwise it replays stale buffer contents. Either way the discontinuity is heard as a pop. Increase the DMA buffer count (from 4 to 8 or 16). Also check whether your loop() function is blocked by delays, Serial.println() (which is slow), or other operations that prevent audio data from being written to the DMA in time.
Does ESP32-C3 (RISC-V) support I2S?
Yes, the ESP32-C3 has one I2S controller, which makes it more limited than the original ESP32 with its two. The ESP32-S3 is usually the better choice for audio applications: it has two I2S controllers with PDM and TDM support, it is supported by ESP-ADF (Espressif's Audio Development Framework), and it can run frameworks such as TensorFlow Lite Micro for voice recognition at the edge.