If you have ever wondered why your microcontroller slows to a crawl while copying large blocks of data between memory and peripherals, the DMA controller is the feature you have been looking for. Direct Memory Access (DMA) is one of the most powerful yet under-utilised features in modern MCUs, and understanding it can transform the performance of your embedded projects, from drone flight controllers to industrial sensor loggers running on Indian maker benches.
What Is a DMA Controller?
A DMA (Direct Memory Access) controller is a dedicated hardware block inside (or alongside) a microcontroller that can move data between memory regions or between a peripheral and memory without involving the CPU for every byte transferred. The CPU simply sets up the transfer — source address, destination address, and length — and the DMA controller does the heavy lifting in the background.
Think of it like this: normally the CPU acts as a delivery boy, personally carrying each parcel (byte) from one place to another. With DMA, the CPU hires a courier service (the DMA controller) and only has to give the pickup and drop-off addresses. While the courier moves parcels, the CPU can work on something else entirely.
Most modern 32-bit MCUs (STM32, ESP32, RP2040, nRF52840) include one or more DMA controllers as standard silicon, and so do some 8-bit parts such as the AVR XMEGA family. Understanding DMA is essential if you want to build high-speed data logging systems, audio processing circuits, or fast SPI/I2C/UART communication pipelines.
How DMA Works Inside an MCU
A DMA transfer involves several key components working together:
- DMA Controller (DMAC): The hardware engine that manages transfer channels, arbitration, and bus access.
- Source and Destination Addresses: Memory locations or peripheral data registers involved in the transfer.
- Transfer Count: The number of data items (bytes, half-words, or words) to move.
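- Channel/Stream: A logical pipeline inside the DMA controller. The STM32F4, for example, has 2 DMA controllers with 8 streams each, giving up to 16 independent data highways. Each stream is connected to one of up to 8 selectable request channels, which determines the peripheral that triggers it.
- Bus Arbitration: When both the CPU and DMA need the bus simultaneously, an arbiter in the bus matrix decides who goes first. Arbitration schemes vary by MCU; many interleave CPU and DMA accesses (for example round-robin) so that peripherals stay fed without starving the CPU.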
The typical DMA transfer sequence:
- CPU configures the DMA channel: source, destination, length, data width, and increment mode.
- CPU enables the DMA channel. The transfer starts either immediately (software trigger) or when a peripheral raises a DMA request.
- DMA controller requests the system bus and performs the transfer in bursts or single cycles.
- When complete, the DMA generates an interrupt so the CPU knows to process the newly arrived data.
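The sequence above can be sketched as a small software model that runs on a desktop compiler. This is only an illustration of the configure, enable, transfer, and notify steps: the names `dma_channel_t`, `dma_sim_configure()`, `dma_sim_enable()`, `dma_sim_run()` and `dma_sim_set_flag()` are hypothetical and do not belong to any vendor API, and on real silicon the copy loop is performed by hardware, not by the CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of one DMA channel (not a vendor API). */
typedef struct {
    const uint8_t *src;              /* source address */
    uint8_t       *dst;              /* destination address */
    size_t         count;            /* number of bytes to move */
    int            src_inc;          /* 1 = walk through source, 0 = fixed register */
    int            dst_inc;          /* 1 = walk through destination, 0 = fixed register */
    int            enabled;          /* set by the CPU to arm the channel */
    void         (*on_complete)(void *ctx); /* stands in for the completion interrupt */
    void          *ctx;
} dma_channel_t;

/* Step 1: the CPU writes source, destination, length and increment mode. */
void dma_sim_configure(dma_channel_t *ch, const uint8_t *src, uint8_t *dst,
                       size_t count, int src_inc, int dst_inc,
                       void (*on_complete)(void *ctx), void *ctx)
{
    ch->src = src;
    ch->dst = dst;
    ch->count = count;
    ch->src_inc = src_inc;
    ch->dst_inc = dst_inc;
    ch->enabled = 0;
    ch->on_complete = on_complete;
    ch->ctx = ctx;
}

/* Step 2: the CPU enables the channel. */
void dma_sim_enable(dma_channel_t *ch)
{
    ch->enabled = 1;
}

/* Steps 3 and 4: what the hardware would do on its own. */
void dma_sim_run(dma_channel_t *ch)
{
    assert(ch->enabled);
    const uint8_t *s = ch->src;
    uint8_t *d = ch->dst;
    for (size_t i = 0; i < ch->count; i++) {
        *d = *s;
        if (ch->src_inc) s++;
        if (ch->dst_inc) d++;
    }
    ch->enabled = 0;                     /* normal mode: channel stops when done */
    if (ch->on_complete) {
        ch->on_complete(ch->ctx);        /* "transfer complete" notification */
    }
}

/* Example completion handler: sets an int flag passed through ctx. */
void dma_sim_set_flag(void *ctx)
{
    *(int *)ctx = 1;
}
```

Setting `src_inc` to 0 and `dst_inc` to 1 models a peripheral-to-memory transfer, where every read comes from the same data register while the destination pointer walks through a RAM buffer.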
During the transfer, the CPU is free to execute other code — or can optionally wait in a low-power sleep state, saving energy while DMA keeps data moving. This architecture is especially valuable in battery-powered IoT devices common in Indian maker projects.
DMA vs CPU-Driven Transfer: Performance Comparison
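To appreciate DMA, it helps to see rough numbers. Consider transferring 1024 bytes from an ADC peripheral to RAM on an STM32F103 running at 72 MHz. The figures below are illustrative estimates rather than measurements: they assume about 4 CPU cycles per byte for a polling loop, so the transfer window is about 4096 cycles (4096 / 72 MHz ≈ 57 µs). A real ADC delivers samples more slowly, which stretches the window equally for all three methods.
| Method | CPU Cycles Used | CPU Free for Other Tasks | Transfer Time |
|---|---|---|---|
| Polling (CPU) | ~4096 cycles | 0% | ~57 µs |
| Interrupt-driven (CPU) | ~2048 cycles (ISR overhead) | ~50% | ~57 µs |
| DMA | ~100 cycles (setup + ISR) | ~97% | ~57 µs |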
The transfer time is the same in all three cases, but with DMA the CPU is almost entirely free. For a system sampling an ADC at 44.1 kHz (audio quality), a polling loop keeps the CPU busy-waiting on every sample, leaving little time for anything else. With DMA, the CPU handles only the buffer-complete interrupts and can run DSP algorithms, update a display, or handle UART communication in the meantime.
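The arithmetic behind the comparison can be reproduced with two small helpers. This is a sketch: `cycles_to_us()` and `cpu_free_percent()` are hypothetical names, and the cycle counts fed into them are the illustrative estimates from the table, not measured values.

```c
#include <assert.h>
#include <stdint.h>

/* Time in microseconds taken by `cycles` CPU cycles at `clock_hz`. */
double cycles_to_us(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return (double)cycles * 1e6 / (double)clock_hz;
}

/* Share of the transfer window (in percent) that the CPU is free,
   given how many cycles it spends on the transfer itself. */
double cpu_free_percent(uint32_t busy_cycles, uint32_t window_cycles)
{
    assert(window_cycles > 0 && busy_cycles <= window_cycles);
    return 100.0 * (double)(window_cycles - busy_cycles) / (double)window_cycles;
}
```

With these, `cycles_to_us(4096, 72000000)` comes to about 56.9 µs, and `cpu_free_percent(100, 4096)` to about 97.6%, which is where the DMA row of the table comes from.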
DMA Transfer Modes Explained
Modern DMA controllers support several transfer modes, each suited to different use cases:
1. Normal (Single) Mode
The DMA performs a fixed number of transfers and stops. Perfect for one-shot operations like reading a fixed-size data packet from SPI. The CPU receives an interrupt when done and must reconfigure the DMA for the next transfer.
2. Circular Mode
After completing the configured number of transfers, the DMA automatically resets its counter and starts again from the beginning. Ideal for continuous ADC sampling, audio streaming, or UART receive buffers. You typically use a double-buffer (ping-pong) approach: while DMA fills the second half of the buffer, the CPU processes the first half.
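The ping-pong idea can be modelled on a desktop with a few lines of C. In this sketch the function `pingpong_dma_write()` plays the role of the DMA hardware depositing one sample at a time, and the "processing" is just a sum over the half that was completed. All names here (`pingpong_t`, `pingpong_dma_write()`, `BUF_LEN`) are hypothetical, and on a real MCU the processing would run in or after the half-transfer and transfer-complete interrupts while the hardware keeps writing the other half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8u   /* total circular buffer length, must be even */

typedef struct {
    uint16_t buf[BUF_LEN];
    size_t   write_idx;     /* where the simulated DMA writes next */
    uint32_t half_events;   /* count of "half-transfer" events */
    uint32_t full_events;   /* count of "transfer-complete" events */
    uint32_t last_sum;      /* result of processing the most recent half */
} pingpong_t;

/* Stand-in for real processing (filtering, FFT, logging...). */
static uint32_t sum_block(const uint16_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += p[i];
    }
    return sum;
}

/* Simulates the DMA storing one sample and raising the two events. */
void pingpong_dma_write(pingpong_t *pp, uint16_t sample)
{
    assert(pp->write_idx < BUF_LEN);
    pp->buf[pp->write_idx++] = sample;

    if (pp->write_idx == BUF_LEN / 2) {
        /* First half is complete; DMA moves on to the second half. */
        pp->half_events++;
        pp->last_sum = sum_block(&pp->buf[0], BUF_LEN / 2);
    } else if (pp->write_idx == BUF_LEN) {
        /* Second half is complete; circular mode wraps to the start. */
        pp->full_events++;
        pp->last_sum = sum_block(&pp->buf[BUF_LEN / 2], BUF_LEN / 2);
        pp->write_idx = 0;
    }
}
```

The key property is that each event only touches the half the DMA has just finished, so the CPU never reads a region that is currently being written, provided it finishes before the DMA comes back around.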
3. Memory-to-Memory Mode
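Used to copy data between two memory regions while the CPU does other work. Raw throughput is not always higher than an optimised memcpy(), since the DMA and the CPU share the same bus matrix, but the copy no longer costs CPU time. Handy for copying framebuffers to display drivers. On STM32F4, only DMA2 supports memory-to-memory transfers.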
4. Burst Mode
On advanced MCUs like STM32F4/F7/H7, DMA can request multiple beats on the AHB bus in a single arbitration — further reducing overhead. Burst sizes of 4, 8, or 16 beats are common.
Using DMA on STM32 Microcontrollers
STM32 microcontrollers (manufactured by STMicroelectronics) are the most popular platform for DMA learning among Indian engineering students, and for good reason — they have rich DMA hardware, excellent HAL library support, and boards like the Blue Pill (STM32F103) cost under ₹150.
Here is a minimal STM32 HAL example for ADC with DMA in circular mode:
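/* Configure ADC1 with a circular DMA request in CubeMX, then in main.c: */
#define ADC_BUF_LEN 512
uint16_t adcBuffer[ADC_BUF_LEN];   /* filled by DMA in the background */

/* Call once from main(), after the CubeMX-generated init functions: */
HAL_ADC_Start_DMA(&hadc1, (uint32_t *)adcBuffer, ADC_BUF_LEN);

/* Both callbacks run in interrupt context, so keep processData() short. */
/* Half-transfer callback: first 256 samples ready */
void HAL_ADC_ConvHalfCpltCallback(ADC_HandleTypeDef *hadc) {
    processData(adcBuffer, ADC_BUF_LEN / 2);   /* processData() is user code */
}

/* Full-transfer callback: second 256 samples ready */
void HAL_ADC_ConvCpltCallback(ADC_HandleTypeDef *hadc) {
    processData(&adcBuffer[ADC_BUF_LEN / 2], ADC_BUF_LEN / 2);
}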
Key STM32 DMA configuration points in STM32CubeMX:
- Channel/Stream selection: Each peripheral has fixed DMA channel assignments in the Reference Manual — do not guess, look it up.
- Data width: Match peripheral register width (ADC → half-word/16-bit).
- Memory increment: Enable for buffers, disable for single-register destinations.
- Priority: Set DMA stream priority relative to other streams (Low/Medium/High/Very High).
- NVIC: Enable the DMA stream interrupt in CubeMX Interrupt settings tab.
For UART DMA receive, use HAL_UARTEx_ReceiveToIdle_DMA() — this is especially useful for variable-length packets because it triggers on both full buffer and UART idle line detection.
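When idle-line reception is combined with a circular DMA buffer, the application has to work out which bytes are new each time an event arrives. A common approach is to remember the previous write position and copy everything between it and the new one, handling wrap-around. The sketch below models that bookkeeping in plain C so it can be tried on a desktop. The names `rx_tracker_t` and `rx_consume()` are hypothetical, and it assumes the receive event reports the current write position in the buffer as a value from 0 to the buffer length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Remembers where the previous event left off in the circular buffer. */
typedef struct {
    size_t old_pos;
} rx_tracker_t;

/* Copies the bytes that arrived since the last call into `out`.
   `new_pos` is the DMA write position reported by the event (0..buf_len).
   Returns the number of bytes copied. */
size_t rx_consume(rx_tracker_t *t, const uint8_t *buf, size_t buf_len,
                  size_t new_pos, uint8_t *out, size_t out_cap)
{
    assert(new_pos <= buf_len && t->old_pos < buf_len);
    size_t n = 0;

    if (new_pos == t->old_pos) {
        return 0;                               /* nothing new */
    }
    if (new_pos > t->old_pos) {
        /* Simple case: new data is one contiguous run. */
        n = new_pos - t->old_pos;
        assert(n <= out_cap);
        memcpy(out, &buf[t->old_pos], n);
    } else {
        /* Wrapped: tail of the buffer first, then the start. */
        size_t tail = buf_len - t->old_pos;
        n = tail + new_pos;
        assert(n <= out_cap);
        memcpy(out, &buf[t->old_pos], tail);
        memcpy(out + tail, buf, new_pos);
    }
    /* A position equal to buf_len means the DMA is about to wrap to 0. */
    t->old_pos = (new_pos == buf_len) ? 0 : new_pos;
    return n;
}
```

One limitation of this scheme: if a full buffer's worth of data arrives between two events, the new position equals the old one and the data is missed, so the buffer should be comfortably larger than the longest expected packet.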
DMA on Arduino and AVR Platforms
Classic Arduino boards (ATmega328P) do not have a DMA controller, which is a significant limitation for high-speed data applications. However, there are workarounds and DMA-capable Arduino-compatible boards:
- ATmega2560: No hardware DMA, but the USART can be serviced by interrupt-driven ring buffers that mimic double-buffering.
- Arduino Due (SAM3X8E): Has a PDC (Peripheral DMA Controller), which can be programmed through the SAM3X register definitions shipped with the Arduino SAM core.
- Arduino Zero / MKR series (SAMD21): Excellent 12-channel DMA via the DMAC peripheral. The ArduinoCore-samd supports DMA for SPI and I2S.
- Raspberry Pi Pico (RP2040): 12 DMA channels, accessible via the C SDK function dma_channel_configure() or the Arduino-Pico core. Very popular in India for audio and USB projects.
An RP2040 DMA example (C SDK) that streams a buffer to the UART:
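/* Requires hardware/dma.h and hardware/uart.h from the Pico C SDK */
int dma_chan = dma_claim_unused_channel(true);
dma_channel_config c = dma_channel_get_default_config(dma_chan);
channel_config_set_transfer_data_size(&c, DMA_SIZE_8);
channel_config_set_read_increment(&c, true);    /* step through srcBuffer */
channel_config_set_write_increment(&c, false);  /* always write the same register */
/* Pace the transfer by the UART TX request so the FIFO is not overrun */
channel_config_set_dreq(&c, uart_get_dreq(uart0, true));
dma_channel_configure(dma_chan, &c,
    &uart_get_hw(uart0)->dr, /* Write to UART data register */
    srcBuffer,               /* Read from source buffer */
    bufferLen,               /* Number of transfers */
    true                     /* Start immediately */
);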
Practical Applications for Indian Makers
DMA unlocks a whole category of projects that are otherwise CPU-bound. Here are real-world applications popular among Indian electronics hobbyists:
1. High-Speed Data Logger
Log accelerometer, gyroscope, or pressure sensor data at 1 kHz+ using SPI/DMA. The DMA fills a large RAM buffer while the CPU compresses or filters the data. Store to SD card via SDIO-DMA for sustained write speeds.
2. Audio Processing
Sample a microphone via ADC-DMA at 44.1 kHz, and run an FFT on the CPU while DMA fills the next buffer. Output processed audio to a DAC via DAC-DMA. Because DMA keeps both ends fed continuously, samples are not dropped as long as the CPU finishes processing each buffer before the next one fills.
3. WS2812 LED Control (Neopixels)
Drive hundreds of WS2812 RGB LEDs precisely timed via SPI-DMA or Timer-DMA. The CPU prepares the bit-stream in memory, then DMA transmits it with cycle-accurate timing — impossible to achieve reliably with bit-banging on a busy CPU.
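One widely used way to prepare the bit-stream for SPI-DMA is to expand every WS2812 data bit into three SPI bits: `110` for a 1 and `100` for a 0. The sketch below shows that encoding in portable C. It assumes an SPI clock of roughly 2.4 MHz, so that three SPI bits last about 1.25 µs, and it assumes the colour data is already ordered green, red, blue. The function names `ws2812_encode_byte()` and `ws2812_encode()` are hypothetical, and the reset gap and exact timing tolerances should be checked against the LED datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expands one colour byte (MSB first) into 3 SPI bytes.
   Each WS2812 bit becomes 3 SPI bits: 1 -> 110, 0 -> 100. */
void ws2812_encode_byte(uint8_t value, uint8_t out[3])
{
    uint32_t bits = 0;
    for (int i = 7; i >= 0; i--) {
        bits <<= 3;
        bits |= ((value >> i) & 1u) ? 0x6u : 0x4u;
    }
    out[0] = (uint8_t)(bits >> 16);
    out[1] = (uint8_t)(bits >> 8);
    out[2] = (uint8_t)bits;
}

/* Encodes `n_leds` LEDs given as G,R,B triplets.
   `out` needs 9 bytes per LED. Returns the number of bytes written. */
size_t ws2812_encode(const uint8_t *grb, size_t n_leds,
                     uint8_t *out, size_t out_cap)
{
    size_t need = n_leds * 9u;
    assert(out_cap >= need);
    for (size_t i = 0; i < n_leds * 3u; i++) {
        ws2812_encode_byte(grb[i], &out[i * 3u]);
    }
    return need;
}
```

The encoded buffer is what the DMA would stream out of the SPI data line. The cost is memory: 9 bytes of SPI data per LED instead of 3 bytes of colour data.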
4. OLED / TFT Display Refresh
Blast framebuffer data to SPI displays at full speed. An SSD1306 OLED or ILI9341 TFT refreshes much faster with SPI-DMA, making smooth animations achievable even on a modest MCU.
5. Motor Control with ADC Feedback
Sample current-sense ADC channels continuously via DMA for FOC (Field-Oriented Control) of BLDC motors. The DMA delivers current samples at the PWM frequency (typically 20–100 kHz) without per-sample interrupt overhead, which helps keep control loops tight at rates that are hard to sustain with interrupt-driven ADC reads.
Frequently Asked Questions
Q1: Does using DMA make code more complicated?
Yes, DMA introduces complexity around buffer management, cache coherency (on Cortex-M7), and interrupt handling. However, libraries like STM32 HAL and Arduino SAMD abstract most of the complexity. Start with the HAL examples and then optimise.
Q2: Can DMA corrupt memory if configured wrong?
Absolutely. A wrong destination address or transfer count can overwrite the stack or other live data in RAM, causing hard faults or silent corruption. Always double-check address, transfer count, and data width. Use a debugger (ST-Link, J-Link) to single-step through DMA setup code and inspect the DMA registers before enabling the channel.
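A cheap safeguard is to validate the destination, count, and width against the buffer you intend to fill before enabling the channel. The helper below is a minimal sketch of such a check in portable C. The name `dma_dest_is_safe()` is hypothetical, and it only covers the common case of an incrementing destination inside a single known buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when `count` items of `width` bytes, written from `dst`
   upwards, stay inside [region_start, region_start + region_size)
   and `dst` is aligned to the item width. */
bool dma_dest_is_safe(uintptr_t dst, size_t count, size_t width,
                      uintptr_t region_start, size_t region_size)
{
    assert(width == 1 || width == 2 || width == 4);

    if (dst % width != 0) {
        return false;                 /* misaligned for this data width */
    }
    if (dst < region_start) {
        return false;                 /* starts before the buffer */
    }
    size_t offset = (size_t)(dst - region_start);
    if (offset > region_size) {
        return false;                 /* starts past the end of the buffer */
    }
    /* Division avoids overflow in count * width. */
    return count <= (region_size - offset) / width;
}
```

Called with the buffer's own address and `sizeof` as the region, this catches the classic mistake of passing a length in bytes where the DMA expects a count of half-words or words.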
Q3: What is cache coherency and why does it matter with DMA?
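On Cortex-M7 MCUs (STM32H7, STM32F7) with the data cache enabled, DMA writes to RAM may not be visible to the CPU, which may read stale values from the cache. You must call SCB_InvalidateDCache_by_Addr() before reading DMA-filled buffers, or place DMA buffers in non-cacheable RAM regions. The reverse applies to transmit buffers: call SCB_CleanDCache_by_Addr() before starting the DMA so that the data the CPU wrote has actually reached RAM. Cache maintenance works on whole 32-byte cache lines, so align DMA buffers to 32 bytes and make their size a multiple of 32.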
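Because cache maintenance operates on whole cache lines, it helps to compute the line-aligned range that covers a buffer. The following is a small sketch in portable C, assuming the 32-byte data cache line of the Cortex-M7. The names `cache_range_t` and `cache_align_range()` are hypothetical helpers, not part of CMSIS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t start;     /* address rounded down to a cache line */
    size_t    size;      /* size rounded up to whole cache lines */
} cache_range_t;

/* Expands [addr, addr + len) outward to whole cache lines. */
cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    assert(len > 0);
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);
    cache_range_t r;
    r.start = addr & mask;
    uintptr_t end = (addr + len + CACHE_LINE - 1u) & mask;
    r.size = (size_t)(end - r.start);
    return r;
}
```

On the target, the resulting start and size are what you would hand to the cache maintenance call. Note that if the expanded range covers other variables sharing the first or last line, invalidating it can discard their unwritten data, which is why aligning and padding the DMA buffer itself is the safer option.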
Q4: Which is better — DMA or RTOS task for data transfer?
They solve different problems. DMA handles the physical data movement at hardware speed. An RTOS task handles software-level scheduling and synchronisation. For best performance, use DMA for transfers and an RTOS task (triggered by DMA interrupt via semaphore) for processing the data.
Q5: Is DMA available on ESP32?
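Yes. ESP32 has DMA channels used internally by its SPI, I2S, UART, and ADC peripherals. The ESP-IDF spi_device_queue_trans() function uses DMA transparently when the SPI bus is initialised with a DMA channel. Newer chips in the family, such as the ESP32-S3 and ESP32-C3, add a General DMA (GDMA) controller, which ESP-IDF v4.4+ drives through its gdma API; that API is aimed mainly at ESP-IDF's own drivers, so check the documentation for your chip before building on it.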