Bare Metal STM32 · Part 4 of 7

SPI DMA Transfers: No CPU Overhead

STM32 SPI DMA Intermediate Source Code
Intermediate
SPI1 · DMA2
01

Overview

Polling SPI burns CPU cycles waiting for the TX buffer to empty: fine for a few bytes, catastrophic for a display frame or sensor burst. DMA hands the job to dedicated hardware: the CPU starts the transfer, does other work, and gets an interrupt when it's done. This part sets up SPI1 + DMA2 from scratch to send a 256-byte frame at 21 MHz (fPCLK/4, assuming an 84 MHz APB2 clock) with zero CPU involvement during the transfer.

  • Configure SPI1 as master: clock polarity, phase, baud rate, data size
  • Set up DMA2 Stream 3 Channel 3 (SPI1_TX) with memory-to-peripheral mode
  • Handle the transfer-complete interrupt and drive CS manually for frame accuracy
  • Understand why you must wait for BSY=0 before deasserting CS
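To put a number on what polling would cost, the sketch below estimates how long a frame occupies the bus and how many core cycles pass in that time. The function names are illustrative, and the 168 MHz core clock is an assumption (this part does not state one). The 21 MHz figure is the fPCLK/4 rate from the init code, assuming an 84 MHz APB2 clock. Back-to-back bytes with no inter-byte gaps are also assumed.

```c
#include <stdint.h>

/* Time a frame spends on the wire, in nanoseconds.
   Assumes 8 bits per byte and no idle gaps between bytes. */
uint32_t spi_frame_time_ns(uint32_t bytes, uint32_t sck_hz) {
    uint64_t bits = (uint64_t)bytes * 8u;
    return (uint32_t)((bits * 1000000000ull) / sck_hz);
}

/* Core clock cycles that elapse while the frame is shifted out.
   A polling driver spends roughly this many cycles waiting. */
uint32_t spi_frame_cpu_cycles(uint32_t bytes, uint32_t sck_hz, uint32_t core_hz) {
    uint64_t bits = (uint64_t)bytes * 8u;
    return (uint32_t)((bits * core_hz) / sck_hz);
}
```

Under those assumptions, `spi_frame_time_ns(256, 21000000u)` comes to about 97.5 µs, and `spi_frame_cpu_cycles(256, 21000000u, 168000000u)` to 16384 cycles that DMA leaves free for other work.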
02

DMA Concepts

🌊
Streams & Channels
STM32F4 has DMA1 and DMA2, each with 8 streams. Each stream can select one of 8 channels (request sources). Which stream/channel pair routes to which peripheral is fixed in hardware, so always check the DMA request mapping tables in the reference manual.
DMA2 Stream3 Ch3 = SPI1_TX
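One way to keep the mapping visible in code is a small lookup table next to the driver. The sketch below is illustrative only: `dma_route_t`, `dma2_first_stream`, and `dma_chsel_bits` are hypothetical names, and the rows are the SPI1 entries as I read them in the F4 reference manual's DMA2 request mapping, so verify them against the table for your exact part. The CHSEL field is assumed to sit at bits 27:25 of DMA_SxCR.

```c
#include <stdint.h>
#include <string.h>

#define CHSEL_POS 25u  /* DMA_SxCR CHSEL[2:0] occupies bits 27:25 */

typedef struct {
    uint8_t     stream;   /* DMA2 stream number, 0..7 */
    uint8_t     channel;  /* channel (request) number, 0..7 */
    const char *request;  /* peripheral request name */
} dma_route_t;

/* SPI1 rows only; check the reference manual before relying on them. */
static const dma_route_t dma2_spi1_routes[] = {
    {0, 3, "SPI1_RX"}, {2, 3, "SPI1_RX"},
    {3, 3, "SPI1_TX"}, {5, 3, "SPI1_TX"},
};

/* Channel number shifted into the CHSEL field of DMA_SxCR. */
uint32_t dma_chsel_bits(uint8_t channel) {
    return (uint32_t)(channel & 7u) << CHSEL_POS;
}

/* First DMA2 stream that can serve the request, or -1 if not listed. */
int dma2_first_stream(const char *request) {
    size_t n = sizeof dma2_spi1_routes / sizeof dma2_spi1_routes[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dma2_spi1_routes[i].request, request) == 0)
            return dma2_spi1_routes[i].stream;
    }
    return -1;
}
```

Because more than one stream can serve the same request, a table like this also makes it easier to move to an alternative stream if the first choice is already taken by another peripheral.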
🔁
Transfer Modes
Normal mode: DMA stops after NDTR transfers and fires a TC interrupt. Circular mode: DMA auto-reloads NDTR and fires HT (half) and TC (full) interrupts continuously — used in Part 5 for ADC.
CIRC bit in DMA_SxCR
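The difference between the two modes is easiest to see in a host-side model of the item counter. This is a sketch that runs on a PC, not register code: `dma_model_t` and its functions are hypothetical, and the half-transfer point is simplified to "NDTR reaches half of its reload value".

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of one DMA stream's item counter. */
typedef struct {
    uint16_t reload;     /* value originally written to NDTR */
    uint16_t ndtr;       /* items still to transfer */
    bool     circular;   /* models the CIRC bit in DMA_SxCR */
    bool     enabled;    /* models the EN bit in DMA_SxCR */
    unsigned ht_events;  /* half-transfer events seen */
    unsigned tc_events;  /* transfer-complete events seen */
} dma_model_t;

/* Move one item, as the hardware does on each peripheral request. */
static void dma_model_step(dma_model_t *d) {
    if (!d->enabled) return;
    d->ndtr--;
    if (d->ndtr == d->reload / 2u) d->ht_events++;
    if (d->ndtr == 0u) {
        d->tc_events++;
        if (d->circular) d->ndtr = d->reload;  /* auto-reload, keep going */
        else d->enabled = false;               /* normal mode: stop here */
    }
}

/* Program the model with `count` items and feed it `requests` requests. */
dma_model_t dma_model_run(uint16_t count, bool circular, unsigned requests) {
    dma_model_t d = { count, count, circular, count != 0u, 0u, 0u };
    for (unsigned i = 0; i < requests; i++) dma_model_step(&d);
    return d;
}
```

With 8 items and 20 requests, the normal-mode run ends with one TC event and the stream disabled, while the circular run has seen two TC events and is still enabled partway through its third pass.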
FIFO & Burst
The DMA FIFO buffers data between bus transactions. With FIFO enabled, bursts of 4/8/16 beats use the AHB bus more efficiently. Direct mode (FIFO off) is simpler and works for most use cases.
DMA_SxFCR: DMDIS, FTH
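If you do enable the FIFO, burst size and threshold have to agree. The sketch below encodes the rule as I understand it from the F4 reference manual's FIFO threshold table: the threshold level, in bytes, must be a whole multiple of the burst size in bytes. It assumes a 16-byte (four-word) FIFO and the FTH encoding 0 = 1/4, 1 = 1/2, 2 = 3/4, 3 = full. `dma_burst_fits_fifo` is a hypothetical helper, so check the result against the manual for your part.

```c
#include <stdbool.h>
#include <stdint.h>

#define DMA_FIFO_BYTES 16u  /* assumed FIFO depth: four 32-bit words */

/* fth:         FTH field value, 0..3 (1/4, 1/2, 3/4, full)
   beats:       memory burst length, 4, 8 or 16
   msize_bytes: memory data width in bytes, 1, 2 or 4
   Returns true if the threshold holds a whole number of bursts. */
bool dma_burst_fits_fifo(uint8_t fth, uint8_t beats, uint8_t msize_bytes) {
    uint32_t threshold_bytes = (DMA_FIFO_BYTES / 4u) * ((uint32_t)(fth & 3u) + 1u);
    uint32_t burst_bytes = (uint32_t)beats * msize_bytes;
    if (burst_bytes == 0u || burst_bytes > DMA_FIFO_BYTES) return false;
    return (threshold_bytes % burst_bytes) == 0u;
}
```

For example, a 16-beat byte burst only fits a full threshold, and an 8-beat word burst (32 bytes) never fits, which is one reason direct mode is the easier starting point.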
03

Code

01
SPI1 + DMA2 init — master, 8-bit, CPOL=0 CPHA=0
spi_dma.c
C
void SPI1_DMA_Init(void) {
    /* Clocks */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_DMA2EN;
    RCC->APB2ENR |= RCC_APB2ENR_SPI1EN;

    /* PA5=SCK, PA6=MISO, PA7=MOSI → AF5 */
    GPIOA->MODER  |= (0x2A << 10);   // Alternate function (PA5..PA7 only: 3 pins x 2 bits)
    GPIOA->OSPEEDR|= (0x3F << 10);   // High speed (PA5..PA7 only)
    GPIOA->AFR[0] |= (0x555 << 20); // AF5 for PA5/6/7

    /* SPI1: Master, 8-bit, fPCLK/4 (21 MHz), CPOL=0, CPHA=0, SSM */
    SPI1->CR1 = SPI_CR1_MSTR            // Master mode
              | SPI_CR1_BR_0             // Baud = fPCLK/4
              | SPI_CR1_SSM              // Software slave management
              | SPI_CR1_SSI;             // Internal slave select high
    SPI1->CR2 = SPI_CR2_TXDMAEN;        // Enable TX DMA request

    /* DMA2 Stream3 Channel3 — SPI1_TX (see RM Table 43) */
    DMA2_Stream3->CR = (3 << DMA_SxCR_CHSEL_Pos)  // Channel 3
                     | DMA_SxCR_DIR_0               // Mem → Peripheral
                     | DMA_SxCR_MINC                // Memory auto-increment
                     | DMA_SxCR_TCIE;               // Transfer-complete IRQ
    DMA2_Stream3->PAR = (uint32_t)&SPI1->DR;      // Peripheral = SPI DR

    NVIC_SetPriority(DMA2_Stream3_IRQn, 6);
    NVIC_EnableIRQ(DMA2_Stream3_IRQn);

    SPI1->CR1 |= SPI_CR1_SPE;           // Enable SPI
}

/* Kick off a DMA transfer — CS must be asserted by caller */
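void SPI1_DMA_Send(const uint8_t *buf, uint16_t len) {
    /* Stale event flags (HTIF3 is set by every transfer) must be cleared
       before EN is set again, or the stream will not restart */
    DMA2->LIFCR = DMA_LIFCR_CTCIF3 | DMA_LIFCR_CHTIF3 | DMA_LIFCR_CTEIF3
                | DMA_LIFCR_CDMEIF3 | DMA_LIFCR_CFEIF3;
    DMA2_Stream3->NDTR = len;
    DMA2_Stream3->M0AR = (uint32_t)buf;
    DMA2_Stream3->CR  |= DMA_SxCR_EN;  // Start
}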

void DMA2_Stream3_IRQHandler(void) {
    if (DMA2->LISR & DMA_LISR_TCIF3) {
        DMA2->LIFCR = DMA_LIFCR_CTCIF3;  // Clear flag
        // TC means the last byte reached SPI1->DR, not the wire.
        // Wait for TXE=1, then BSY=0, THEN deassert CS
        while (!(SPI1->SR & SPI_SR_TXE));
        while (SPI1->SR & SPI_SR_BSY);
        CS_Deassert();
    }
}
💡
Always wait for BSY=0 before releasing CS
The DMA TC interrupt fires when the last byte has been written to the SPI data register, not when it has been shifted out on the wire. Wait for SPI_SR_BSY to clear before driving CS high (deasserting it), or the last byte will be cut short.
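An unbounded busy-wait inside an interrupt handler can hang the system if the SPI never goes idle, so a poll limit is a common safeguard. The sketch below is one way to do it, not the series' own code: `spi_wait_idle` is a hypothetical helper that takes a pointer to the status register so it can be exercised off-target, and it assumes the F4 SPI_SR layout with TXE at bit 1 and BSY at bit 7.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_TXE (1u << 1)  /* SPI_SR bit 1: transmit buffer empty */
#define SR_BSY (1u << 7)  /* SPI_SR bit 7: SPI busy */

/* Poll the status register until TXE=1 and BSY=0.
   Returns true when idle, false if max_polls reads pass first. */
bool spi_wait_idle(const volatile uint32_t *sr, uint32_t max_polls) {
    while (max_polls--) {
        uint32_t s = *sr;
        if ((s & SR_TXE) && !(s & SR_BSY)) return true;
    }
    return false;
}
```

On the target this would be called as `spi_wait_idle(&SPI1->SR, limit)` just before `CS_Deassert()`. The wait itself should be short: at the point TC fires, at most about two bytes (one in the shift register, one in the data register) remain to be clocked out.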

Continue the Series

Work through all 7 parts of Bare Metal STM32 to master low-level embedded development from the ground up.