[STM32H7] How to handle DMA transfers > 64KB in disp_flush?


Hello,

I am struggling with the STM32 DMA transfer limit (64KB) when using LVGL 8.4 on an STM32H743 with a QSPI display (CO5300).

What do you want to achieve?

I want to increase the LVGL display buffer size beyond 32767 pixels (65534 bytes at 16-bit color depth) to achieve higher FPS. Currently, my RAM allows for much larger buffers (512KB available), but I am hitting a wall with DMA constraints.
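To make the numbers concrete, here is a minimal sketch of the sizing arithmetic. The helper names `buffer_bytes` and `transfers_needed` are hypothetical and exist only for illustration; the 65535-byte per-transfer limit is the value I use as `DMA_MAX_BYTES` in my code.

```c
#include <stdint.h>

/* Hypothetical helper: bytes occupied by a draw buffer of `pixels` pixels. */
uint32_t buffer_bytes(uint32_t pixels, uint32_t bytes_per_pixel)
{
    return pixels * bytes_per_pixel;
}

/* Hypothetical helper: number of transfers needed to send `len` bytes
 * when one transfer can carry at most `max_bytes` (ceiling division). */
uint32_t transfers_needed(uint32_t len, uint32_t max_bytes)
{
    return (len + max_bytes - 1u) / max_bytes;
}
```

With these, `buffer_bytes(32767, 2)` is 65534, which fits in a single transfer, while a buffer using all 512 KB (524288 bytes) would need nine transfers.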

What have you tried so far?

I implemented a “chunking” mechanism that splits each flush into segments of at most 65535 bytes and starts the next segment from the QSPI transfer-complete callback.
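The splitting step on its own can be sketched on the host like this. `next_chunk_bytes` is a hypothetical name, and rounding the limit down to a whole number of pixels is an assumption on my side (it seems safer with half-word DMA accesses), not something I have seen documented as a requirement.

```c
#include <stdint.h>

/* Hypothetical helper: size of the next chunk in bytes.
 * max_bytes is rounded down to a multiple of pixel_bytes so that
 * no chunk ends in the middle of a pixel. pixel_bytes must be non-zero. */
uint32_t next_chunk_bytes(uint32_t bytes_left, uint32_t max_bytes, uint32_t pixel_bytes)
{
    uint32_t aligned_max = max_bytes - (max_bytes % pixel_bytes);
    return (bytes_left > aligned_max) ? aligned_max : bytes_left;
}
```

For a 100000-byte flush with a 65535-byte limit and 2-byte pixels, the first chunk would be 65534 bytes and the second the remaining 34466 bytes.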

Important observation: When I set LVGL_BUFFER_SIZE to 32767 or less, everything works perfectly. The image is clear, and the communication is stable. However, the FPS is low.

When I increase the buffer size above 32767, the program hangs in the LVGL flushing loop:

while(draw_buf->flushing) {
    if(disp_refr->driver->wait_cb) disp_refr->driver->wait_cb(disp_refr->driver);
}

This happens even with my chunking logic, suggesting that lv_disp_flush_ready() is not being called correctly during multi-part transfers.

Code to reproduce

#define DMA_MAX_BYTES 65535 // Max bytes per DMA transfer (note: odd, so not a whole number of 2-byte pixels)
#define LVGL_BUFFER_SIZE (MY_DISP_HOR_RES * (MY_DISP_VER_RES / 10)) // in pixels; works up to 32767
static uint8_t *qspi_buf_ptr;
static volatile uint32_t qspi_bytes_left;

    /* Excerpt from the display init function */
    static lv_disp_draw_buf_t draw_buf_dsc_2;
    __attribute__((section(".lvgl")))
    static lv_color_t buf_2_1[LVGL_BUFFER_SIZE];
    __attribute__((section(".lvgl")))
    static lv_color_t buf_2_2[LVGL_BUFFER_SIZE];
    lv_disp_draw_buf_init(&draw_buf_dsc_2, buf_2_1, buf_2_2, LVGL_BUFFER_SIZE);

    disp_drv.draw_buf = &draw_buf_dsc_2;

static void qspi_start_next_chunk(void)
{
    if (qspi_bytes_left == 0) {
        lv_disp_flush_ready(&disp_drv);
        return;
    }

    uint32_t chunk = (qspi_bytes_left > DMA_MAX_BYTES) ? DMA_MAX_BYTES : qspi_bytes_left;
    
    // Custom QSPI write function: 0x2C is RAMWR (Memory Write)
    // It triggers HAL_QSPI_Transmit_DMA
    qspi_114_cmd_write(0x2C, qspi_buf_ptr, chunk); 
}

static void disp_flush(lv_disp_drv_t * disp_drv, const lv_area_t * area, lv_color_t * px_map)
{
    const int x1 = area->x1;
    const int x2 = area->x2;
    const int y1 = area->y1 + 16;
    const int y2 = area->y2 + 16;

    uint32_t w = lv_area_get_width(area);
    uint32_t h = lv_area_get_height(area);
    uint32_t len = w * h * 2; // 16-bit depth

    SCB_CleanDCache_by_Addr((uint32_t*)px_map, len);

    // Set address window (CASET/PASET)
    AMOLED_Begin_Bitmap(x1, y1, x2 + 1, y2 + 1);

    qspi_buf_ptr = (uint8_t *)px_map;
    qspi_bytes_left = len;

    qspi_start_next_chunk();
}

void HAL_QSPI_TxCpltCallback(QSPI_HandleTypeDef *hqspi)
{
    // Update pointers after successful chunk transfer
    uint32_t sent = (qspi_bytes_left > DMA_MAX_BYTES) ? DMA_MAX_BYTES : qspi_bytes_left;

    qspi_buf_ptr += sent;
    qspi_bytes_left -= sent;

    qspi_start_next_chunk();
}
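
To rule out the bookkeeping itself, the chunk loop can be modelled on a PC without any hardware. This is only a sketch: all `sim_*` names are hypothetical, the "DMA completion" is faked with a plain function call, and it says nothing about HAL state, MDMA behaviour, or the panel. One difference from my code above: the chunk size is stored when the transfer starts instead of being recomputed in the callback, and the limit is assumed to be 65534 so that it stays even.

```c
#include <stdint.h>

#define SIM_MAX_BYTES 65534u      /* assumed per-transfer limit, even for 16-bit pixels */

static uint32_t sim_offset;       /* bytes already handed to the fake DMA */
static uint32_t sim_bytes_left;   /* bytes still to send */
static uint32_t sim_chunk;        /* size of the transfer currently in flight */
uint32_t sim_transfers;           /* number of transfers started */
uint32_t sim_flush_ready_calls;   /* stands in for lv_disp_flush_ready() */

/* Start the next transfer, or signal completion when nothing is left. */
static void sim_start_next_chunk(void)
{
    if (sim_bytes_left == 0u) {
        sim_flush_ready_calls++;
        return;
    }
    sim_chunk = (sim_bytes_left > SIM_MAX_BYTES) ? SIM_MAX_BYTES : sim_bytes_left;
    sim_transfers++;
}

/* Stands in for the transfer-complete callback. */
static void sim_tx_complete(void)
{
    sim_offset += sim_chunk;      /* use the size recorded at start time */
    sim_bytes_left -= sim_chunk;
    sim_start_next_chunk();
}

/* Run one whole flush of len bytes; returns the number of bytes sent. */
uint32_t sim_flush(uint32_t len)
{
    sim_offset = 0u;
    sim_bytes_left = len;
    sim_transfers = 0u;
    sim_flush_ready_calls = 0u;
    sim_start_next_chunk();
    while (sim_flush_ready_calls == 0u) {
        sim_tx_complete();        /* pretend the hardware finished the chunk */
    }
    return sim_offset;
}
```

In this model a 100000-byte flush starts two transfers and reaches the flush-ready step exactly once, so if the real driver hangs, the cause is probably outside the pointer arithmetic.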

Environment

  • MCU/MPU/Board: custom board with STM32H743
  • LVGL version: 8.4
  • Display: AMOLED CO5300 (QSPI)
  • MDMA Configuration:
hmdma_quadspi_fifo_th.Instance = MDMA_Channel0;
hmdma_quadspi_fifo_th.Init.Request = MDMA_REQUEST_QUADSPI_FIFO_TH;
hmdma_quadspi_fifo_th.Init.TransferTriggerMode = MDMA_REPEAT_BLOCK_TRANSFER;
hmdma_quadspi_fifo_th.Init.Priority = MDMA_PRIORITY_VERY_HIGH;
hmdma_quadspi_fifo_th.Init.Endianness = MDMA_LITTLE_ENDIANNESS_PRESERVE;
hmdma_quadspi_fifo_th.Init.SourceInc = MDMA_SRC_INC_HALFWORD;
hmdma_quadspi_fifo_th.Init.DestinationInc = MDMA_DEST_INC_DISABLE;
hmdma_quadspi_fifo_th.Init.SourceDataSize = MDMA_SRC_DATASIZE_HALFWORD;
hmdma_quadspi_fifo_th.Init.DestDataSize = MDMA_DEST_DATASIZE_HALFWORD;
hmdma_quadspi_fifo_th.Init.DataAlignment = MDMA_DATAALIGN_PACKENABLE;
hmdma_quadspi_fifo_th.Init.BufferTransferLength = 32;

My question is: Am I fundamentally limited by the DMA 64KB transfer size per “request”, or is there a more efficient way to feed the QSPI/DMA with larger buffers in LVGL 8.4 to gain more FPS? Is my chunking logic missing a race condition or an LVGL-specific requirement?