Hello,
I am struggling with the STM32 DMA transfer limit (64KB) when using LVGL 8.4 on an STM32H743 with a QSPI display (CO5300).
What do you want to achieve?
I want to increase the LVGL display buffer size beyond 32767 pixels (65534 bytes at 16-bit color depth) to achieve higher FPS. Currently, my RAM allows for much larger buffers (512KB available), but I am hitting a wall with DMA constraints.
What have you tried so far?
I implemented a “chunking” mechanism to split the transfer into segments smaller than 64KB.
Important observation: When I set LVGL_BUFFER_SIZE to 32767 or less, everything works perfectly. The image is clear, and the communication is stable. However, the FPS is low.
When I increase the buffer size above 32767, the program hangs in the LVGL flushing loop:
while(draw_buf->flushing) {
    if(disp_refr->driver->wait_cb) disp_refr->driver->wait_cb(disp_refr->driver);
}
This happens even with my chunking logic, suggesting that lv_disp_flush_ready() is not being called correctly during multi-part transfers.
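For reference, the chunk arithmetic can be checked off-target. The sketch below is a host-side illustration, not the driver code: `CHUNK_CAP_BYTES`, `next_chunk()` and `chunk_count()` are hypothetical names. It assumes the DMA block length has to be a whole number of 16-bit pixels (as the HALFWORD data sizes in the MDMA configuration suggest), so it rounds the 65535-byte limit down to 65534 instead of using the odd value directly.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_PIXEL 2u      /* 16-bit color depth, as in the post */
#define DMA_LIMIT_BYTES 65535u  /* per-transfer limit used in the post */

/* Hypothetical cap: the limit rounded down to a whole number of pixels. */
#define CHUNK_CAP_BYTES (DMA_LIMIT_BYTES - (DMA_LIMIT_BYTES % BYTES_PER_PIXEL))

/* Compile-time check that a full chunk never splits a pixel. */
static_assert(CHUNK_CAP_BYTES % BYTES_PER_PIXEL == 0u,
              "chunk cap must be a multiple of the pixel size");

/* Size of the next chunk when `bytes_left` bytes remain. */
static uint32_t next_chunk(uint32_t bytes_left)
{
    return (bytes_left > CHUNK_CAP_BYTES) ? CHUNK_CAP_BYTES : bytes_left;
}

/* Number of DMA transfers needed to send `total_bytes`. */
static uint32_t chunk_count(uint32_t total_bytes)
{
    uint32_t n = 0;
    while (total_bytes > 0u) {
        total_bytes -= next_chunk(total_bytes);
        n++;
    }
    return n;
}
```

With this cap, a 256 KB buffer (262144 bytes) would go out as four full chunks plus one 8-byte tail, and every chunk length stays even as long as the total is a whole number of pixels.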
Code to reproduce
#define DMA_MAX_BYTES 65535 // Max bytes per DMA transfer; meant to be a multiple of the pixel size
#define LVGL_BUFFER_SIZE (MY_DISP_HOR_RES * (MY_DISP_VER_RES / 10)) // max 32767
static uint8_t *qspi_buf_ptr;
static volatile uint32_t qspi_bytes_left;
static lv_disp_draw_buf_t draw_buf_dsc_2;
__attribute__((section(".lvgl")))
static lv_color_t buf_2_1[LVGL_BUFFER_SIZE];
__attribute__((section(".lvgl")))
static lv_color_t buf_2_2[LVGL_BUFFER_SIZE];
lv_disp_draw_buf_init(&draw_buf_dsc_2, buf_2_1, buf_2_2, LVGL_BUFFER_SIZE);
disp_drv.draw_buf = &draw_buf_dsc_2;
static void qspi_start_next_chunk(void)
{
    if (qspi_bytes_left == 0) {
        lv_disp_flush_ready(&disp_drv);
        return;
    }
    uint32_t chunk = (qspi_bytes_left > DMA_MAX_BYTES) ? DMA_MAX_BYTES : qspi_bytes_left;
    // Custom QSPI write function: 0x2C is RAMWR (Memory Write)
    // It triggers HAL_QSPI_Transmit_DMA
    qspi_114_cmd_write(0x2C, qspi_buf_ptr, chunk);
}
static void disp_flush(lv_disp_drv_t * disp_drv, const lv_area_t * area, lv_color_t * px_map)
{
    const int x1 = area->x1;
    const int x2 = area->x2;
    const int y1 = area->y1 + 16;
    const int y2 = area->y2 + 16;
    uint32_t w = lv_area_get_width(area);
    uint32_t h = lv_area_get_height(area);
    uint32_t len = w * h * 2; // 16-bit depth
    SCB_CleanDCache_by_Addr((uint32_t*)px_map, len);
    // Set address window (CASET/PASET)
    AMOLED_Begin_Bitmap(x1, y1, x2 + 1, y2 + 1);
    qspi_buf_ptr = (uint8_t *)px_map;
    qspi_bytes_left = len;
    qspi_start_next_chunk();
}
void HAL_QSPI_TxCpltCallback(QSPI_HandleTypeDef *hqspi)
{
    // Update pointers after successful chunk transfer
    uint32_t sent = (qspi_bytes_left > DMA_MAX_BYTES) ? DMA_MAX_BYTES : qspi_bytes_left;
    qspi_buf_ptr += sent;
    qspi_bytes_left -= sent;
    qspi_start_next_chunk();
}
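One way to make the multi-part bookkeeping easier to reason about is to keep it in a small state machine that can be exercised on a PC. The sketch below is only an illustration under stated assumptions: all `chunked_tx_*` names and `CHUNK_CAP_BYTES` are hypothetical, the cap of 65534 bytes is an assumed pixel-aligned value, and 0x3C is the MIPI DCS write_memory_continue command, which the CO5300 is assumed (not confirmed) to accept for chunks after the first. It records the size of the chunk in flight instead of recomputing it in the callback, and it has an abort path so that an error callback such as `HAL_QSPI_ErrorCallback` could still lead to `lv_disp_flush_ready()`.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_RAMWR       0x2Cu   /* memory write: starts at the window origin */
#define CMD_RAMWR_CONT  0x3Cu   /* write_memory_continue (assumed supported) */
#define CHUNK_CAP_BYTES 65534u  /* hypothetical even cap below 64 KB */

typedef struct {
    const uint8_t *ptr;   /* first byte not yet confirmed as sent */
    uint32_t bytes_left;  /* bytes not yet confirmed as sent */
    uint32_t in_flight;   /* size of the chunk currently being sent */
    bool first;           /* true until the first chunk has been picked */
} chunked_tx_t;

/* Arm the state machine for a new flush. */
static void chunked_tx_begin(chunked_tx_t *tx, const uint8_t *data, uint32_t len)
{
    tx->ptr = data;
    tx->bytes_left = len;
    tx->in_flight = 0u;
    tx->first = true;
}

/* Pick the next chunk. Returns false once everything has been sent, which
 * is the point where the driver would call lv_disp_flush_ready(). */
static bool chunked_tx_next(chunked_tx_t *tx, uint8_t *cmd,
                            const uint8_t **data, uint32_t *len)
{
    if (tx->bytes_left == 0u) {
        return false;
    }
    tx->in_flight = (tx->bytes_left > CHUNK_CAP_BYTES) ? CHUNK_CAP_BYTES
                                                       : tx->bytes_left;
    *cmd = tx->first ? CMD_RAMWR : CMD_RAMWR_CONT;
    *data = tx->ptr;
    *len = tx->in_flight;
    tx->first = false;
    return true;
}

/* Account for the chunk that just finished (TX-complete callback). */
static void chunked_tx_complete(chunked_tx_t *tx)
{
    tx->ptr += tx->in_flight;
    tx->bytes_left -= tx->in_flight;
    tx->in_flight = 0u;
}

/* Give up on the rest of the buffer (error callback), so that the next
 * chunked_tx_next() reports "done" and LVGL is not left waiting. */
static void chunked_tx_abort(chunked_tx_t *tx)
{
    tx->bytes_left = 0u;
    tx->in_flight = 0u;
}
```

In a driver built this way, the flush callback would call `chunked_tx_begin()` and then `chunked_tx_next()`; the TX-complete callback would call `chunked_tx_complete()` followed by `chunked_tx_next()`, starting a transfer when it returns true and signalling LVGL when it returns false.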
Environment
- MCU/MPU/Board: custom board with STM32H743
- LVGL version: 8.4
- Display: AMOLED CO5300 (QSPI)
- MDMA Configuration:
hmdma_quadspi_fifo_th.Instance = MDMA_Channel0;
hmdma_quadspi_fifo_th.Init.Request = MDMA_REQUEST_QUADSPI_FIFO_TH;
hmdma_quadspi_fifo_th.Init.TransferTriggerMode = MDMA_REPEAT_BLOCK_TRANSFER;
hmdma_quadspi_fifo_th.Init.Priority = MDMA_PRIORITY_VERY_HIGH;
hmdma_quadspi_fifo_th.Init.Endianness = MDMA_LITTLE_ENDIANNESS_PRESERVE;
hmdma_quadspi_fifo_th.Init.SourceInc = MDMA_SRC_INC_HALFWORD;
hmdma_quadspi_fifo_th.Init.DestinationInc = MDMA_DEST_INC_DISABLE;
hmdma_quadspi_fifo_th.Init.SourceDataSize = MDMA_SRC_DATASIZE_HALFWORD;
hmdma_quadspi_fifo_th.Init.DestDataSize = MDMA_DEST_DATASIZE_HALFWORD;
hmdma_quadspi_fifo_th.Init.DataAlignment = MDMA_DATAALIGN_PACKENABLE;
hmdma_quadspi_fifo_th.Init.BufferTransferLength = 32;
My question is: Am I fundamentally limited by the DMA 64KB transfer size per “request”, or is there a more efficient way to feed the QSPI/DMA with larger buffers in LVGL 8.4 to gain more FPS? Is my chunking logic missing a race condition or an LVGL-specific requirement?