Help with double buffering for QSPI DMA on ESP32-S3

Marian_M · April 29, 2025, 4:10pm

I currently use this with a single buffer:

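/* Display flushing */
void my_disp_flush( lv_disp_drv_t *disp, const lv_area_t *area, lv_color_t *color_p )
{
  uint32_t w = ( area->x2 - area->x1 + 1 );
  uint32_t h = ( area->y2 - area->y1 + 1 );
#if (LV_COLOR_16_SWAP != 0)
  /* Starts the QSPI DMA transfer and returns without waiting for it to finish */
  panel->getLcd()->drawBitmap(area->x1, area->y1, w, h, (uint8_t *)&color_p->full);
#else
  /* With no draw call here, flush ready would never be called and LVGL would hang */
  #error "my_disp_flush expects LV_COLOR_16_SWAP = 1 for this SPI panel"
#endif
  /* Do not call lv_disp_flush_ready( disp ) here: the buffer is still being sent.
     notify_lvgl_flush_ready() calls it once the DMA transfer is done. */
}

/* Presumably registered as the panel's transfer-finished callback with the
   lv_disp_drv_t as user_ctx; likely runs in interrupt context, hence IRAM_ATTR. */
IRAM_ATTR bool notify_lvgl_flush_ready(void *user_ctx)
{
    lv_disp_drv_t *disp_driver = (lv_disp_drv_t *)user_ctx;
    lv_disp_flush_ready(disp_driver);
    return false;  /* no higher-priority task was woken */
}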

How do I best switch this to double buffering, and are two partial buffers better?

LVGL 8.3.7, 360x360 display with an ST77916 controller. The draw buffer is set up like this:

lv_disp_draw_buf_init( &draw_buf, buf, NULL, screenWidth * 40 );

kdschlosser · April 29, 2025, 4:45pm

Double buffering is only beneficial if the buffers can be allocated in DMA-capable memory. Without DMA, it is the CPU that normally transmits a frame buffer to the display. That is a blocking call, which means nothing else can be done while the buffer is being transmitted. With DMA the call is no longer blocking, because the CPU is not used to transmit the buffer to the display. This means other work can be done in the meantime, such as rendering the next chunk of data to be sent to the display. Hence the need for a second frame buffer.

LVGL has a function that must be called to let it know when a buffer has finished transmitting (lv_disp_flush_ready() in LVGL 8). When using a single frame buffer that is not in DMA memory, this function is called from inside the flush callback. When DMA memory is used, the function must be called after the buffer has actually been sent, and since writing the data to the display is no longer a blocking call, you cannot call it from inside the flush callback. Instead, you register a callback with the display driver that is called when the buffer has finished being transmitted, and you call the flush ready function from there.

Things that MUST be done to set up double buffering with DMA memory:

There MUST be 2 frame buffers.
The frame buffers MUST be allocated in DMA memory. This is done with the heap_caps_calloc function with the MALLOC_CAP_DMA flag set. You will usually also direct where the allocation happens, depending on the size of the buffer, by adding the MALLOC_CAP_SPIRAM or MALLOC_CAP_INTERNAL flag. Internal RAM is faster than SPIRAM, but only a limited amount of DMA-capable internal RAM is available, so depending on the display you may not be able to fit the buffers in internal RAM. I believe that only the ESP32-S3 and ESP32-P4 are able to allocate DMA memory in SPIRAM.

You MUST register a transaction complete callback with the display driver. As I said above, you call lv_disp_flush_ready() from that callback.
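A minimal sketch of the allocation step, assuming ESP-IDF's heap_caps_calloc()/heap_caps_free() and a buffer size given in bytes. It tries DMA-capable internal RAM first and falls back to DMA-capable PSRAM; the fallback only succeeds where PSRAM is actually DMA-capable, otherwise it returns NULL and the function fails cleanly. The helper names are hypothetical, and the non-ESP branch exists only so the logic can be compiled and checked on a PC.

```c
#include <stddef.h>
#include <stdlib.h>
#include <assert.h>

#ifdef ESP_PLATFORM
#include "esp_heap_caps.h"
/* Internal DMA-capable RAM first, DMA-capable PSRAM as a fallback. */
static void *alloc_dma_buf(size_t bytes)
{
    void *p = heap_caps_calloc(1, bytes, MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL);
    if (p == NULL) {
        /* Returns NULL where PSRAM is not DMA-capable. */
        p = heap_caps_calloc(1, bytes, MALLOC_CAP_DMA | MALLOC_CAP_SPIRAM);
    }
    return p;
}
static void free_dma_buf(void *p) { heap_caps_free(p); }
#else
/* Host build: ordinary heap, only so the logic can be compiled and tested. */
static void *alloc_dma_buf(size_t bytes) { return calloc(1, bytes); }
static void free_dma_buf(void *p) { free(p); }
#endif

/* Allocates both draw buffers or neither. Returns 0 on success, -1 on failure. */
static int alloc_draw_buffers(size_t bytes, void **buf1, void **buf2)
{
    assert(buf1 != NULL && buf2 != NULL);
    *buf1 = alloc_dma_buf(bytes);
    *buf2 = alloc_dma_buf(bytes);
    if (*buf1 == NULL || *buf2 == NULL) {
        free_dma_buf(*buf1);   /* freeing NULL is a no-op */
        free_dma_buf(*buf2);
        *buf1 = *buf2 = NULL;
        return -1;
    }
    return 0;
}
```

On the device the two pointers would then replace buf and NULL in the existing call, e.g. lv_disp_draw_buf_init(&draw_buf, buf1, buf2, screenWidth * 40). In LVGL 8 the last argument is a pixel count, so each buffer needs screenWidth * 40 * sizeof(lv_color_t) bytes.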

Everything else stays the same. Those are the things that need to be done. If you are using an RGB display, setting it up gets more complex. If you are using an I8080 or SPI display, create the frame buffers with a size of 1/10th of the total bytes for the whole display: width * height * bytes_per_pixel / 10.
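The sizing rule above, written out as a small helper. It rounds down to whole display lines, which is a choice made here for simplicity rather than an LVGL requirement, and returns both the byte count to allocate and the pixel count that lv_disp_draw_buf_init() takes in LVGL 8. partial_buf_size() and buf_size_t are hypothetical names.

```c
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint32_t px;     /* pixel count: the size argument of lv_disp_draw_buf_init() in LVGL 8 */
    uint32_t bytes;  /* bytes to allocate for ONE buffer */
} buf_size_t;

/* Roughly width * height * bytes_per_px / divisor, rounded down to whole lines. */
static buf_size_t partial_buf_size(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_px, uint32_t divisor)
{
    assert(divisor > 0);
    uint32_t lines = height / divisor;
    if (lines == 0) {
        lines = 1;   /* never smaller than one full line */
    }
    buf_size_t s = { width * lines, width * lines * bytes_per_px };
    return s;
}
```

For the 360x360 RGB565 panel in this thread, the 1/10 rule gives 36 lines (25,920 bytes per buffer), while screenWidth * 40 from the first post is 40 lines, i.e. 1/9 of the screen (28,800 bytes per buffer).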

kdschlosser · April 29, 2025, 4:48pm

Marian_M · April 29, 2025, 4:52pm

Hmm, I mean I already have all of that: QSPI DMA is already in use, and the buffer is 1/8 of the screen in internal RAM. I'm asking how to switch the code I showed to double buffering, and how much performance I can gain. Do I simply add a second 1/8 buffer in internal RAM, and then what?

kdschlosser · April 29, 2025, 11:48pm

That link has code examples showing how to get double buffering with DMA running.

mmar22 · April 30, 2025, 7:08am

Please try to make it clear. The example is chaos. And saying "no change, only add a buffer" doesn't help me.
Idea 1:
First flush buf1 → pass it to DMA, don't call ready yet.
Then LVGL doesn't use the MCU to render buf2, or does it?
Wait for the DMA complete callback → call ready.
LVGL starts rendering buf2.

With this, I don't see an improvement.
Idea 2:
LVGL renders buf1, calls flush and immediately starts rendering buf2.
Then the DMA transfer and rendering run in parallel, but after that it still waits for ready?
The improvement depends on which of the two takes longer.
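Idea 2 is the one that matches how LVGL 8 uses two buffers, as far as I understand its refresh logic: after flush_cb returns it switches to the other buffer and keeps rendering, and it only waits for flush ready before it flushes again. Below is a rough timeline model of one frame, assuming fixed per-chunk render and transfer times and ignoring LVGL overhead and any overlap between frames. simulate_frame_us() is a hypothetical helper, not an LVGL function.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Chunk-by-chunk timeline of one frame, in microseconds.
 * Double buffer: render chunk i, wait for the previous transfer's "ready",
 * start its transfer, and immediately start rendering chunk i + 1.
 * Single buffer: rendering must also wait until the buffer has been sent. */
static uint32_t simulate_frame_us(uint32_t chunks, uint32_t render_us,
                                  uint32_t xfer_us, bool double_buffered)
{
    uint32_t now = 0;        /* when rendering of the current chunk starts */
    uint32_t xfer_end = 0;   /* when the in-flight transfer finishes */
    for (uint32_t i = 0; i < chunks; i++) {
        if (!double_buffered && now < xfer_end) {
            now = xfer_end;  /* single buffer: cannot draw while it is being sent */
        }
        uint32_t rendered = now + render_us;
        /* LVGL waits for flush ready before calling flush_cb again */
        uint32_t flush_at = rendered > xfer_end ? rendered : xfer_end;
        xfer_end = flush_at + xfer_us;
        now = flush_at;      /* buffers swap; next chunk starts rendering */
    }
    return xfer_end;         /* frame is complete when the last transfer ends */
}
```

The time saved per frame works out to (chunks - 1) times the shorter of render time and transfer time per chunk, so the gain is largest when the two are about equal and small when one clearly dominates.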

kdschlosser · April 30, 2025, 7:47am

If you attach your source code, I can modify it to work. That's easier than trying to explain what needs to be done.

You can attach the .c file by dragging and dropping on a new message in the forum.

kdschlosser · April 30, 2025, 7:53am

The lv_disp_flush_ready() function only tells LVGL that it no longer needs to wait before calling the flush callback again. If the flush callback is called again before the buffer that is being transmitted has finished, you will see anomalies on your display. LVGL must wait until the buffer that is transmitting has finished before passing the next buffer to the flush function.

The reason is that the SPI driver will queue that second buffer for transmission, because the call is non-blocking. LVGL will then start rendering into the other buffer, which is still being sent, so data corruption will occur.
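A toy model of the corruption described above, assuming two buffers used alternately and a driver that queues transfers one after another. It counts how often rendering starts in a buffer whose transfer has not finished yet, once with flush ready called from the DMA-complete callback and once with it called inside the flush callback. count_overwrites() is hypothetical and only models timing, not real pixels.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* ready_in_flush_cb = true: flush ready is called inside flush_cb (wrong).
 * ready_in_flush_cb = false: it is called from the DMA-complete callback. */
static uint32_t count_overwrites(uint32_t chunks, uint32_t render_us,
                                 uint32_t xfer_us, bool ready_in_flush_cb)
{
    uint32_t busy_until[2] = {0, 0};  /* when each buffer's transfer completes */
    uint32_t queue_end = 0;           /* the driver sends transfers back to back */
    uint32_t now = 0, overwrites = 0;
    for (uint32_t i = 0; i < chunks; i++) {
        uint32_t b = i % 2;           /* LVGL alternates buf1 / buf2 */
        if (now < busy_until[b]) {
            overwrites++;             /* drawing over pixels still being sent */
        }
        uint32_t rendered = now + render_us;
        /* Correct: flush only after the previous transfer reported ready.
         * Wrong: ready was already reported, so flush at once; the driver
         * queues the transfer behind the one in flight. */
        uint32_t flush_at = ready_in_flush_cb
            ? rendered
            : (rendered > queue_end ? rendered : queue_end);
        uint32_t start = flush_at > queue_end ? flush_at : queue_end;
        queue_end = start + xfer_us;
        busy_until[b] = queue_end;
        now = flush_at;               /* next chunk renders right after flushing */
    }
    return overwrites;
}
```

In this model the early-ready mistake only causes overwrites when the transfer is slower than rendering, which may be why it can look fine with some content and show artifacts with other content.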

mmar22 · April 30, 2025, 8:50am

The code is in the first post and works OK with one buffer. You still haven't answered the questions:
A. How do I change this code to use two buffers and get more FPS?
B. If so, how much more FPS do we get?

kdschlosser · April 30, 2025, 5:48pm

I did answer one of the questions, see above. I can't answer the second question because I don't know the exact specifics of your MCU, your display setup and the code that is running. There are simply too many factors involved to tell you what the FPS would be.

The code you provided is not the complete code and cannot be used.

Marian_M · April 30, 2025, 6:20pm

Do you mean this as a reply?
OK, it seems v8 is old, and I understand that you don't remember how double partial buffering works in your code. Hope no German friend is partly …

Here is my theory; I don't know if it's right. Say we animate the full screen, for example fading something: one ARC image, and over it an A4 mask image to create a lens effect.

Calculation with 1 buffer:
The data for one frame is a 360x360x2 image for the arc indicator, plus a 360x360 A4 mask image. The animation starts.
The flush callback gets the first part, 360x40 pixels, and starts the DMA transaction,
and here you still haven't answered what happens when ready isn't called.

On the ESP32-S3, flash is read at, for example, 80 MHz, so one 360x40x2 part is read in 360 µs. Next, the A4 mask requires reading and converting: less data is read, but there are more write operations, approximately 500 µs. In total, rendering the first buffer takes 860 µs.

For the full screen this repeats 9 times, so the render time is 7.74 ms. The transfer to the LCD is QSPI at 80 MHz, i.e. 40 MB/s, so the whole screen takes 6.48 ms. With a single buffer, waiting for one frame takes 7.74 + 6.48 = 14.22 ms.
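Carrying the same estimates through to two buffers, under the post's assumptions (860 µs render per 40-line chunk, 40 MB/s on the QSPI bus) plus one more: that the next chunk starts rendering as soon as the flush callback returns. Overhead, cache misses and bus contention are ignored, so treat the result as an upper bound on the gain. The function names are hypothetical.

```c
#include <stdint.h>
#include <assert.h>

/* Estimates from the post above, not measurements. */
enum {
    WIDTH = 360, HEIGHT = 360, BYTES_PER_PX = 2,
    LINES_PER_CHUNK = 40,
    RENDER_US_PER_CHUNK = 860,   /* ~360 us image read + ~500 us A4 mask */
    BUS_BYTES_PER_US = 40        /* QSPI at 80 MHz on 4 lines = 40 MB/s */
};

static uint32_t chunks_per_frame(void)
{
    return HEIGHT / LINES_PER_CHUNK;                                   /* 9 */
}

static uint32_t xfer_us_per_chunk(void)
{
    return (WIDTH * LINES_PER_CHUNK * BYTES_PER_PX) / BUS_BYTES_PER_US; /* 28800 B -> 720 us */
}

/* One buffer: render and transfer strictly alternate for every chunk. */
static uint32_t single_buffer_frame_us(void)
{
    return chunks_per_frame() * (RENDER_US_PER_CHUNK + xfer_us_per_chunk());
}

/* Two buffers: after the first chunk, the slower of render/transfer sets the pace. */
static uint32_t double_buffer_frame_us(void)
{
    uint32_t r = RENDER_US_PER_CHUNK, t = xfer_us_per_chunk();
    uint32_t slower = r > t ? r : t;
    return r + t + (chunks_per_frame() - 1) * slower;
}
```

Under these assumptions one frame drops from 14.22 ms (about 70 fps) to about 8.46 ms (about 118 fps). Since rendering is the slower side here, a faster transfer would not help much further; reducing the render time per chunk would.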

Maybe I'm missing something, but I still have no information on how two buffers really work.