I’m working on a display project with LVGL 9.4 on an ESP32-S3, and I’d like to get some feedback from the community on the correct way to handle double buffering and asynchronous drawing.
One frame buffer is currently being displayed by the RGB peripheral, while the other buffer is used for rendering.
Now my question is about LVGL’s drawing model:
– I want LVGL to render asynchronously into the buffer that is NOT currently being displayed
– LVGL itself also supports double buffering
– However, LVGL still calls the flush_cb, where pixel data is copied again into the display buffer
This makes me unsure about the correct architecture:
– Should LVGL’s draw buffer be directly mapped to the RGB back buffer?
– Is it possible (or recommended) to fully avoid copying in flush_cb and just swap buffers?
– How do you properly synchronize LVGL rendering with the RGB panel’s VSYNC / buffer swap?
At the moment everything works visually (colors, sizes, performance are fine), but I want to make sure I’m using LVGL and the ESP-IDF RGB driver in the intended and most efficient way.
I can share code snippets if needed.
What am I missing?
Thanks in advance for any insights or best practices!
Cheers
HaenZ
I’m not trying to make LVGL render unsynchronized with the display scanout. By “asynchronous” I mean: LVGL should render into the back buffer while the RGB peripheral is scanning out the front buffer, and once rendering is finished, the buffers are swapped in a controlled, tear-free way (e.g. at VSYNC).
In other words:
– No mid-frame updates
– No tearing or ghosting
– Rendering and scanout happen in parallel, but buffer ownership is synchronized
My uncertainty is mainly about LVGL’s flush model in this setup. LVGL already supports double buffering internally, but it still expects to call flush_cb, where pixels are typically copied into a display buffer. With the ESP-IDF RGB driver already running in double-buffered mode, this feels redundant.
What I’m trying to understand is:
– Whether LVGL’s draw buffer can be the RGB driver’s back buffer directly
– Whether flush_cb can be reduced to a “render done / swap requested” signal instead of a memcpy
– How to correctly synchronize the LVGL flush completion with the RGB panel’s VSYNC buffer swap to avoid tearing
Everything works visually right now, but I want to make sure the architecture is correct and that I’m not fighting LVGL’s intended rendering model.
If the correct answer is “LVGL expects ownership of the final framebuffer and copying is unavoidable”, that’s fine — I just want to be sure that’s actually the case with RGB panels on ESP32-S3.
The RGB driver is tricky because RGB displays do not have any GRAM internally, so a constant stream of pixel data has to be sent to the panel. There is no callback that fires only once, the first time a buffer is transmitted. With an RGB display you have to use the vsync callback, and that callback fires every single time buffer data is sent, even when the same buffer is being resent back to back. So you need some logic to keep track of which buffer is being transmitted and which one is being rendered to.

What makes it a pain is that there is no way to identify, from inside the vsync callback, which buffer has just finished transmitting: the structure that holds that information is declared and defined in the driver's C source file, not in a header. Thankfully it is a pointer to that structure that gets passed to the callback, so all you need to do is declare the same structure in the C source file where your callback lives. Then you can read the index of the buffer that has just finished transmitting from the right spot in memory.

By keeping a record of which buffer last finished, you know when a new buffer has completed, and you call lv_display_flush_ready() only when the stored index differs from the one that just finished sending. If you call lv_display_flush_ready() more than once for the same buffer, it can (and usually will) mark a buffer as free that has not actually finished transmitting, and LVGL will end up writing into a buffer that is still being sent, which causes visible data corruption.
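To make that bookkeeping concrete, here is a rough sketch of the gating logic in ESP-IDF C. get_just_sent_fb_index() is a stand-in for however you read the just-finished buffer index out of your mirrored copy of the driver's internal struct (its layout depends on the IDF version, so it is not reproduced here); the point is only that lv_display_flush_ready() fires once per buffer change:

```c
#include "esp_lcd_panel_ops.h"
#include "esp_lcd_panel_rgb.h"
#include "lvgl.h"

static volatile int s_last_sent_fb = -1; /* index of the buffer we last saw finish transmitting */
static lv_display_t *s_disp;             /* the LVGL display this panel drives */

/* Placeholder: read the "just finished transmitting" framebuffer index out of the
 * mirrored copy of the driver's internal struct described above. The struct layout
 * depends on the ESP-IDF version, so it is not reproduced here. */
extern int get_just_sent_fb_index(esp_lcd_panel_handle_t panel);

static bool on_vsync(esp_lcd_panel_handle_t panel,
                     const esp_lcd_rgb_panel_event_data_t *edata,
                     void *user_ctx)
{
    int fb = get_just_sent_fb_index(panel);

    /* The vsync callback fires on every refresh, even when the same buffer is
     * resent back to back. Only release LVGL when a different buffer has finished,
     * otherwise LVGL may start writing into a buffer that is still being scanned out. */
    if (fb != s_last_sent_fb) {
        s_last_sent_fb = fb;
        lv_display_flush_ready(s_disp);
    }
    return false; /* no high-priority task was woken */
}
```

The callback gets registered with esp_lcd_rgb_panel_register_event_callbacks(), and s_disp points at the LVGL display created for the panel.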
I have code I wrote that is hands down the fastest RGB driver that has been written for the ESP32 line of MCUs. It lets LVGL render into partial buffers, which greatly reduces LVGL's workload, and I keep the data in the full buffers in sync using the second core, which is also where the data from the partial buffers gets copied into a full buffer. Rotation is handled at the same time as that copy, so you get two birds with one stone there.

One thing that does happen is that a bottleneck shows up on the SPI bus shared by the flash and the PSRAM. It occurs because they sit on the same bus: when two cores are reading and writing flash and RAM, and DMA is reading from RAM at the same time to transmit the buffer data, the octal SPI bus has a hard time keeping up. Luckily Espressif has added some settings that relax that issue somewhat. The first is getting a board with 16 MB or 32 MB of octal flash. This matters because you want both the flash and the PSRAM to be octal; that way you can overclock the SPI bus, which gives you an effective clock of 240 MHz vs the 80 MHz default. There is another setting you can turn on that copies your entire program into PSRAM when the ESP32 boots. The benefit is that your code no longer has to share the bus with the flash while it is running, so the bus to the PSRAM runs at full tilt. The only time the bus is shared is when you need to access files you have stored, such as data files or images like PNGs.
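For reference, these are the kinds of sdkconfig options involved. The option names below are taken from a recent ESP-IDF release for the ESP32-S3; double-check them in menuconfig for your version, and note that the reachable clocks depend on the actual flash/PSRAM parts on your board:

```
# Octal PSRAM and octal flash so the bus can be clocked higher
CONFIG_SPIRAM=y
CONFIG_SPIRAM_MODE_OCT=y
CONFIG_SPIRAM_SPEED_120M=y
CONFIG_ESPTOOLPY_FLASHFREQ_120M=y

# Run code and read-only data out of PSRAM instead of flash, so normal
# execution no longer competes with the flash on the shared bus
CONFIG_SPIRAM_FETCH_INSTRUCTIONS=y
CONFIG_SPIRAM_RODATA=y
```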
Here is the link to the code for the driver I wrote. You will want to pull the information from these 2 areas…
The driver is written for use with MicroPython but you can pull out what is needed to get a standalone driver working.
As you already know, the RGB driver in the ESP-IDF allocates 2 full frame buffers. You do not want to allocate 2 more for LVGL to render into. The reason is that if you pass one of the LVGL buffers to the driver, it ends up copying the entire buffer into the buffer it has allocated and is currently idle, and all of that happens on the same core LVGL is running on, so it really hammers performance. You can instead collect the buffers the driver allocates and pass those directly to LVGL, but the issue there is that LVGL is going to copy data from one buffer to the other to keep the buffers in sync, and that takes a lot of time. It is better to offload that work to the second core, which is what is being done in the driver I linked to above.
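For completeness, this is roughly what the "hand the driver's own buffers to LVGL" option looks like with LVGL 9 (a sketch, assuming RGB565; whether it is actually a win depends on the buffer-sync copy just described):

```c
#include "esp_err.h"
#include "esp_lcd_panel_rgb.h"
#include "lvgl.h"

void attach_lvgl_to_panel(esp_lcd_panel_handle_t panel, int32_t hor_res, int32_t ver_res)
{
    void *fb0 = NULL;
    void *fb1 = NULL;

    /* Fetch the two full frame buffers the RGB driver already allocated
     * instead of allocating two more for LVGL. */
    ESP_ERROR_CHECK(esp_lcd_rgb_panel_get_frame_buffer(panel, 2, &fb0, &fb1));

    lv_display_t *disp = lv_display_create(hor_res, ver_res);
    lv_display_set_color_format(disp, LV_COLOR_FORMAT_RGB565);

    /* DIRECT mode: LVGL renders straight into the frame buffer, so flush_cb only
     * has to trigger the pointer swap and keep the second buffer in sync. */
    lv_display_set_buffers(disp, fb0, fb1,
                           (uint32_t)hor_res * ver_res * 2 /* bytes, RGB565 */,
                           LV_DISPLAY_RENDER_MODE_DIRECT);
}
```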
If I understand your approach correctly, your code still uses LVGL in the “standard” way, but it takes over the entire double-buffer management itself and distributes the work across both cores. That is a very solid design choice.
Unfortunately I’m not very comfortable with Python, but I already had the code translated to ESP-IDF C and still need to test it. Conceptually, though, the approach makes a lot of sense, especially to offload expensive copy operations from LVGL.
One additional observation that might be interesting in this context:
I found that draw_bitmap() in the ESP-IDF RGB controller effectively only swaps the buffer pointer. This is easy to verify with two (or even three) fully populated full-color framebuffers. From a hardware perspective I can reach the full pixel clock, so in theory ~60 fps on my panel.
With that in mind, I’m wondering whether part of the partial-buffer copy overhead could be avoided by letting LVGL render directly into the inactive RGB back buffer. Obviously both full framebuffers would need to be kept progressively consistent to avoid ghosting, and a pure LVGL full-render mode would likely be too slow.
This is just a thought at this point, but it might point toward a useful hybrid approach.
Thanks again for the detailed explanation and for sharing insights into your driver.
I’ll let you know
What you have to remember is that with a 16-lane RGB display you have 2 buffers, both in DMA memory. That means transmitting a buffer is not a blocking call, so while one buffer is being transmitted LVGL can render into the other one. But the connection to the PSRAM is only an 8-lane SPI bus, and that by itself makes it impossible to get anywhere over a 40 MHz clock for the display. That is a theoretical number, and the real figure will be lower because of transmission overhead, but let's call it 40 MHz. Then toss in reading from one buffer while rendering to the other: now two things are using the SPIRAM, so the bus is split between read and write operations and its usable speed is cut in half. You end up with 8 lanes at roughly 40 MHz feeding 16 lanes on the display, so your effective pixel clock is around 20 MHz right out of the gate. And that is the best case. If anything is running on the second core, like Wi-Fi, and you are connected and receiving data, that is another hit. If code is being loaded from flash memory that is another hit, because the flash and the SPIRAM share the same bus.
It’s very easy to overload the SPI bus for the flash and SPIRAM. That is where the bottleneck is going to be, 100% of the time, when using an RGB display with 16 lanes connected.
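Put as a back-of-the-envelope calculation, using the same round numbers (illustrative only; the real figures depend on the PSRAM mode, clock and overhead):

```c
/* Rough bandwidth budget using the round numbers from the post above. */
#include <stdio.h>

int main(void)
{
    const double psram_mb_per_s   = 80.0;                  /* 8 data lines, ~1 byte per clock at 80 MHz */
    const double scanout_mb_per_s = psram_mb_per_s / 2.0;  /* bus split between DMA reads and LVGL writes */
    const double bytes_per_pixel  = 2.0;                   /* RGB565 on a 16-lane panel */

    printf("max sustainable pixel clock ~ %.0f MHz\n",
           scanout_mb_per_s / bytes_per_pixel);             /* ~20 MHz, matching the estimate above */
    return 0;
}
```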
The reason you don't want LVGL rendering directly into the buffers is that the buffers have to stay in sync. LVGL can't render new data into only a single buffer; the same data has to end up in both buffers, because both of them need to hold identical content before new data is written to one of them. That is a giant performance hit.
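To make that cost visible, this is roughly what the per-flush sync copy looks like if LVGL renders directly into the two full framebuffers in LVGL 9's direct mode. It is a simplified sketch: other_fb() and panel_hor_res() are placeholders, 16-bit pixels are assumed, and in a real driver this copy has to be sequenced against the vsync swap so the buffer currently being scanned out is never written:

```c
#include <string.h>
#include "esp_lcd_panel_ops.h"
#include "lvgl.h"

extern esp_lcd_panel_handle_t s_panel;   /* the RGB panel handle */
extern uint16_t *other_fb(void);         /* hypothetical: the full buffer NOT just rendered into */
extern int panel_hor_res(void);          /* hypothetical: panel width in pixels */

static void flush_cb(lv_display_t *disp, const lv_area_t *area, uint8_t *px_map)
{
    int w = panel_hor_res();

    /* In direct mode px_map IS the full frame buffer LVGL just rendered into.
     * The same rows have to end up in the other full buffer as well, otherwise
     * the two buffers drift apart; this loop is the extra copy that keeping the
     * buffers in sync costs. */
    const uint16_t *src = (const uint16_t *)px_map;
    uint16_t *dst = other_fb();
    for (int32_t y = area->y1; y <= area->y2; y++) {
        memcpy(&dst[y * w + area->x1],
               &src[y * w + area->x1],
               (size_t)(area->x2 - area->x1 + 1) * sizeof(uint16_t));
    }

    /* Request the pointer swap; with the RGB driver this is not a pixel copy.
     * lv_display_flush_ready() is then called from the vsync callback as above. */
    esp_lcd_panel_draw_bitmap(s_panel, area->x1, area->y1, area->x2 + 1, area->y2 + 1, px_map);
}
```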
Using the second core to handle the "full" frame buffers means partial buffers are what LVGL actually renders into, so LVGL only has to render each area a single time. The partial buffers are also small enough to be allocated in internal memory, so how fast LVGL can render is not hindered by any bottleneck on the SPIRAM bus.
The cool thing about using the second core to handle transmitting the buffers is that as soon as the pointer swap takes place, the data in the buffer now being transmitted also gets copied into the buffer that was just swapped out. Reading from a transmitting buffer is safe, so there is no worry about data corruption. Once that copy has finished, any partial buffers LVGL has rendered can be copied into the now fully synced idle buffer. Because LVGL renders into two partial buffers, a stall never occurs: by the time the second partial buffer has been rendered, the sync has finished and the waiting partial buffer has been copied.
Another thing I built into the driver is how the swapping of the full buffers takes place. It does not happen with every partial buffer that gets rendered. If there is enough content being rendered that it spans multiple partial buffers, the swap of the full buffers only happens after the final partial buffer has been copied. This completely removes any possibility of tearing.
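Translated into ESP-IDF C terms, the overall shape is roughly the following. This is not the actual driver, just a sketch of the architecture: a FreeRTOS queue plus a worker task pinned to core 1, with copy_area_into_idle_fb() and request_fb_swap_at_vsync() as hypothetical helpers:

```c
#include "freertos/FreeRTOS.h"
#include "freertos/queue.h"
#include "freertos/task.h"
#include "lvgl.h"

/* Hypothetical helpers: copy a partial buffer into the idle full frame buffer
 * (rotating on the fly if needed) and ask the vsync handler to swap buffers. */
extern void copy_area_into_idle_fb(const lv_area_t *area, const uint8_t *px_map);
extern void request_fb_swap_at_vsync(void);

typedef struct {
    lv_area_t area;    /* area this partial buffer covers */
    uint8_t  *px_map;  /* partial buffer in internal RAM that LVGL rendered into */
    bool      last;    /* true for the final chunk of the frame */
} flush_job_t;

static QueueHandle_t s_jobs;

/* LVGL flush_cb: runs on the LVGL core, only queues work and returns. */
static void flush_cb(lv_display_t *disp, const lv_area_t *area, uint8_t *px_map)
{
    flush_job_t job = {
        .area   = *area,
        .px_map = px_map,
        .last   = lv_display_flush_is_last(disp),
    };
    /* With two partial buffers LVGL keeps rendering the next area while the
     * worker copies this one; flush_ready is signalled by the worker. */
    xQueueSend(s_jobs, &job, portMAX_DELAY);
}

/* Worker task pinned to core 1: copies partial buffers into the idle full frame
 * buffer and only swaps the full buffers once the final chunk of the frame has
 * been copied, so mid-frame swaps (tearing) cannot happen. */
static void copy_task(void *arg)
{
    lv_display_t *disp = arg;
    flush_job_t job;
    for (;;) {
        xQueueReceive(s_jobs, &job, portMAX_DELAY);

        copy_area_into_idle_fb(&job.area, job.px_map);
        lv_display_flush_ready(disp); /* partial buffer can be reused by LVGL */

        if (job.last) {
            request_fb_swap_at_vsync();
        }
    }
}

/* Setup (e.g. in app_main):
 *   s_jobs = xQueueCreate(4, sizeof(flush_job_t));
 *   xTaskCreatePinnedToCore(copy_task, "lv_copy", 4096, disp, 5, NULL, 1);
 */
```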
@kdschlosser
This really looks like the end-game solution for RGB panels on the ESP32. It’s by far the most complete and technically consistent explanation I’ve seen on this topic.
I don’t fully “see through” all of LVGL’s internal mechanics yet, but your description aligns exactly with the constraints I kept running into without being able to properly articulate them. At this point, I’m comfortable accepting that understanding every LVGL detail isn’t strictly necessary if the surrounding architecture is sound.
I’ll need a couple of days to integrate and test your code on my setup. I’m currently using lvgl-port, which works reasonably well, but clearly leaves performance on the table compared to what you describe.
Once I have this running and validated, would you be okay with me sharing your overall approach with the wider community (with proper attribution, of course)?
Thanks again for taking the time to explain this in such depth.