LVGL 8 on an 800x480 parallel RGB display on STM32F429I w/SDRAM and LTDC/DMA2D: best practices?

timecop · August 20, 2022, 1:26pm

I’m wondering what the best practice is for “nice” performance on a large-ish parallel LCD which doesn’t have its own VRAM.

Right now I have an STM32F429I with 256Mb SDRAM and LTDC (parallel RGB interface, 565), and I’m using LVGL 8.x as follows:

Init SDRAM
Init LTDC
Create 2 LTDC layers, both full-screen
Layer 1 framebuffer @ start sdram, layer 2 framebuffer after layer 1.
When registering lv display driver, I set direct_mode and full_refresh to 1.
In flush_cb, I check if color_p points to 1st or 2nd framebuffer, disable / enable appropriate layers, and configure LTDC to refresh on next vsync.
I also wait for the actual reconfiguration to happen by polling for LTDC_FLAG_RR. I’ve tried doing this both in an IRQ and with a plain while () inside flush_cb, and performance was about the same.
I have an SPI-based touchscreen, but I think that part is working as well as it can, so there’s not much to optimize there.

I am not using RTOS, just a main loop with some stuff happening in IRQs.

I have a Systick interrupt which calls lv_tick_inc() every 1ms, and lv_timer_handler every 5 ms.

Now, while setting all this up I’ve seen lots of examples of people doing stuff with buffers in SRAM, DMA2D and all that jazz. Before I go and spend a bunch of time setting that up, I’m wondering what IS the actual best practice for this?

Are 2 framebuffers in SDRAM bad because drawing and refresh both have to go through the (limited) AHB bus to access external RAM? Should I have a small buffer in SRAM and DMA2D it over? Will performance actually improve a lot?
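For a rough sense of how much SDRAM traffic is involved, here is a minimal arithmetic sketch. The 800x480 resolution and 2 bytes per pixel (RGB565) come from the post; the 60 Hz panel refresh rate is an assumption, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one full frame at the given resolution and pixel size. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_px)
{
    assert(width > 0 && height > 0 && bytes_per_px > 0);
    return width * height * bytes_per_px;
}

/* Bytes per second the LTDC has to read just to keep the panel refreshed.
 * refresh_hz is an assumption; the thread does not state the panel timing. */
uint32_t scanout_bytes_per_s(uint32_t bytes_per_frame, uint32_t refresh_hz)
{
    return bytes_per_frame * refresh_hz;
}

/* Bytes per second written when every rendered frame redraws the whole
 * screen (as with full_refresh). Blending also reads pixels back, so treat
 * this as a lower bound on the drawing traffic. */
uint32_t full_redraw_bytes_per_s(uint32_t bytes_per_frame, uint32_t render_fps)
{
    return bytes_per_frame * render_fps;
}
```

With these numbers, `frame_bytes(800, 480, 2)` is 768000 bytes, so a 60 Hz scanout alone reads about 46 MB/s from SDRAM before LVGL draws anything, and a full redraw at 33 fps adds at least another 25 MB/s of writes on the same bus.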

What about caching images (icons, bitmaps etc) for drawing, also in SDRAM? Would that result in triple copies on each redraw?

image (sdram) → lvgl composite/draw/whatever → back to sdram → vsync → LTDC reads it out on refresh.

I’ve run the benchmark demo, and pretty much every metric is showing 32-34fps or so, while the screen feels kinda chunky at times (doesn’t really look like 30fps, more like 15 or so).

Has anyone set up something similar who can comment on whether I should bother looking into DMA2D and such, or is this the best it’s going to get given the limited memory bandwidth of the STM32 SDRAM controller?

Thanks.

geert-KLA-BE · August 26, 2022, 7:35am

Hi,

I am using the same processor with a higher resolution display (1024x600). I don’t get it to run buttery smooth but good enough. For that I believe I will need at least an F7 or even H7 MCU.

I don’t know if the solution I use is the best one, but it is at least the best one that I have found.
Since access to the SDRAM is slow (high latencies), I recommend that you use 2 draw buffers in internal RAM. LVGL will do the heavy calculation work there. That does do a lot for performance.
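As a sizing aid for those two internal-RAM draw buffers, here is a small sketch that works out how many full display lines fit in each buffer for a given RAM budget. The 128 KiB budget used in the example is an assumption (the thread gives no figure), and `draw_buffer_lines` is a hypothetical helper, not an LVGL function.

```c
#include <assert.h>
#include <stdint.h>

/* Number of full display lines that fit in each of `buffer_count` draw
 * buffers when they share `ram_budget_bytes` of internal RAM equally. */
uint32_t draw_buffer_lines(uint32_t ram_budget_bytes, uint32_t buffer_count,
                           uint32_t hor_res, uint32_t bytes_per_px)
{
    assert(buffer_count > 0 && hor_res > 0 && bytes_per_px > 0);
    uint32_t line_bytes = hor_res * bytes_per_px;          /* one display line */
    return (ram_budget_bytes / buffer_count) / line_bytes; /* rounded down */
}
```

With a 128 KiB budget split over 2 buffers, an 800-pixel-wide RGB565 display gets 40 lines per buffer and a 1024-pixel-wide one gets 32. In LVGL 8 the two buffers and their size in pixels (lines times horizontal resolution) would then be passed to `lv_disp_draw_buf_init()`.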

Then, in the LVGL flush callback, copy the rendered area to an SDRAM screen buffer with DMA2D (DMA2D can copy rectangular memory blocks). Don’t wait for the DMA to finish in the flush callback; return control to the LVGL loop so it can keep drawing in the background. Using polling or an interrupt handler, you can check when the DMA2D is done and then call the LVGL flush-done function.
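To make the rectangular copy concrete, here is a CPU-only sketch of what the DMA2D memory-to-memory transfer does: the source is the tightly packed draw buffer, and the destination advances by the full framebuffer width after every line. `area_t` and `copy_rect_rgb565` are illustrative names (the struct mirrors the inclusive coordinates of `lv_area_t`); this is not the hardware setup itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Inclusive rectangle, same convention as LVGL's lv_area_t. */
typedef struct { int32_t x1, y1, x2, y2; } area_t;

/* Copy a tightly packed RGB565 block `src` into the framebuffer `fb`
 * at the position described by `a`. */
void copy_rect_rgb565(uint16_t *fb, int32_t fb_width, int32_t fb_height,
                      const uint16_t *src, const area_t *a)
{
    assert(a->x1 >= 0 && a->y1 >= 0 && a->x1 <= a->x2 && a->y1 <= a->y2);
    assert(a->x2 < fb_width && a->y2 < fb_height);

    int32_t w = a->x2 - a->x1 + 1;
    int32_t h = a->y2 - a->y1 + 1;
    uint16_t *dst = fb + (size_t)a->y1 * (size_t)fb_width + (size_t)a->x1;

    for (int32_t row = 0; row < h; row++) {
        memcpy(dst, src, (size_t)w * sizeof(uint16_t));
        src += w;          /* source lines are packed back to back */
        dst += fb_width;   /* destination jumps to the same x on the next line */
    }
}
```

On the real peripheral the same geometry is expressed as a pixels-per-line count, a number of lines, and an output line offset of `fb_width - w` pixels, and the transfer-complete interrupt is where `lv_disp_flush_ready()` would be called instead of blocking inside the flush callback.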

Configure the LTDC with a single layer so that it scans out to the display from the SDRAM screen buffer. You will get some tearing while LVGL is drawing. I have not yet tried double buffering in combination with draw buffers.

I have noticed that you need to be careful with SDRAM usage. If the LTDC has to wait for other accesses, you will get screen distortions. So you might need to configure the DMA2D to copy with wait time enabled so that the LTDC is not starved of access to RAM for too long.

Since most images will not fit in internal RAM, you can only read them from SDRAM or from (external) flash, provided they are not compressed. That will cause extra latency when drawing.
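The wait time mentioned here corresponds, as far as I can tell, to the DMA2D dead-time setting in its AHB master timer register (AMTCR), with the enable bit at position 0 and an 8-bit dead time at bits 15:8. The sketch below only builds the register value; the bit positions are my reading of the reference manual and should be checked against it, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed AMTCR layout: bit 0 enables the dead time, bits 15:8 hold the
 * minimum number of AHB clock cycles between two DMA2D bus accesses.
 * Verify against the reference manual before relying on it. */
#define AMTCR_EN      (1u << 0)
#define AMTCR_DT_POS  8u

/* Build the value to write to DMA2D->AMTCR for a given dead time. */
uint32_t dma2d_amtcr_value(uint32_t dead_time_cycles)
{
    assert(dead_time_cycles <= 0xFFu);   /* field is 8 bits wide */
    return (dead_time_cycles << AMTCR_DT_POS) | AMTCR_EN;
}
```

A larger dead time leaves more bus slots for the LTDC at the cost of a slower copy, so the value is something to tune by watching for distortions on the actual panel.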

timecop · August 26, 2022, 10:08am

Hey, thanks for the response. Yeah, your way (2 smaller buffers in SRAM) is what I ended up doing after I at first did two framebuffers in SDRAM with 2 layers A/B’ing and that became very slow very quickly. The other un-slowness fix was to have lv_port_tick and lv_timer_handler running in different contexts (tick in Systick, and timer in main loop).
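A minimal host-side sketch of that split, assuming a 1 ms SysTick and a 5 ms handler period as in the first post. `ui_tick_inc` and `ui_timer_handler` are stand-ins for `lv_tick_inc()` and `lv_timer_handler()` so the example builds without LVGL; the other names are made up too.

```c
#include <stdint.h>

/* Stand-ins for the LVGL calls, so this sketch compiles on a host. */
volatile uint32_t ui_tick_ms;     /* what lv_tick_inc() would advance */
uint32_t ui_handler_calls;        /* how often the handler stand-in ran */

static void ui_tick_inc(uint32_t ms) { ui_tick_ms += ms; }
static void ui_timer_handler(void)   { ui_handler_calls++; }

/* Interrupt context: only advance time, never render here. */
void systick_isr_1ms(void)
{
    ui_tick_inc(1);
}

/* Main-loop context: run the (potentially slow) handler once at least
 * period_ms have elapsed since its last run. */
void main_loop_poll(uint32_t period_ms)
{
    static uint32_t last_run_ms;
    uint32_t now = ui_tick_ms;

    if ((uint32_t)(now - last_run_ms) >= period_ms) {
        last_run_ms = now;
        ui_timer_handler();
    }
}
```

The point of the split is that rendering no longer runs inside the interrupt, so the tick keeps advancing while LVGL draws and other interrupts are not held up behind it.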