ESP32 320x480 low FPS animating

Marian_M · May 18, 2023, 4:29pm

A full-screen animation with several layers results in only 4 FPS.
How can I optimize it or speed it up?

To replicate: create double-width images (640x480) and set an animation that moves them left.
Try one image, then two images with blending, plus one non-moving object on the top layer.

kdschlosser · May 18, 2023, 8:53pm

You need a display that has an i8080 interface using 16 lanes, and even then that’s a lot of data being transferred. A DMA double buffer is a MUST. I suggest using an MCU that has a boatload of SRAM so that the DMA buffers fit into the faster RAM instead of SPIRAM, which is way slower. Optionally you can use a more advanced MCU that has an even faster way to communicate with a display.

If you look at it this way: with a 320 x 480 display, even using only 16-bit color, you are looking at 2,457,600 bits that have to be transferred for each frame. 3- or 4-wire SPI running at 40 MHz has a maximum speed of 40 Mbps. Take that down by 25% because of overhead and you now have 30 Mbps. That’s 12 FPS right there. If you don’t have 614,400 bytes of SRAM (two full 307,200-byte frame buffers) then it is going to be crazy slow. Also, the animation that is running, is it an image? Is the image loaded into RAM? If so, what kind of RAM? Do you have the color depth of the image set the same as the color depth of the display?
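The arithmetic above can be written out as a small sketch. The function names are made up for illustration, and the 25% overhead figure is the estimate from this post, not a measured value.

```c
#include <stdint.h>

/* Bits needed for one full frame at the given resolution and color depth. */
uint32_t frame_bits(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    return width * height * bits_per_pixel;
}

/* Upper bound on full-screen frames per second for a serial bus.
 * overhead_percent is the assumed share of the raw bit rate lost to overhead. */
double serial_max_fps(uint32_t bus_bits_per_second, uint32_t overhead_percent,
                      uint32_t bits_per_frame)
{
    double usable = (double)bus_bits_per_second * (100u - overhead_percent) / 100.0;
    return usable / (double)bits_per_frame;
}
```

With `frame_bits(320, 480, 16)` and `serial_max_fps(40000000, 25, ...)` this gives roughly 12.2 frames per second, and that is before any rendering time is counted.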

the I8080 interface is 10 times faster than SPI so you should be able to get close to 40FPS but you are still at the mercy of how much DMA memory is available.

Marian_M · May 19, 2023, 5:03am

Yes, good info, but I already use 8-bit i80. Of course a CPU with GHz and GB would be a solution, but that is not what I asked for.
The bottleneck here isn't the bus, it's the LVGL rendering speed.

kdschlosser · May 19, 2023, 5:18am

I can tell you this for a fact: it’s not LVGL. It’s your hardware, specifically the RAM allocation. The ESP32 does not have enough SRAM to hold a whole-screen frame buffer, let alone 2 if you are using double buffering, which means any frame buffer is going to be stored in SPIRAM. SPIRAM = SPI-RAM, and as the name suggests that RAM is accessed over a really slow SPI bus. Any read or write operation to that memory, especially on a large block like a frame buffer, is going to be incredibly slow.

How I know it’s a hardware issue and not LVGL is this.

480 x 320 display with 32 bit color, averaging some 230FPS.

Wondering what the difference is?
Hardware is the only difference.

kdschlosser · May 19, 2023, 5:25am

Then you add in there the loading of a 640x480 image that is more than likely 32 bit color depth. Loading that into ram as well which is going to take up a whopping 1,228,800 bytes. The pixel data from that image gets taken out of memory one pixel at a time, more than likely converted to RGB565 and then the converted pixel data gets put into the frame buffer.
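To give an idea of the per-pixel work described here, a minimal sketch of a 32-bit to RGB565 conversion follows. This is not LVGL's actual code; the function names and the ARGB8888 byte order are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to hold an uncompressed image. */
size_t image_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Convert one 32-bit ARGB8888 pixel to 16-bit RGB565 (alpha is dropped). */
uint16_t argb8888_to_rgb565(uint32_t argb)
{
    uint32_t r = (argb >> 16) & 0xFF;
    uint32_t g = (argb >> 8) & 0xFF;
    uint32_t b = argb & 0xFF;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a run of pixels, one at a time, into the frame buffer format. */
void convert_row(const uint32_t *src, uint16_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = argb8888_to_rgb565(src[i]);
    }
}
```

`image_bytes(640, 480, 4)` is the 1,228,800 bytes mentioned above, and every one of those pixels goes through a conversion like this whenever it is drawn.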

Optimize your image so it has the same color depth as the display and that will save quite a bit of time in rendering but in the end the pixel data still needs to be moved from one location in slow memory to another location in slow memory and there is not going to be a way to overcome that with an ESP32. I hate to be the bringer of bad news but your program is well outside the bounds of what your hardware is capable of handling. Not the processor, it’s the memory that is giving you grief.

mmar22 · May 19, 2023, 6:30am

I don't agree. The ESP PAR8Q driver now runs at 6.58 MHz and LVGL is configured for partial buffering in internal RAM.
You like math, so: 320x480x2 is the byte count = … 21 FPS max with full-screen refresh.
I know all this theory, but I don't have time to analyse the LVGL rendering code.
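For reference, the 21 FPS figure can be reproduced with a short sketch. It assumes the 8-bit parallel bus moves one byte per clock with no protocol overhead; the function name is made up for illustration.

```c
#include <stdint.h>

/* Upper bound on full-screen frames per second for a parallel bus,
 * assuming one bus-width transfer per clock and no protocol overhead. */
double parallel_max_fps(double clock_hz, uint32_t bus_width_bits, uint32_t frame_bytes)
{
    double bytes_per_second = clock_hz * (double)bus_width_bits / 8.0;
    return bytes_per_second / (double)frame_bytes;
}
```

`parallel_max_fps(6580000.0, 8, 320 * 480 * 2)` comes out at about 21.4.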
My guess:

The buffer is, for example, 40 lines x 320 px.
The code renders the background into this buffer (I mean 40 x 320 is optimal here).
The code adds the next object, image layer 1 (I number towards the top). The image is wider, for example 640x480, stored in QSPI flash mapped as code. The QSPI clock is 80 MHz, so data is read at 40 MHz, which I think is fast enough, but what does the code do to locate the frame from xanim to xanim+320 … is this optimal?
The code adds the next layer 2 … same issue.
The code adds the static layers.
The block flush starts.
Result now: 4 FPS.
When rendering animations of small objects I can reach >100 FPS.
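The layer step in the list above can be sketched roughly like this. It is not LVGL's rendering code, only an illustration under assumptions: the source image is stored row by row with 3 bytes per pixel (little-endian RGB565 plus one alpha byte), and `blend_window`, `x_anim` and `y0` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define SRC_BPP 3 /* assumed: 2 bytes RGB565 (little-endian) + 1 byte alpha */

/* Blend one RGB565 foreground pixel over a background pixel with 8-bit alpha. */
uint16_t blend_rgb565(uint16_t fg, uint16_t bg, uint8_t alpha)
{
    uint32_t inv = 255u - alpha;
    uint32_t r = (((fg >> 11) & 0x1F) * alpha + ((bg >> 11) & 0x1F) * inv) / 255u;
    uint32_t g = (((fg >> 5) & 0x3F) * alpha + ((bg >> 5) & 0x3F) * inv) / 255u;
    uint32_t b = ((fg & 0x1F) * alpha + (bg & 0x1F) * inv) / 255u;
    return (uint16_t)((r << 11) | (g << 5) | b);
}

/* Blend a buf_w x buf_h window of a wider image onto the draw buffer.
 * The window starts at column x_anim and row y0 of the source image. */
void blend_window(const uint8_t *src, uint32_t src_w, uint32_t x_anim, uint32_t y0,
                  uint16_t *buf, uint32_t buf_w, uint32_t buf_h)
{
    for (uint32_t row = 0; row < buf_h; row++) {
        /* Each row of the window is one contiguous run inside the source row. */
        const uint8_t *p = src + ((size_t)(y0 + row) * src_w + x_anim) * SRC_BPP;
        uint16_t *out = buf + (size_t)row * buf_w;
        for (uint32_t x = 0; x < buf_w; x++, p += SRC_BPP) {
            uint16_t fg = (uint16_t)(p[0] | (p[1] << 8));
            out[x] = blend_rgb565(fg, out[x], p[2]);
        }
    }
}
```

Locating the window is only an offset calculation per row, so in this sketch the cost is dominated by the per-pixel read from flash and the blend.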

kdschlosser · May 20, 2023, 3:51am

You are forgetting that the pixel data loaded from the HUGE image is not stored in “fast RAM”; it is stored in SPI memory. Since an ESP32 is being used and not an ESP32-S3, the memory is on an SPI bus, which is HORRIBLY slow. The pixel data has to be pulled from that memory one pixel at a time and then converted to a different color format. That takes a lot of time to do. Once again, neither of those causes of the slowdown is an issue with LVGL.

Your whole problem is more than likely due to the 640 x 480 x 32bit image you are loading into memory.

Without seeing the code I cannot tell you exactly what is going on. I can tell you that the ESP32 only has a very small amount of DMA memory available. the S3 has more but the majority of it is located once again in SPIRAM which is going to be slower. you are loading an image that is larger than what could fit into RAM and it MUST be loaded into SPIRAM.

And 320x480x2 = 307,200 bytes, and that is a single frame buffer, which would consume almost all of an ESP32’s conventional RAM. A second frame buffer would end up in SPIRAM. Just to let you know.

You said it yourself.

That right there tells you that the problem is NOT LVGL. It is because of the image you are loading and that image is going to push things into that slower memory.

Out of curiosity have you measured the amount of time it takes for the ESP32 to write a frame to the display? Not using the built in Performance monitor in LVGL as that captures the entire process including rendering to the frame buffer… I am curious to see if there is a difference in the time it takes to write the data between using small things and that one large image.

Here you go, 480x320 display with an image that is 640x480.

not 4 frames per second. That is with a single frame buffer that is 480x320x4. What I am telling you is the problem is not with LVGL, it is going to be with the hardware or your code.

Marian_M · May 20, 2023, 6:08am

As I wrote, the processing inside LVGL is what matters. You still talk about loading a big image. I mean LVGL loads nothing; rendering makes a memory-to-memory copy. And the image format is the same, 16 bit + alpha. QSPI flash is of course slower than RAM, but SPIRAM isn't used for rendering. My buffer setup is 320 x 30 lines.
Example: part of the image definition.

0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,0x10,0x82,0xFA,};
const lv_img_dsc_t ui_img_d01_png = {
   .header.always_zero = 0,
   .header.w = 740,
   .header.h = 120,
   .data_size = sizeof(ui_img_d01_png_data),
   .header.cf = LV_IMG_CF_TRUE_COLOR_ALPHA,
   .data = ui_img_d01_png_data};

kdschlosser · May 20, 2023, 6:51pm

LV_IMG_CF_TRUE_COLOR_ALPHA == 32 bit. True Color is 24 Bit + Alpha = 32 bit.

The image is being converted by LVGL so a direct memory copy is not being done. Your frame buffer is also pretty small. it is only 6.25% of the display area. That means to render the entire display LVGL is going to flush 16 different areas of the display. 16 times the flush function is going to need to be called when you move the screen so much as 1 pixel.
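The 6.25% and 16-flush figures follow directly from the buffer height. A tiny sketch, with made-up function names and assuming the buffer spans the full display width:

```c
#include <stdint.h>

/* Number of flush calls needed to cover the display with a strip buffer
 * that is buffer_lines high (rounded up). */
uint32_t flush_count(uint32_t display_lines, uint32_t buffer_lines)
{
    return (display_lines + buffer_lines - 1) / buffer_lines;
}

/* Share of the display covered by one buffer, in percent. */
double buffer_percent(uint32_t display_lines, uint32_t buffer_lines)
{
    return 100.0 * buffer_lines / display_lines;
}
```

For a 480-line display, a 30-line buffer gives 16 flushes per full refresh; a 40-line buffer would give 12.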

are you using an ESP32? or an ESP32-S3?

Marian_M · May 21, 2023, 8:02am

I use an ESP32, and you missed this: Images — LVGL documentation
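As I understand that documentation page for LVGL v8, the "true color" formats follow the configured LV_COLOR_DEPTH, and the alpha variant adds one byte per pixel. A small sketch of the resulting sizes follows; the function names are made up, and the mapping is my reading of the docs, not code copied from LVGL.

```c
#include <stdint.h>

/* Bytes per pixel for a "true color + alpha byte" image at a given
 * color depth; returns 0 for depths not covered here. */
uint32_t true_color_alpha_px_size(uint32_t color_depth)
{
    switch (color_depth) {
    case 1:
    case 8:  return 2; /* 1 color byte + 1 alpha byte */
    case 16: return 3; /* 2 color bytes + 1 alpha byte */
    case 32: return 4; /* 3 color bytes + 1 alpha byte */
    default: return 0;
    }
}

/* Size of the pixel data of an image in that format. */
uint32_t image_data_bytes(uint32_t width, uint32_t height, uint32_t color_depth)
{
    return width * height * true_color_alpha_px_size(color_depth);
}
```

With a 16-bit color depth the 740 x 120 image from my example is 266,400 bytes, which matches the three-byte groups in the data array.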

And of course in the rendering routine each pixel is read from flash, recalculated with alpha and the background, and only the two new bytes are written into the RAM buffer.
I think the most important point here is that with very wide images the cache between flash and the core can't handle reading lines without re-caching. My plan for Monday's test is to split the image into more segments, where the width fits the cache size for one buffer. And here I think 30 lines is perfect.
Also, the overhead from 16 areas isn't a problem.

EDIT: I read more about caching and it seems a fetch is only 32 bytes, so the size and splitting of images is not relevant. Also, prefetch on the ESP isn't offloaded, so …?

Marian_M · June 14, 2023, 5:38pm

FYI, I ended up with 18 FPS for a fully animated screen with LovyanGFX on two displays.
The only critical thing to accept when using DMA in LovyanGFX is that a call to gfx.pushImageDMA sets up the window for the LCD only if the previous x, y, w, h differ, and sends only the difference.
This causes trouble when switching between LCD1 and LCD2.
The workaround is to send one dummy pixel whenever chip select toggles, which forces a new window instruction to be sent to the right LCD over DMA transactions.
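The effect can be shown with a small host-side model. This is not LovyanGFX code; every type and function here is hypothetical, and the `invalidate` flag only stands in for what the dummy pixel achieves, namely that the next window command is really sent.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { int32_t x, y, w, h; } window_t;

typedef struct {
    window_t cached;         /* last window the driver believes was sent */
    bool     cache_valid;
    int      active_lcd;     /* which chip select is currently asserted */
    int      window_cmds[2]; /* window commands each LCD actually received */
} bus_model_t;

void bus_init(bus_model_t *bus)
{
    bus->cache_valid = false;
    bus->active_lcd = 0;
    bus->window_cmds[0] = 0;
    bus->window_cmds[1] = 0;
}

/* Send the window command only when it differs from the cached one. */
void set_window(bus_model_t *bus, window_t win)
{
    if (bus->cache_valid &&
        bus->cached.x == win.x && bus->cached.y == win.y &&
        bus->cached.w == win.w && bus->cached.h == win.h) {
        return; /* driver thinks the LCD already has this window */
    }
    bus->cached = win;
    bus->cache_valid = true;
    bus->window_cmds[bus->active_lcd]++;
}

/* Switch chip select. With invalidate == false the cache survives the
 * switch, which models the problem; with true it models the workaround. */
void select_lcd(bus_model_t *bus, int lcd, bool invalidate)
{
    if (lcd != bus->active_lcd && invalidate) {
        bus->cache_valid = false;
    }
    bus->active_lcd = lcd;
}
```

Without invalidation, setting the same full-screen window on LCD1 and then LCD2 leaves LCD2 without any window command; with it, both displays receive one.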