ESP32, ILI9488, 8-bit parallel: choppy animation when scrolling tabs with LVGL-generated objects


Here is a video of me scrolling using this setup.

I have removed everything from the loop except:

lv_task_handler();
delay(5);

When I scroll a tab from left to right, it is very choppy. I don’t see the CPU being used heavily.

I am not sure how to enable DMA in Bodmer’s TFT_eSPI library. However, I should be getting good performance from a parallel display, right? I notice that the choppiness is less pronounced with the background images, which are imported .png files. Could this be a general LVGL setting?

What is your display buffer size?

I tried to add multiple buffers, but the compiler complained about RAM size for my ESP32.

@embeddedt

lv_disp_buf_init(&disp_buf, buf, NULL, LV_HOR_RES_MAX * 10);

I have tried changing it to a value above 10, but it keeps crashing.

Is it possible to set up two buffers with DMA transfer on an ESP32 with 4 MB flash?

You need to increase both the size of the buf array (declared above that function call) and the size passed to lv_disp_buf_init.
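As a rough illustration of that advice, the sketch below ties the array size and the size argument to a single macro and computes how much RAM each choice costs. It is LVGL-free so it compiles on its own, and it assumes LV_COLOR_DEPTH 16 (2 bytes per lv_color_t) and a 480-pixel-wide display; color16_t, BUF_LINES, BUF_PIXELS, draw_buf1 and draw_buf_bytes() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for lv_color_t with LV_COLOR_DEPTH 16 (assumption): RGB565, 2 bytes. */
typedef uint16_t color16_t;

#define HOR_RES    480                    /* assumed horizontal resolution */
#define BUF_LINES  30                     /* display lines per draw buffer */
#define BUF_PIXELS (HOR_RES * BUF_LINES)  /* used for BOTH the array and the init call */

/* The array is sized with the same macro that would be passed to
   lv_disp_buf_init(&disp_buf, draw_buf1, NULL, BUF_PIXELS); in LVGL v7,
   so the two values cannot drift apart. */
color16_t draw_buf1[BUF_PIXELS];

/* Bytes of RAM used by n_bufs draw buffers of `lines` full-width lines. */
static size_t draw_buf_bytes(size_t hor_res, size_t lines,
                             size_t bytes_per_px, size_t n_bufs) {
    assert(n_bufs == 1 || n_bufs == 2);  /* LVGL v7 takes one or two buffers */
    return hor_res * lines * bytes_per_px * n_bufs;
}
```

Under these assumptions one 10-line buffer takes 9,600 bytes, one 30-line buffer 28,800 bytes, and two 30-line buffers 57,600 bytes of internal RAM, which helps explain why growing the buffers can quickly exhaust the ESP32’s static DRAM.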

@embeddedt many thanks. I did a bit of research and have set up my buffer in the following way:

#define LV_BUFFER_SIZE (480*30)
lv_disp_buf_t lvDisplayBuffer;
lv_color_t lvBuffer1[LV_BUFFER_SIZE];
lv_color_t lvBuffer2[LV_BUFFER_SIZE];

and then, in setup():

// Initialize display buffer

lv_disp_buf_init(&lvDisplayBuffer, lvBuffer1, NULL, LV_BUFFER_SIZE);

I set it up this way so that I can use multiple buffers if required. For now lvBuffer2 is unused (NULL is passed in its place); I mention it for those of you who are trying to set up multiple buffers.

Initially this would not compile, and I got:

`.dram0.bss' will not fit in region `dram0_0_seg'

and

`dram0_0_seg' overflowed by 9408 bytes

So I used some advice from this post and modified lv_conf.h:

#define LV_MEM_SIZE (16U * 768U)

However, I am still getting the same lag. At idle the FPS is now 100 with 8% CPU, but when scrolling it drops to 12 FPS.

Could it be something else?
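The lv_conf.h change above frees RAM because, assuming LV_MEM_CUSTOM is 0 (the default), LVGL reserves LV_MEM_SIZE bytes as a static array for its internal heap. A minimal arithmetic check, assuming the LVGL v7 template default of (32U * 1024U) was in place before the change; the macro and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed LVGL v7 template default for LV_MEM_SIZE in lv_conf.h. */
#define MEM_SIZE_BEFORE (32U * 1024U)
/* Value from the post: #define LV_MEM_SIZE (16U * 768U) */
#define MEM_SIZE_AFTER  (16U * 768U)
/* Overflow reported by the linker for dram0_0_seg. */
#define DRAM_OVERFLOW   9408U

/* Static DRAM released by shrinking LVGL's built-in heap, assuming
   LV_MEM_CUSTOM is 0 so the heap is a fixed-size static array. */
static uint32_t heap_bytes_freed(uint32_t before, uint32_t after) {
    assert(after <= before);
    return before - after;
}
```

If that assumption holds, the change releases 20,480 bytes, comfortably more than the 9,408-byte overflow; the trade-off is a smaller heap for LVGL objects and styles.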

Hi guys, any thoughts on this, please?

At this point the bottleneck must be either in rendering or in sending pixels to the display. I don’t know whether TFT_eSPI has a DMA option, but you could look into that.

What’s the CPU speed of your ESP32?

Do you happen to have an oscilloscope? If so, you can look at the WR signal to the TFT; it will tell you the pixel transfer rate, the percentage of time spent on LVGL rendering vs. buffer transfer to the TFT, etc.
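If the WR frequency is measured on a scope, the bus-limited frame time follows from simple arithmetic. The sketch below assumes a 480x320 panel, RGB565 pixels (two WR strobes per pixel on an 8-bit bus, one on a 16-bit bus) and a purely hypothetical 10 MHz WR rate; full_frame_us() and bus_limited_fps() are hypothetical helpers, and the real rate has to come from the measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Microseconds needed to push one full frame over the parallel bus.
   strobes_per_px: WR pulses per pixel (2 for RGB565 on an 8-bit bus,
   1 on a 16-bit bus). wr_hz: WR strobe frequency read off the scope. */
static uint64_t full_frame_us(uint32_t width, uint32_t height,
                              uint32_t strobes_per_px, uint64_t wr_hz) {
    assert(wr_hz > 0);
    uint64_t strobes = (uint64_t)width * height * strobes_per_px;
    return strobes * 1000000ULL / wr_hz;
}

/* Upper bound on full-screen frames per second from bus bandwidth alone. */
static uint64_t bus_limited_fps(uint64_t frame_us) {
    assert(frame_us > 0);
    return 1000000ULL / frame_us;
}
```

With those numbers a full-screen redraw needs about 30.7 ms on the 8-bit bus (at most about 32 full frames per second) and about 15.4 ms on a 16-bit bus; a scroll that touches only part of the screen sends proportionally fewer pixels.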

Alternatively, you can instrument the code yourself and collect these statistics.
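One way to instrument the code without a scope is to timestamp the display flush callback. The sketch below is a minimal, hypothetical accumulator (perf_stats_t and the stats_* functions are not part of LVGL or TFT_eSPI); on the ESP32 the timestamps could come from Arduino’s micros(), taken at the start of the flush callback and just before lv_disp_flush_ready(). It assumes a blocking flush, so the time inside the callback is the pixel transfer and the gap between flushes is mostly LVGL rendering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timing accumulator; all times in microseconds. */
typedef struct {
    uint64_t render_us;          /* gaps between flushes (mostly LVGL rendering) */
    uint64_t flush_us;           /* time inside the flush callback (pixel transfer) */
    uint64_t pixels;             /* pixels sent to the display */
    uint32_t flushes;            /* number of flush callbacks recorded */
    uint64_t last_flush_end_us;  /* end timestamp of the previous flush */
} perf_stats_t;

/* Call once per flush with timestamps taken at the start of the flush
   callback and just before lv_disp_flush_ready(), plus the area's pixel count. */
static void stats_record_flush(perf_stats_t *s, uint64_t start_us,
                               uint64_t end_us, uint64_t px) {
    assert(end_us >= start_us);
    if (s->flushes > 0) {
        assert(start_us >= s->last_flush_end_us);
        s->render_us += start_us - s->last_flush_end_us;
    }
    s->flush_us += end_us - start_us;
    s->pixels += px;
    s->flushes++;
    s->last_flush_end_us = end_us;
}

/* Share of the measured time spent transferring pixels, in percent. */
static uint32_t stats_flush_percent(const perf_stats_t *s) {
    uint64_t total = s->render_us + s->flush_us;
    return total ? (uint32_t)(s->flush_us * 100 / total) : 0;
}

/* Effective transfer rate while flushing, in pixels per second. */
static uint64_t stats_pixels_per_sec(const perf_stats_t *s) {
    return s->flush_us ? s->pixels * 1000000ULL / s->flush_us : 0;
}
```

The numbers are only meaningful while something is actually being redrawn (for example during a scroll), since idle time between frames would otherwise be counted as rendering.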

BTW, your scrolling seems to touch a large number of pixels, so some lag is expected. You could double the transfer rate if you can switch the TFT to 16-bit transfer mode rather than 8-bit. Double buffering with DMA may also work: if the LVGL time to render is Tl and the time to send the pixels is Tp, you will get an update time of max(Tl, Tp) rather than Tl + Tp.
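The max(Tl, Tp) versus Tl + Tp argument can be checked with a small timing model. The sketch below assumes each frame is drawn in N equal chunks, each taking Tl to render and Tp to send, and that with two buffers the transfer runs in the background (as DMA would) while LVGL renders the next chunk into the other buffer; serial_time() and double_buffered_time() are hypothetical names, and the model ignores setup overheads.

```c
#include <assert.h>
#include <stdint.h>

/* One buffer, blocking transfer: render a chunk, send it, repeat. */
static uint64_t serial_time(uint32_t n_chunks, uint64_t tl, uint64_t tp) {
    return (uint64_t)n_chunks * (tl + tp);
}

/* Two buffers, background transfer: chunk i is rendered into buffer i % 2
   while chunk i-1 is still being sent. A buffer can be reused only after
   its previous transfer (chunk i-2) has finished, and transfers are sent
   one at a time. Returns the time at which the last transfer ends. */
static uint64_t double_buffered_time(uint32_t n_chunks, uint64_t tl, uint64_t tp) {
    uint64_t render_end = 0;      /* when the previous render finished */
    uint64_t send_end_prev = 0;   /* when transfer i-1 finished */
    uint64_t send_end_prev2 = 0;  /* when transfer i-2 finished (same buffer as i) */
    for (uint32_t i = 0; i < n_chunks; i++) {
        uint64_t render_start = render_end > send_end_prev2 ? render_end : send_end_prev2;
        render_end = render_start + tl;
        uint64_t send_start = render_end > send_end_prev ? render_end : send_end_prev;
        uint64_t send_end = send_start + tp;
        send_end_prev2 = send_end_prev;
        send_end_prev = send_end;
    }
    return send_end_prev;
}
```

For N chunks the double-buffered model finishes at Tl + (N - 1) * max(Tl, Tp) + Tp, so the cost per chunk approaches max(Tl, Tp) as N grows. Halving Tp with a 16-bit bus helps in both modes; in the double-buffered case the gain stops once Tl becomes the larger term.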


@zapta and @embeddedt I don’t have an oscilloscope, but I do like the idea of setting up DMA. I already have two buffers; how would I perform a DMA transfer on the ESP32?

My ESP32-WROOM-32D is running at 240 MHz. I also have the option of using the 8 MB or 16 MB versions if that would make any difference, but that would require some soldering. Based on the speed tests in Bodmer’s demos, I thought 8-bit parallel would be more than enough for a better frame rate.

If I were to get that information from an oscilloscope, where would I apply any adjustments? So far, the settings I can change seem to be the number of buffers, the buffer size, and LV_DISP_DEF_REFR_PERIOD, although only LV_DISP_DEF_REFR_PERIOD has any visible effect on screen behaviour.

240 MHz should be more than enough; my STM32F7 runs at 200 MHz and it can achieve 35-50 fps consistently, even when scrolling. It sounds like the bottleneck is in transferring the data to the display, so changing the buffer size is unlikely to help.

I am not familiar with ESP32 drivers so hopefully someone else can explain how to use DMA there.

Thanks! So is this specific to the ILI9488 driver (Bodmer’s TFT_eSPI), the parallel interface setup, or the actual display (ER-TFT035-6)?

It’s specific to the driver + display connection combination.

Does LVGL provide any test program or statistics collection to evaluate the rendering performance of new systems? For example, in-memory rendering time, pixel transfer rate, etc.

The closest we have right now is https://github.com/lvgl/lv_examples/tree/master/src/lv_demo_benchmark, as well as the LV_USE_PERF_MONITOR setting which you can enable in lv_conf.h.

Thanks @embeddedt, I was not aware of these two options. You may want to mention the benchmark in the ‘porting’ section of the LVGL documentation so people can optimize their systems.

For me, and I think for others here, the bottleneck in our LVGL implementations is the pixel transfer to the TFT. Hopefully, once LVGL gains more momentum, we will see TFTs on AliExpress with controllers ‘optimized for LVGL’. ;)

I am curious how other libraries work around this, as this type of problem frequently seems to happen with LVGL on SPI but not with other libraries. Are they simply doing less rendering computation, rather than just sending fewer pixels to the display? If the bottleneck were pixel transfer, I would expect every library to have the same problem.

@embeddedt Would it be worth considering the Adafruit drivers for the display instead?

I don’t know how to code with Espressif’s ESP-IDF (yet); would I get better performance with the drivers available there?

As far as I know, the ESP32 does not support DMA on 8-bit parallel.
You will need to move to other hardware to achieve that.
I suggest trying to decrease LV_DISP_DEF_REFR_PERIOD in lv_conf.h from 30 ms to 15 or 10 ms; you might see an improvement in refresh rate.
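The refresh period caps how often LVGL even attempts a redraw, since the refresh task runs at most once per LV_DISP_DEF_REFR_PERIOD milliseconds. A minimal arithmetic sketch (refr_period_fps_cap() is a hypothetical helper):

```c
#include <assert.h>
#include <stdint.h>

/* Ceiling on refreshes per second imposed by the refresh-task period:
   the task runs at most once every period_ms milliseconds. */
static uint32_t refr_period_fps_cap(uint32_t period_ms) {
    assert(period_ms > 0);
    return 1000U / period_ms;
}
```

So 30 ms allows at most about 33 refreshes per second, 15 ms about 66, and 10 ms 100. This is only a ceiling: the loop shown at the top calls lv_task_handler() only every 5 ms plus processing time, and if a refresh takes longer than the period, rendering and transfer time set the real frame rate.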

But I believe the only real way to fix this is use of DMA.

Also, if you haven’t already (and assuming you have two cores), run LVGL on one core and the display driver on the other.
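The split suggested above is essentially a producer/consumer pipeline over the two draw buffers. The sketch below models it on a desktop with POSIX threads instead of FreeRTOS tasks (on the ESP32 the two tasks would typically be created with xTaskCreatePinnedToCore() and pinned to different cores). The buffer counts, chunk sizes and all function names are hypothetical, and the flush side handing a buffer back plays the role that lv_disp_flush_ready() plays in a real LVGL v7 driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define N_BUFS   2   /* two draw buffers, like lvBuffer1 and lvBuffer2 */
#define N_CHUNKS 8   /* chunks rendered per simulated frame (hypothetical) */
#define CHUNK_PX 16  /* pixels per chunk, tiny for illustration */

typedef enum { BUF_FREE, BUF_FULL } buf_state_t;

static uint16_t bufs[N_BUFS][CHUNK_PX];
static buf_state_t state[N_BUFS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t changed = PTHREAD_COND_INITIALIZER;

static int sent_order[N_CHUNKS];  /* chunk ids in the order they were "sent" */
static int corrupt_chunks;        /* chunks overwritten before being sent */

/* Block until buffer b reaches the wanted state. */
static void wait_for(int b, buf_state_t want) {
    pthread_mutex_lock(&lock);
    while (state[b] != want)
        pthread_cond_wait(&changed, &lock);
    pthread_mutex_unlock(&lock);
}

/* Publish a new state for buffer b and wake the other task. */
static void set_state(int b, buf_state_t s) {
    pthread_mutex_lock(&lock);
    state[b] = s;
    pthread_cond_broadcast(&changed);
    pthread_mutex_unlock(&lock);
}

/* Rendering side (LVGL's core): fill chunk i into buffer i % N_BUFS. */
static void *render_task(void *arg) {
    (void)arg;
    for (int i = 0; i < N_CHUNKS; i++) {
        int b = i % N_BUFS;
        wait_for(b, BUF_FREE);                    /* buffer must be sent first */
        for (int p = 0; p < CHUNK_PX; p++)
            bufs[b][p] = (uint16_t)(i * 100 + p); /* fake pixel data */
        set_state(b, BUF_FULL);                   /* hand it to the flush side */
    }
    return NULL;
}

/* Transfer side (the other core): send each full buffer, then release it. */
static void *flush_task(void *arg) {
    (void)arg;
    for (int i = 0; i < N_CHUNKS; i++) {
        int b = i % N_BUFS;
        wait_for(b, BUF_FULL);
        sent_order[i] = bufs[b][0] / 100;         /* stands in for the bus transfer */
        if (bufs[b][CHUNK_PX - 1] != (uint16_t)(i * 100 + CHUNK_PX - 1))
            corrupt_chunks++;
        set_state(b, BUF_FREE);                   /* role of lv_disp_flush_ready() */
    }
    return NULL;
}

/* Run both sides concurrently; returns 0 on success (sketch: no recovery
   if a thread cannot be created). */
static int run_pipeline(void) {
    pthread_t r, f;
    for (int b = 0; b < N_BUFS; b++)
        state[b] = BUF_FREE;
    corrupt_chunks = 0;
    if (pthread_create(&r, NULL, render_task, NULL) != 0)
        return -1;
    if (pthread_create(&f, NULL, flush_task, NULL) != 0)
        return -1;
    pthread_join(r, NULL);
    pthread_join(f, NULL);
    return 0;
}
```

LVGL itself is not thread-safe, so in this arrangement every lv_* call would stay on the rendering task; only the pixel transfer moves to the other core.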

Thanks. I have already lowered the refresh period, and yes, it vastly improves things. I will try running on separate cores to see if that improves performance.