ESP32, ili9488, 8 Bit Parallel Choppy animation when scrolling Tabs with LVGL generated objects

@embeddedt many thanks. So I did a bit of research and have set up my buffer in the following way:

#define LV_BUFFER_SIZE (480*30)
lv_disp_buf_t lvDisplayBuffer;
lv_color_t lvBuffer1[LV_BUFFER_SIZE];
lv_color_t lvBuffer2[LV_BUFFER_SIZE];

and then in setup()

// Initialize display buffer

lv_disp_buf_init(&lvDisplayBuffer, lvBuffer1, NULL, LV_BUFFER_SIZE);

I set this up so that I could use multiple buffers if required. For now I have disabled lvBuffer2 by passing NULL (for those of you who are trying to set up multiple buffers).

Initially this would not compile and I got:

`.dram0.bss' will not fit in region `dram0_0_seg'

and

region `dram0_0_seg' overflowed by 9408 bytes
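
For anyone hitting the same overflow, the footprint of the draw buffers can be worked out by hand. A minimal sketch, assuming LV_COLOR_DEPTH is 16 (2 bytes per pixel, which the post does not state); the helper name is hypothetical and not part of LVGL:

```c
#include <stddef.h>

/* Hypothetical helper: bytes of RAM taken by the LVGL draw buffers.
 * bytes_per_pixel = 2 assumes LV_COLOR_DEPTH 16 (an assumption). */
static size_t draw_buffer_bytes(size_t width_px, size_t lines,
                                size_t bytes_per_pixel, size_t buffer_count)
{
    return width_px * lines * bytes_per_pixel * buffer_count;
}
```

Under that assumption, draw_buffer_bytes(480, 30, 2, 1) gives 28800 bytes for one buffer and 57600 bytes for two, which is a large share of static DRAM when both are declared as global arrays.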

So I used some wisdom from this post and modified lv_conf.h:

#define LV_MEM_SIZE (16U * 768U)

However, I am still getting the same lag issue. At idle the monitor now shows 100 FPS at 8% CPU, but when scrolling it drops to 12 FPS.

Could it be something else?

Hi Guys, any thoughts on this pls?

At this point it must be either a bottleneck with sending pixels to the display or rendering. I don’t know whether TFT_eSPI has a DMA option or not, but you could look into that.

What’s the CPU speed of your ESP32?

Do you happen to have an oscilloscope? If so, you can look at the WR output to the TFT. It will give you information about the pixel transfer rate, the percentage of time spent on LVGL rendering vs buffer transfer to the TFT, etc.

Alternatively you can instrument your code yourself and collect the statistics.
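
A rough sketch of what such instrumentation could look like. The struct and function names are hypothetical (not part of LVGL or TFT_eSPI). The timestamps would come from something like micros() in Arduino, with the flush time measured around the body of the flush callback and the render time taken as the gap between flushes, which is only an approximation:

```c
#include <stdint.h>

/* Hypothetical accumulator for timing samples; not part of LVGL. */
typedef struct {
    uint64_t render_us;  /* total time spent drawing into the buffer */
    uint64_t flush_us;   /* total time spent pushing pixels to the TFT */
    uint32_t flushes;    /* number of flush calls recorded */
} perf_stats_t;

/* Add one sample: time spent rendering and time spent flushing. */
static void perf_record(perf_stats_t *s, uint32_t render_us, uint32_t flush_us)
{
    s->render_us += render_us;
    s->flush_us += flush_us;
    s->flushes++;
}

/* Percentage (0..100) of the measured time spent transferring pixels. */
static uint32_t perf_flush_percent(const perf_stats_t *s)
{
    uint64_t total = s->render_us + s->flush_us;
    return total ? (uint32_t)((s->flush_us * 100u) / total) : 0u;
}
```

If the flush percentage comes out high, the transfer is the bottleneck; if it is low, the time is going into rendering.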

BTW, your scrolling seems to touch a large number of pixels, so lagginess is expected. You could double the transfer rate if you can switch to the 16-bit TFT transfer mode rather than 8-bit. Double buffering with DMA may also work: if the LVGL time to render is Tl and the time to send the pixels is Tp, then you will get the update time max(Tl, Tp) rather than Tl + Tp.
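
To make the Tl and Tp comparison concrete, here is a small sketch. The function names are hypothetical, and the pipelined case assumes rendering and transfer overlap perfectly, which real hardware will only approximate:

```c
#include <stdint.h>

/* Single buffer: render, then send, one after the other. */
static uint32_t update_time_sequential(uint32_t tl_us, uint32_t tp_us)
{
    return tl_us + tp_us;
}

/* Double buffer with DMA: render the next chunk while the previous one
 * is being sent, so the slower of the two stages sets the pace. */
static uint32_t update_time_pipelined(uint32_t tl_us, uint32_t tp_us)
{
    return tl_us > tp_us ? tl_us : tp_us;
}
```

With made-up values Tl = 4000 us and Tp = 9000 us, the sequential update takes 13000 us and the pipelined one 9000 us. The gain is largest when Tl and Tp are similar.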

@zapta and @embeddedt I don’t have an oscilloscope but do like the idea of setting up DMA. I already have two buffers how would I perform a DMA transfer on the ESP32?

The ESP32-WROOM-32D is running at 240 MHz. I also have the option of using the 8MB or 16MB versions if that would make any difference, but that would require some soldering. Based on the speed tests in Bodmer's demos, I thought running 8-bit parallel would be more than enough for better FPS.

If I were to get the info from the oscilloscope, where would I apply any adjustments? So far it seems that I can change the number of buffers, the buffer size or LV_DISP_DEF_REFR_PERIOD, though only LV_DISP_DEF_REFR_PERIOD seems to have any visible effect on screen behaviour.

240MHz should be more than enough; my STM32F7 is 200MHz and it can achieve 35-50fps consistently even when scrolling. It sounds like the bottleneck is in transferring the data to the display, meaning that changing the buffer size is unlikely to help.

I am not familiar with ESP32 drivers so hopefully someone else can explain how to use DMA there.

Thanks! So will this be specific to the ili9488 driver (Bodmer's TFT_eSPI), the parallel interface setup or the actual display (ER-TFT035-6)?

It’s specific to the driver + display connection combination.

Does LVGL provide any test program or statistics collection to evaluate the rendering performance of new systems? For example, in memory time rendering, pixels transfer rate, etc.

The closest we have right now is https://github.com/lvgl/lv_examples/tree/master/src/lv_demo_benchmark, as well as the LV_USE_PERF_MONITOR setting which you can enable in lv_conf.h.

Thanks @embeddedt, I was not aware of these two options. You may want to mention the benchmark in the ‘porting’ section of the LVGL documentation so people can optimize their systems.

For me, and I think for others here, the bottleneck of our LVGL implementations is the pixel transfer to the TFT. Hopefully once LVGL gains more momentum we will have TFTs on Aliexpress with controllers ‘optimized for LVGL’. :wink:

I am curious how other libraries manage to work around this problem, as this type of problem frequently seems to happen with LVGL on SPI but not other libraries. Are they simply doing less rendering computation, not just sending less pixels to the display? If the bottleneck was pixel transfer I would expect every library to have the same problem.

@embeddedt Would it be worth considering the adafruit drivers for the display instead?

I don’t know how to code in the Espressif IDF (yet). Would I get better performance with the drivers available there?

As far as I know, the ESP32 does not support DMA on 8 bit parallel.
You will need to move to other hardware to achieve that.
I can suggest that you try decreasing LV_DISP_DEF_REFR_PERIOD in lv_conf.h from 30ms to 15 or 10ms, and you might see an improvement in refresh rate.
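
As a rough guide, the refresh period puts a ceiling on the frame rate. A tiny sketch with a hypothetical helper name; the real rate will be lower whenever rendering plus transfer takes longer than the period:

```c
#include <stdint.h>

/* Upper bound on frames per second for a given refresh period in ms.
 * Returns 0 for a period of 0 to avoid dividing by zero. */
static uint32_t max_fps_for_period(uint32_t period_ms)
{
    return period_ms ? 1000u / period_ms : 0u;
}
```

So 30ms caps the display at about 33 FPS, 15ms at about 66 FPS and 10ms at 100 FPS.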

But I believe the only real way to fix this is use of DMA.

Also, if you haven’t already (assuming you have two cores) - run LVGL on one core and the display driver on the other core.

Thanks. I have already lowered the refresh period, and yes, it vastly improves things. I will run on separate cores to see if that improves the performance.

@reso I am implementing with two cores now. Just wondering: I can run the GUI in the loop under one task, but how do I get TFT_eSPI running on the second core, as it does not have any command in the loop?

i.e. how do you get TFT_eSPI and lvgl running in independent cores on the esp32?

Also @embeddedt do you know if you still can’t run tasks on both cores as per

If you aren’t deploying this code into a real product, I would give FreeRTOS tasks on multiple cores another try. From what I’ve seen, you find out very quickly if simultaneous cores are a problem, as things crash/don’t work. :wink:

LVGL tasks don’t run on multiple cores as they are really periodic timers, not tasks.

Just as an FYI, I have set up the buffers like this to get around the memory issue:

    lv_color_t* buf1 = (lv_color_t*)malloc(DISP_BUF_SIZE * sizeof(lv_color_t));
    lv_color_t* buf2 = (lv_color_t*)malloc(DISP_BUF_SIZE * sizeof(lv_color_t));

    lv_init();

    /* Initialize SPI or I2C bus used by the drivers */
    lvgl_driver_init();

    uint32_t size_in_px = DISP_BUF_SIZE;
    lv_disp_buf_init(&disp_buf, buf1, buf2, size_in_px);

I run the display update on a separate core like this:


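    xTaskCreatePinnedToCore(
        displayTask,          /* Task function */
        "displayUpdateTask",  /* Task name */
        10000,                /* Stack size in bytes (ESP-IDF counts bytes, not words) */
        NULL,                 /* Task parameter */
        0,                    /* Priority */
        NULL,                 /* Task handle (not needed) */
        0);                   /* Core ID */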

Make sure that you use a semaphore so that the display task does not interfere with your other tasks when you share data between your display and background processing.
I was playing around with the cores, and core 0 gave me more performance than core 1, even when I made sure my background tasks were running on the other core (ESP32). I still need to wrap my head around that…

On ESP32 with an ILI9341 I noticed that the performance is not great, but acceptable for a hobby project. I am using the lvgl_port_esp32 drivers.

Do you generate the parallel output yourself or use a library? Looking at the datasheets, it seems that the min write cycle for the ILI9341 is 450ns (~2 MHz) while for the ILI9488 it is 30ns (~30 MHz), so the library may throttle it down for the ILI9341.

BTW, do you know if on the ILI9488 with 8-bit parallel and two cores, the bottleneck is LVGL drawing in memory or the pixel transfer to the TFT? You could double the latter by using 16-bit transfers.
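
For a back-of-the-envelope check, the ideal full-frame transfer time can be estimated from the write cycle. A sketch only: the helper name is hypothetical, 16 bits per pixel is an assumption, and the 30ns figure is the one quoted above rather than something checked against the datasheet. It ignores command overhead and any CPU time between writes:

```c
#include <stdint.h>

/* Hypothetical estimate: microseconds to push one full frame over a
 * parallel bus at the given write cycle. Returns 0 if bus_bits is 0. */
static uint32_t full_frame_us(uint32_t width, uint32_t height,
                              uint32_t bits_per_pixel, uint32_t bus_bits,
                              uint32_t write_cycle_ns)
{
    if (bus_bits == 0) {
        return 0;
    }
    uint64_t total_bits = (uint64_t)width * height * bits_per_pixel;
    uint64_t writes = (total_bits + bus_bits - 1) / bus_bits;  /* round up */
    return (uint32_t)((writes * write_cycle_ns) / 1000u);
}
```

For a 480x320 panel this gives 9216 us per frame on an 8-bit bus and 4608 us on a 16-bit bus. If the measured transfer time is far above that, the limit is probably software overhead per write rather than the controller itself.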

I haven’t used the ILI chips, but I think people tend to overclock some component of the transaction past what the datasheet recommends, so the speed of the 9341 might be faster than you think.