In LVGL 7, redrawing the button background color is too slow

Description

What MCU/Processor/Board and compiler are you using?

NUC977

What do you want to achieve?

I want button state changes to be drawn quickly. (In fact, I want to improve LVGL’s overall performance.)

What have you tried so far?

I tried LVGL 6.0.1 with the default theme. It is fast when I press the big button (button size 250*50).
But it is slow in LVGL 7.10.0. I used the same code for the test, and in 7.10.0 I can see the redraw happening with my own eyes.

Code to reproduce

/*You code here*/
lv_obj_t *homeCont;
lv_obj_t *btn, *label;

homeCont = lv_obj_create(lv_scr_act(), NULL);
lv_obj_set_size(homeCont, 800, 480);
lv_obj_set_pos(homeCont, 0, 0);

btn = lv_btn_create(homeCont, NULL);
lv_obj_set_size(btn, 200, 250);
lv_obj_set_pos(btn, 300, 140);
label = lv_label_create(btn, NULL);
lv_label_set_text(label, "Espresso");

LVGL 7 is slightly slower than 6 with smaller display buffer sizes. If you have the RAM to spare, I would suggest increasing the buffer size in your display driver.

I already use two full-screen display buffers. The CPU runs at 300 MHz.
I enabled LV_USE_PERF_MONITOR and see the frame rate drop to 12 FPS when I press the button.

How is flush_cb implemented? Are you using DMA or a manual copy with the CPU?

@kisvegabor Any ideas?

Thanks for your reply.
I do not use DMA; the CPU copies the data manually to the display RAM.
I also tried an F1C100S, and it is faster than the NUC977. Both seem to have the same ARM926 core, and the port code is the same too…
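
For readers unsure what a manual CPU copy looks like, here is a minimal sketch, not the poster's actual driver. It copies a rendered area into a framebuffer one row at a time with memcpy rather than one pixel at a time. The names pixel_t, area_t and copy_area_rows are hypothetical stand-ins (a real port would use lv_color_t and lv_area_t from lvgl.h), and 16-bit color is an assumption. In a real flush_cb, lv_disp_flush_ready() would be called once the copy is done.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the LVGL types; a real port would use
 * lv_color_t and lv_area_t from lvgl.h instead. */
typedef uint16_t pixel_t;                        /* assumes 16-bit color */
typedef struct { int x1, y1, x2, y2; } area_t;   /* inclusive corners */

/* Copy one rendered area from the draw buffer into the framebuffer.
 * `src` holds the area's pixels packed row after row;
 * `fb` is the full frame with `fb_width` pixels per line. */
static void copy_area_rows(pixel_t *fb, int fb_width,
                           const area_t *a, const pixel_t *src)
{
    assert(a->x2 >= a->x1 && a->y2 >= a->y1);
    const size_t row_px = (size_t)(a->x2 - a->x1 + 1);

    for (int y = a->y1; y <= a->y2; y++) {
        /* One memcpy per row instead of one store per pixel. */
        memcpy(&fb[(size_t)y * (size_t)fb_width + (size_t)a->x1],
               src, row_px * sizeof(pixel_t));
        src += row_px;
    }
}
```

Whether this is faster than a per-pixel loop depends on the toolchain's memcpy and on whether the source and destination memory are cached, so it is worth measuring on the target.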

You will see significant speed improvements if you are able to use DMA.

The NUC977 uses the DRAM directly, without DMA. I am confused about why the frame rate drops from 33 to 12 FPS as soon as I press the button.

I tried reducing the display buffer size, and that increases the speed.

    /*Two half-screen buffers (the faster configuration)*/
    static lv_disp_buf_t draw_buf_dsc;
    static lv_color_t draw_buf_3_1[LV_HOR_RES_MAX * LV_VER_RES_MAX / 2];   /*A half-screen sized buffer*/
    static lv_color_t draw_buf_3_2[LV_HOR_RES_MAX * LV_VER_RES_MAX / 2];   /*Another half-screen sized buffer*/

    lv_disp_buf_init(&draw_buf_dsc, draw_buf_3_1, draw_buf_3_2, LV_HOR_RES_MAX * LV_VER_RES_MAX / 2);   /*Initialize the display buffer*/


    /*Two full-screen buffers (the slower configuration)*/
    static lv_disp_buf_t draw_buf_dsc;
    static lv_color_t draw_buf_3_1[LV_HOR_RES_MAX * LV_VER_RES_MAX];   /*A screen sized buffer; must match the size passed to lv_disp_buf_init*/
    static lv_color_t draw_buf_3_2[LV_HOR_RES_MAX * LV_VER_RES_MAX];   /*Another screen sized buffer*/

    lv_disp_buf_init(&draw_buf_dsc, draw_buf_3_1, draw_buf_3_2, LV_HOR_RES_MAX * LV_VER_RES_MAX);   /*Initialize the display buffer*/

The half-size configuration (LV_HOR_RES_MAX * LV_VER_RES_MAX / 2) is faster than the full-size one (LV_HOR_RES_MAX * LV_VER_RES_MAX).

When you use two full-size buffers, LVGL enables double buffering.
In this case, every now and then (I don’t know exactly when), the entire buffer (1 or 2)
is copied from one buffer to the other before the update is drawn.
This copy of course takes a lot of time.
If you have a separate framebuffer, either an external one (in the display) or an internal one (as
supported by the STM32F4/7 and STM32H7), I would recommend using only one buffer.
You can use a full-size buffer or a smaller one, but not using double buffering
avoids the full-size buffer copy.
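
To put a rough number on that copy, the sketch below compares the bytes moved by a full-frame copy with the bytes needed for the button area alone. It uses the 800x480 container and the 200x250 button from the code in the first post. The 2 bytes per pixel (16-bit color) is an assumption, since the thread does not state the color depth, and region_bytes is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes touched when copying a w x h pixel region.
 * bytes_per_px = 2 assumes 16-bit color (not stated in the thread). */
static uint32_t region_bytes(uint32_t w, uint32_t h, uint32_t bytes_per_px)
{
    assert(bytes_per_px > 0);
    return w * h * bytes_per_px;
}

/* How many times more data a full-frame copy moves than the
 * dirty area alone, rounded down. */
static uint32_t full_copy_factor(uint32_t full_bytes, uint32_t dirty_bytes)
{
    assert(dirty_bytes > 0);
    return full_bytes / dirty_bytes;
}
```

Under those assumptions, region_bytes(800, 480, 2) is 768000 bytes for the whole frame and region_bytes(200, 250, 2) is 100000 bytes for the button, so a full-frame copy moves more than seven times the data that the button redraw itself needs.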

There is no point in having two non-fully-sized buffers if you are not using DMA. Take a look at the buffer configuration docs. The idea with two buffers is that DMA can be transferring one while the CPU draws into the other one. This gives you speed because you no longer have to wait for the transfer to finish.

When not using DMA, you should only use one buffer, otherwise you are pretty much just wasting RAM, since your driver can’t be copying the buffer in a loop at the same time that LVGL is running.
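
As an illustration of that advice, a single-buffer setup in the v7 API could look like the fragment below. It is a sketch that only compiles inside an LVGL 7 project. The helper name my_disp_buf_setup and the 1/10-screen buffer size are assumptions chosen for the example, not values recommended in this thread. Passing NULL as the second buffer tells lv_disp_buf_init that there is only one draw buffer.

```c
#include "lvgl.h"

/* Assumed size: one tenth of the screen. Tune this to the available RAM. */
#define MY_DRAW_BUF_PX (LV_HOR_RES_MAX * LV_VER_RES_MAX / 10)

/* Hypothetical helper: sets up ONE draw buffer for a flush_cb that
 * copies with the CPU, and returns the descriptor so the caller can
 * assign it to disp_drv.buffer before registering the driver. */
lv_disp_buf_t *my_disp_buf_setup(void)
{
    static lv_disp_buf_t draw_buf_dsc;
    static lv_color_t draw_buf[MY_DRAW_BUF_PX];

    /* NULL second buffer: no double buffering, no second copy of RAM. */
    lv_disp_buf_init(&draw_buf_dsc, draw_buf, NULL, MY_DRAW_BUF_PX);
    return &draw_buf_dsc;
}
```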

I have seen a similar decrease in speed with v7. I have stayed with v6 to wait and see how v8 performs.

I posted about one of the v7 performance issues I saw (style updates), but I also saw that redraws of general objects and labels are noticeably slower in v7.

My tests were using an i.MXRT1052 with DMA, a full screen sized buffer, using 16bit parallel interface to a 320x240 display (16-bit color). The parallel interface is running at 20MHz.

On v6, an object update was faster than the eye could see. On v7, the update of individual elements is noticeable as it happens, for exactly the same UI.

Thanks. I think the buffer declared as an array is in uncached memory, so copying it to DRAM is slower than expected.
But I don’t know how to improve this on this core.

Yes, but I am confused about why the F1C100S, a chip with the same core, performs better than the NUC977.
Maybe I should think about going back to v6.

Hi, I noticed that lv_refr.c has some code for the STM32 DMA2D.
The NUC977 has 2D hardware too, and it can be used to fill an area.
How can I change the code to improve the speed? Can you give me some ideas?