[Render time] How can I find out what limits UI render time and optimize it?

I’m using an STM32H7 with a 240 MHz CPU clock in an OS environment. Only the UI thread and the ms tick are running; all other threads are suspended.
LVGL is set up in single-buffer mode (320 * 30) with DMA SPI (designed for 24 FPS, 30 MHz SPI clock) for a 320 * 280 TFT.
LV_FAST_RAM has been assigned to a dedicated RAM block, and so has all the key function code.
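
As a rough sanity check on the SPI side of this setup, the wire time can be estimated from the numbers above. This is only a sketch: it assumes 16 bits per pixel (RGB565; the color depth is not stated in the post), one bit per SPI clock, and no command overhead or gaps between DMA transfers. The helper names are hypothetical.

```c
#include <stdint.h>

/* Time in ms to shift `px` pixels out over SPI.
 * Assumptions: one bit per SPI clock, no command bytes, no inter-transfer gaps. */
static double spi_transfer_ms(uint32_t px, uint32_t bits_per_px, double spi_hz)
{
    return (double)px * (double)bits_per_px * 1000.0 / spi_hz;
}

/* Upper bound on full-screen refreshes per second imposed by the SPI bus alone. */
static double max_full_frame_fps(uint32_t w, uint32_t h,
                                 uint32_t bits_per_px, double spi_hz)
{
    return 1000.0 / spi_transfer_ms(w * h, bits_per_px, spi_hz);
}
```

Under these assumptions, `spi_transfer_ms(320 * 30, 16, 30e6)` is about 5.12 ms for one buffer and `max_full_frame_fps(320, 280, 16, 30e6)` is about 20.9, so the bus alone would cap full-screen redraws below 24 FPS. It may also mean that a flush callback measured at 1 ms or less is timing only the start of the DMA transfer, not the transfer itself.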

The monitor callback shows the px refreshed:

I tried to find out what makes the render time so long:
1) Used lv_tick_get to time the spi_dma function -->> every flush cb takes 1 ms or less, so I conclude the DMA SPI is working fine.
2) Used an oscilloscope and a GPIO to see what takes most of the rendering time
-->> From the GPIO timing, ‘lv_refr_obj_and_children’ takes the most time; ‘lv_refr_obj’ executes many times, each taking 1 ms or less. This is consistent with the screen render process.
3) Used single-step debugging to confirm that ‘LV_EVENT_DRAW_MAIN(POST,PART)_BEGIN’, ‘LV_EVENT_DRAW_MAIN(POST,PART)’ and ‘LV_EVENT_DRAW_MAIN(POST,PART)_END’ are handled synchronously within one lv_task_handler call
-->> Single-stepping seems to confirm this.
4) Tried to calculate the update time per px
-->> 7887 px refreshed in 137 ms: 17.4 us per px, i.e. 17.4 us * 240 MHz = about 4176 CPU cycles per px.
48394 px refreshed in 418 ms: 8.64 us per px, i.e. 8.64 us * 240 MHz = about 2073 CPU cycles per px.
48475 px refreshed in 268 ms: 5.53 us per px, i.e. 5.53 us * 240 MHz = about 1327 CPU cycles per px.
5) Enabled DMA2D to speed up blending -->> it seems to improve the render time only a little. I think the basic drawing (rect, angle and so on…) and masking take most of the time.
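
The per-px arithmetic in step 4 can be written as a small helper, for example to log it from the monitor callback. This is a minimal sketch, assuming the 240 MHz clock from the post; the function names are hypothetical, and the result is in CPU cycles (the number of instructions executed may differ).

```c
#include <stdint.h>

/* Assumption: CPU clock taken from the post (240 MHz). */
#define CPU_HZ 240000000.0

/* Average time per refreshed pixel, in microseconds. */
static double us_per_px(uint32_t px, uint32_t elapsed_ms)
{
    if (px == 0) {
        return 0.0; /* nothing refreshed, avoid dividing by zero */
    }
    return ((double)elapsed_ms * 1000.0) / (double)px;
}

/* Average CPU cycles spent per refreshed pixel. */
static double cycles_per_px(uint32_t px, uint32_t elapsed_ms)
{
    return us_per_px(px, elapsed_ms) * (CPU_HZ / 1e6);
}
```

With the three measurements above, `cycles_per_px(7887, 137)`, `cycles_per_px(48394, 418)` and `cycles_per_px(48475, 268)` come out at roughly 4170, 2070 and 1330 cycles per px. The small differences from the figures in step 4 come from rounding the us-per-px value before multiplying.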

This render time makes my TFT refresh too slow. I have spent a lot of time analysing it, but found little that I can optimize. Do you have any suggestions for me?

Can you show your lv_conf.h?

Hi Eagle, sure.
My lv_conf.h is attached.
lv_conf.h (22.1 KB)