Trouble understanding profiler output

Description

I’m chasing a performance bottleneck. I’ve profiled the code using the internal profiler, but I have difficulty understanding what causes the bottleneck.

What MCU/Processor/Board and compiler are you using?

STM32H7 + SDRAM running ChibiOS RTOS
GCC

What LVGL version are you using?

Current master

What do you want to achieve?

Understand the profiler output

What have you tried so far?

All the usual optimization steps, such as setting the correct compiler flags and reducing heavy features like gradients and transparency.

Code to reproduce

The profiler output is inside the 7z archive. It contains 5 sections recorded while the code is running. The 3-minute gaps are most likely caused by flushing a 20 MB internal buffer containing the trace output.