STM32F7 slower with data cache enabled

I'm using an STM32F7 without GPU2D.
The LVGL version is the development branch (version 8).

My system works both with the data cache enabled and with it disabled.

I noticed that the benchmark demo (for the tests I have running) runs about 6% faster when the data cache is disabled.

Has anyone else experienced this difference?

Could this be related to things outside LVGL (other tasks running, the flush function)?

I suspect the reason is that you need to flush (clean) the D-cache in the flush_cb.
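
To make that concrete, here is a minimal sketch of the address arithmetic I would do before cleaning the D-cache in flush_cb. It assumes the Cortex-M7 cache line size of 32 bytes; `cache_range_t` and `cache_align_range` are hypothetical names I made up for illustration, not part of LVGL or CMSIS. I have not measured whether this accounts for the 6% difference, so treat it as something to try rather than a confirmed fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start; /* rounded down to a cache line boundary */
    size_t    size;  /* multiple of DCACHE_LINE_SIZE */
} cache_range_t;

/* Expand [addr, addr + len) so that it covers whole cache lines. */
static cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    cache_range_t r;
    uintptr_t end = addr + len;

    r.start = addr & ~mask;          /* round start down */
    end = (end + mask) & ~mask;      /* round end up */
    r.size = (size_t)(end - r.start);
    return r;
}

/*
 * Possible use on the target (assumes CMSIS core_cm7.h and the LVGL v8
 * flush_cb signature; only needed if DMA or the LTDC reads the buffer):
 *
 * void my_flush_cb(lv_disp_drv_t *drv, const lv_area_t *area, lv_color_t *color_p)
 * {
 *     size_t w = (size_t)(area->x2 - area->x1 + 1);
 *     size_t h = (size_t)(area->y2 - area->y1 + 1);
 *     cache_range_t r = cache_align_range((uintptr_t)color_p,
 *                                         w * h * sizeof(lv_color_t));
 *     SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
 *     // ... start the transfer, then call lv_disp_flush_ready(drv) when done
 * }
 */
```

Cleaning only the flushed range should be cheaper than calling SCB_CleanDCache() on the whole cache every frame, which might matter if the flush function is where the time goes.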