TL;DR - If your worst-case frame rate is dominated by LVGL's in-memory rendering, optimize that first rather than the transfer rate to the TFT. To understand where your own application stands, measure it.
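One minimal way to take that measurement is to bracket lv_refr_now() with the RP2040's microsecond counter. The hardware calls are stubbed below so the snippet stands alone; on target, time_us_32() comes from the pico-sdk and lv_refr_now() from LVGL, and the 22 ms figure is just a placeholder matching case 1:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so this compiles on its own; on target these would be the
   pico-sdk's time_us_32() and LVGL's lv_refr_now(). */
static uint32_t fake_clock_us = 0;
static uint32_t time_us_32(void) { return fake_clock_us; }
static void lv_refr_now(void *disp) {
    (void)disp;
    fake_clock_us += 22000;   /* pretend one refresh takes 22 ms */
}

/* Measure one full refresh (render + flush) in microseconds. */
static uint32_t measure_frame_us(void) {
    uint32_t t0 = time_us_32();
    lv_refr_now(0);           /* force a synchronous refresh */
    return time_us_32() - t0;
}
```

Logging this value per frame (or feeding it to a chart) is what produced the yellow traces discussed below.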
I recently modified my LVGL app to use dual-buffer rendering with non-blocking DMA and thought that others might find it interesting.
Configuration: Raspberry Pi RP2040 (Cortex-M0+, Pico) @ 125 MHz; ILI9488 320x480 at 16-bit color; enough RAM to hold 50% of the screen's pixels (two buffers of 25% each); 8-bit parallel path to the TFT; 15 Mpixels/s I/O to the TFT using the RP2040's PIO. Screen updates are dominated by a large chart with 200 points that is rendered from scratch on each update.
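For reference, the dual 25% draw buffers in that configuration work out as sketched below. The arithmetic just restates the setup above; the lv_disp_draw_buf_init() call mentioned in the comment is LVGL v8's API for registering the two buffers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HOR_RES = 320, VER_RES = 480 };

/* Each draw buffer holds 25% of the screen; two of them give LVGL
   50% of the screen's pixels to ping-pong between. */
enum { BUF_PIXELS = HOR_RES * VER_RES / 4 };

static uint16_t buf1[BUF_PIXELS];   /* 16-bit (RGB565) pixels */
static uint16_t buf2[BUF_PIXELS];

/* Total RAM the two draw buffers consume.
   In LVGL v8 they would be registered with:
   lv_disp_draw_buf_init(&draw_buf, buf1, buf2, BUF_PIXELS); */
static size_t draw_buf_bytes(void) {
    return sizeof(buf1) + sizeof(buf2);
}
```

That is 150 KB of the RP2040's 264 KB of SRAM, which is why "sufficient RAM" is worth calling out.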
Case 1 - Blocking DMA: rendering resumes only once the DMA transfer has completed. The yellow line shows total time in lv_refr_now() and the blue line shows DMA activity. Overall frame time is 22 ms.
Case 2 - Non-blocking DMA: rendering of the alternate buffer starts while DMA is still active. Frame time is 15.56 ms, roughly a 40% speed improvement over case 1. Nice. (Caveat: case 1 could perhaps be made faster by using a single buffer covering 50% of the screen.)
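The non-blocking pattern boils down to: the flush callback starts the DMA transfer and returns immediately, and only the DMA-complete interrupt tells LVGL the buffer is free again. Below is a sketch of that shape with the LVGL and pico-sdk calls replaced by stand-ins so it compiles on its own; on target, lv_disp_flush_ready() is the real LVGL v8 call, and dma_start()/dma_irq_handler() stand in for the PIO-fed DMA channel and its IRQ:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* --- Minimal stand-ins so the sketch is self-contained; in the real
   app these come from LVGL and the pico-sdk. --- */
typedef struct { int x1, y1, x2, y2; } lv_area_t;
typedef uint16_t lv_color_t;
typedef struct { void *user_data; } lv_disp_drv_t;

static bool flush_ready_called = false;
static void lv_disp_flush_ready(lv_disp_drv_t *drv) {  /* stub */
    (void)drv;
    flush_ready_called = true;
}

static lv_disp_drv_t *g_drv;     /* driver to signal from the IRQ */
static bool dma_busy = false;

/* Stand-in for kicking off a PIO-fed DMA transfer to the TFT. */
static void dma_start(const lv_color_t *px, size_t count) {
    (void)px; (void)count;
    dma_busy = true;
}

/* Non-blocking flush: start DMA and return at once, so LVGL can begin
   rendering the alternate buffer while this one is still draining. */
static void flush_cb(lv_disp_drv_t *drv, const lv_area_t *area,
                     lv_color_t *px) {
    size_t count = (size_t)(area->x2 - area->x1 + 1)
                 * (size_t)(area->y2 - area->y1 + 1);
    g_drv = drv;
    dma_start(px, count);
    /* NOTE: lv_disp_flush_ready() is deliberately NOT called here. */
}

/* Stand-in for the DMA-complete interrupt handler: only now do we
   tell LVGL the buffer may be reused. */
static void dma_irq_handler(void) {
    dma_busy = false;
    lv_disp_flush_ready(g_drv);
}
```

The blocking variant (case 1) is the same code except flush_cb() spins until the DMA completes and calls lv_disp_flush_ready() itself before returning.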
The two tests above used simple chart data, with all points set to the same value:
Case 3 - Same as case 2 (non-blocking DMA) but with more complex chart data:
The rendering time increased significantly, to the point that the frame time is 51 ms, and it could probably grow even more if the data were fluctuating more. This is what the TL;DR is about: here the in-memory rendering, not the TFT transfer, dominates.