@embig71
I think you lose most of the performance to waiting for VSYNC. Triple buffering can mitigate this; without it, the renderer sits idle until the next VSYNC. The other issue is that you render directly into external SRAM, where random writes are very slow compared to internal SRAM.
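To illustrate the triple buffering idea, here is a minimal sketch of the buffer bookkeeping only. All names (`triple_buf_t`, `tb_init`, `tb_frame_done`, `tb_on_vsync`) are made up for this example and are not part of any library; it assumes three equally sized framebuffers numbered 0 to 2 and that you hook the two functions into your own render loop and VSYNC interrupt.

```c
#include <assert.h>
#include <stdint.h>

#define TB_NONE 0xFFu /* marker: no finished frame is waiting */

/* Hypothetical bookkeeping for three framebuffers with indices 0, 1, 2. */
typedef struct {
    uint8_t front;   /* buffer currently scanned out to the display */
    uint8_t pending; /* finished frame waiting for VSYNC, or TB_NONE */
    uint8_t draw;    /* buffer the renderer is writing into */
} triple_buf_t;

static void tb_init(triple_buf_t *tb)
{
    tb->front = 0;
    tb->pending = TB_NONE;
    tb->draw = 1;
}

/* Call when the renderer has finished a frame. It never waits for VSYNC:
 * there is always a buffer that is neither displayed nor just finished. */
static void tb_frame_done(triple_buf_t *tb)
{
    uint8_t finished = tb->draw;

    if (tb->pending == TB_NONE) {
        /* Indices sum to 0 + 1 + 2 = 3, so the remaining one is free. */
        tb->draw = (uint8_t)(3 - tb->front - finished);
    } else {
        /* An older finished frame was never shown: drop it and reuse it. */
        tb->draw = tb->pending;
    }
    tb->pending = finished;

    /* The renderer must never draw into the displayed buffer. */
    assert(tb->draw != tb->front);
}

/* Call at VSYNC (for example from the display controller interrupt). */
static void tb_on_vsync(triple_buf_t *tb)
{
    if (tb->pending != TB_NONE) {
        tb->front = tb->pending; /* show the newest finished frame */
        tb->pending = TB_NONE;
    }
}
```

In real firmware `tb_on_vsync` would probably run in interrupt context, so `tb_frame_done` would need to be protected against it (for example by briefly disabling that interrupt). The sketch leaves that out, along with the actual framebuffer address switch.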
I added a new comment in the related topic to figure out what causes the issue.