Why is DMA called many times in disp_flush in the STM32F746 example?

I am looking at the STM32F746 example:

In the ex_disp_flush function, a DMA transfer is set up, and once it finishes, the
DMA_TransferComplete callback checks y2_fill and starts another DMA transfer.

So why is it split into many transfers? Couldn’t it do the whole lot in one go and save processing time?

The transfer length is a uint32_t, so I assume it could do one big transfer in one go.


Using DMA always saves processing time.

Of course, using larger buffers would ‘save’ some processing time, as initiating a new DMA transfer takes some time.

On the other hand, microcontrollers are restricted in RAM size.

But take a look into the code. What is the buffer size set to?
It’s LV_HOR_RES_MAX * 68 (with LV_HOR_RES_MAX = 480).
So 480 * 68 = 32 640 pixels.
As the color depth is 16 bit (I assume), the buffer size is 32 640 * 2 = 65 280 bytes (0xFF00).
This is slightly less than the maximum unsigned 16-bit value (0xFFFF).
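
In code that is roughly this (a sketch, assuming LV_COLOR_DEPTH 16 so lv_color_t is 2 bytes; the variable name is illustrative):

```c
/* Sketch: a draw buffer sized so its data stays under the 16-bit DMA counter. */
static lv_color_t buf1[LV_HOR_RES_MAX * 68];  /* 480 * 68 = 32 640 pixels = 65 280 bytes */
```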

As you may have learned from this thread :wink: (Stm32f429 + ili9341 spi),
the DMA transfer count is limited to 16 bits (at most 65 535 items per transfer).
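
So a transfer larger than that has to be split into chunks, roughly like this (a minimal sketch; the helper functions are hypothetical, not from the actual example, which instead continues the transfer from the transfer-complete callback):

```c
#define DMA_MAX_ITEMS 0xFFFFu  /* STM32 NDTR register holds at most 65 535 items */

/* Sketch: send 'count' pixels by repeatedly programming the DMA
   with at most DMA_MAX_ITEMS items per transfer. */
static void dma_send_pixels(const uint16_t *src, uint32_t count)
{
    while (count > 0) {
        uint16_t chunk = (count > DMA_MAX_ITEMS) ? DMA_MAX_ITEMS : (uint16_t)count;
        start_dma_chunk(src, chunk);  /* hypothetical: program NDTR and start */
        wait_dma_complete();          /* the real example continues in the callback instead */
        src   += chunk;
        count -= chunk;
    }
}
```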

But there is one interesting question:
As I see it, the example is setting up (and using?) two buffers.
I always thought two buffers are set up when using double buffering, but that would need full frame buffer size.
‘Processing’ time could be saved if lvgl is using two buffers (as in this example), drawing into one buffer
while the other buffer’s content is transferred via DMA to the real framebuffer.
Is this the case?
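
For reference, registering two draw buffers in the v7-era API looks roughly like this (a sketch; buffer names and the flush callback are placeholders):

```c
static lv_disp_buf_t disp_buf;
static lv_color_t buf1[LV_HOR_RES_MAX * 68];
static lv_color_t buf2[LV_HOR_RES_MAX * 68];

void display_init(void)
{
    /* With two buffers, LVGL can render into one while the other
       is still being flushed (e.g. by DMA). */
    lv_disp_buf_init(&disp_buf, buf1, buf2, LV_HOR_RES_MAX * 68);

    static lv_disp_drv_t disp_drv;
    lv_disp_drv_init(&disp_drv);
    disp_drv.buffer   = &disp_buf;
    disp_drv.flush_cb = ex_disp_flush;  /* must call lv_disp_flush_ready() when done */
    lv_disp_drv_register(&disp_drv);
}
```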


Does having a full-screen buffer result in a significantly faster screen refresh compared to, say, a half or quarter screen buffer?

It may reduce the computation of multiple rendering passes by LVGL, but I presume those passes use efficient short-circuit logic that skips the computation of out-of-buffer pixels.


TL;DR: Larger buffers are better on modern LVGL versions. You can probably reduce the size to 1/4, or potentially even lower if your system is fast enough, without seeing a noticeable performance drop.

In v6, I found only a minimal difference between a 1/10th buffer and a fullscreen buffer, small enough that you wouldn’t notice it on a sufficiently fast processor.

From 7.0 to 7.3, the new style system did not cache styles between rendering passes, meaning that styles had to be recomputed every time rendering started. This caused a very noticeable slowdown with small buffers. I personally moved from 1/10th to around 1/3 of the screen to keep a decent framerate.

As of 7.4 styles do get cached between rendering passes, which mitigates the problems of earlier versions. However, I still find that modern 7.x versions need a larger buffer than 6.x to match the performance.

I believe that technically the entire object gets drawn on each pass, but the lower-level logic (for drawing rectangles, circles, etc.) does skip computing masked-out areas.

I was searching for an example of properly optimized DMA and found this thread. But the code for this STM is, hmm, a waste of power. The flush callback copies the rendered buffer line by line into a full frame buffer in another memory area. Instead, this should be set up for direct rendering. Here the flush only ends in the last-line callback, so there is no parallelism and no extra speed…

Can anybody show me a link to an example that really sends data to the display in parallel with rendering the next area?
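
The pattern being asked about, a flush callback that returns immediately and signals completion from the DMA interrupt, would look roughly like this in the v7-era API (a sketch; the DMA start helper is hypothetical):

```c
static lv_disp_drv_t *flushing_drv;

/* Sketch: non-blocking flush. Start the DMA and return at once, so LVGL
   can render into the second buffer while this one is being sent. */
static void ex_disp_flush(lv_disp_drv_t *drv, const lv_area_t *area, lv_color_t *color_p)
{
    flushing_drv = drv;
    uint32_t px_count = lv_area_get_size(area);
    start_dma_to_display(area, color_p, px_count);  /* hypothetical helper */
    /* No lv_disp_flush_ready() here: it is called from the DMA interrupt. */
}

/* DMA transfer-complete interrupt handler. */
void dma_transfer_complete_isr(void)
{
    lv_disp_flush_ready(flushing_drv);  /* buffer is free; LVGL may reuse it */
}
```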