LVGL is very slow - STM32F7 SSD1963 P16 - Advice on how to optimise it?

I’m using PlatformIO, I didn’t try stm32f7xx.h.

I will have a look at it. I assume it will have an automatic build flag; otherwise I’ll inspect the core and set one manually.

It did compile after manually including stm32f767xx.h within lv_gpu_stm32_dma2d.c, so that should also do the trick?

The problem is that the function lv_gpu_stm32_dma2d_init relies on STM32F4, STM32F7 or STM32H7 being defined.
These defines are made within stm32f7xx.h (or stm32f4xx.h or stm32h7xx.h).
That is the define for the STM32 family. When you directly include stm32f767xx.h, this define isn’t set.

Take a look into the stm32f7xx.h and you will see how it works.
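For illustration, here is a much simplified sketch of that selection logic (not the verbatim contents of stm32f7xx.h): the device define normally comes from the build system (e.g. -DSTM32F767xx), and the family define is what the LVGL DMA2D driver then checks.

```c
/* Simplified sketch (NOT the real header): the build system normally
 * passes the device define on the command line, e.g. -DSTM32F767xx. */
#define STM32F767xx

#if defined(STM32F767xx)
  /* the real stm32f7xx.h does: #include "stm32f767xx.h" */
  #define STM32F7   /* family define that lv_gpu_stm32_dma2d_init checks */
#endif

/* Returns 1 when the family define the LVGL driver needs is present. */
static int stm32f7_family_defined(void) {
#if defined(STM32F7)
    return 1;
#else
    return 0;
#endif
}
```

If you include stm32f767xx.h directly, the first `#if` chain never runs, which is why the family define ends up missing.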

Okay thanks roberkras for your help, makes sense.
I will have a look at it directly, quite educational learning how it all goes together too.

So once it’s all been included and the defines made properly for my processor what else is required to get DMA2D to work directly with my setup.
Do I need a dual buffer, and to set the second buffer as
(uint16_t *)0xC0080000, so that it automatically copies directly?

Once I’ve got it all working, I’ll upload the basis of my project source code with lv_example, and hopefully it might help others. I’ve learnt a lot implementing FMC, but it’s taken a while to get to this point.

I have a STM32H743 board.
The lvgl working buffer is LV_HOR_RES_MAX * 40 pixels, and I use only one working buffer.

The frame buffer is in SDRAM (the LCD is a ‘dumb’ display directly connected to the STM’s LCD hardware: 24-bit colour, 800 x 480).
The SDRAM is mapped to address 0xd0000000.

I do not use DMA for transferring the pixel data from the lvgl working buffer to the frame buffer, because the display is a landscape-type display but I have to display in portrait mode, and the DMA controller doesn’t support rotation. So I have to do it ‘manually’.
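That ‘manual’ rotated copy can be sketched in plain C like this (my own illustration, not the poster’s actual code): a 90-degree clockwise rotation applied while copying from the working buffer to the frame buffer.

```c
#include <stdint.h>

/* Illustration only: copy a src_w x src_h working buffer into a frame
 * buffer while rotating 90 degrees clockwise, as needed when rendering
 * portrait content for a landscape panel. Pixel (x, y) of the source
 * lands at column (src_h - 1 - y), row x of the destination. */
void copy_rotated_90cw(uint16_t *dst, const uint16_t *src,
                       int src_w, int src_h)
{
    for (int y = 0; y < src_h; y++) {
        for (int x = 0; x < src_w; x++) {
            dst[x * src_h + (src_h - 1 - y)] = src[y * src_w + x];
        }
    }
}
```

Here dst must be src_h pixels wide and src_w pixels high; this scattered access pattern is exactly what the DMA controller cannot do for you.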

The lvgl working buffer should be located in internal SRAM, for drawing speed.
When using DMA for transferring the data, note that the caches have to be cleaned/invalidated before the transfer, and that a single DMA transfer is limited by its 16-bit transfer-count register.
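As an illustration of that 16-bit limit (my sketch, not production code): a single STM32 DMA transfer can move at most 65535 data items, so a larger buffer has to be split into chunks. dma_start_blocking() below is a memcpy stand-in for the real HAL call; on hardware you would also clean the D-Cache over the source range before each chunk.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* A single STM32 DMA transfer counts items in a 16-bit register,
 * so at most 65535 items can be moved per transfer. */
#define DMA_MAX_ITEMS 65535u

/* Stand-in for a real start-and-wait DMA call (e.g. via the HAL).
 * On hardware, clean the D-Cache over `src` before starting. */
static void dma_start_blocking(uint16_t *dst, const uint16_t *src,
                               size_t items)
{
    memcpy(dst, src, items * sizeof(uint16_t));
}

/* Split an arbitrarily large pixel copy into DMA-sized chunks. */
void dma_copy_chunked(uint16_t *dst, const uint16_t *src, size_t items)
{
    while (items > 0) {
        size_t n = (items > DMA_MAX_ITEMS) ? DMA_MAX_ITEMS : items;
        dma_start_blocking(dst, src, n);
        dst += n;
        src += n;
        items -= n;
    }
}
```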

Do you think there will be much noticeable difference between the way I’m doing it now, with the flush callback and my pixel data transfer, and using DMA?

The FMC Bank1 I’m using is configured as SRAM and mapped at 0xC0000000, as it was conflicting with the D-Cache when it was mapped as PSRAM.

I’ll have to do some more reading up on it all.

Out of the box, LVGL uses DMA2D to accelerate internal rendering processes only. You still control the way pixels are copied to the display.

If you are using a framebuffer which is memory-mapped, you can use this very simple flush_cb implementation to copy LVGL’s working buffer to the framebuffer using DMA2D. lv_gpu_stm32_dma2d_copy invalidates the cache automatically. I’ve been using this in my STM32F7 project for several months without issues.

lv_coord_t area_w = lv_area_get_width(area);
lv_gpu_stm32_dma2d_copy(&fbp16[(area->y1 * 480) + area->x1], 480, color_p, area_w, area_w, lv_area_get_height(area));
lv_disp_flush_ready(disp);

fbp16 points to my framebuffer’s address in memory and 480 is the width of my display.

Thank you very much embeddedt, I will give that a go.

So, for instance, I could do the same with my 800 x 480 display using the following:

#define FMC_REGION ((uint32_t)0xC0000000)
#define DataAccess ((uint16_t *)(FMC_REGION + 0x80000))
lv_gpu_stm32_dma2d_copy(&DataAccess[(area->y1 * 800) + area->x1], 800, color_p, area_w, area_w, lv_area_get_height(area));

Does the LVGL DMA2D automatically handle lv_disp_flush_ready(disp); or do I still have to make sure to fire that at the end?

I won’t be able to test it out till this Sunday, will let you know how it goes.

You need to make sure that your framebuffer is mapped like any other buffer in memory. Right now it looks like you are manually sending commands to write data, which is not what LVGL expects.

Yes; you still need to use lv_disp_flush_ready; I forgot to add that to my code sample above.

Okay thanks, noted.

I assume you still have to create a working window for the display prior to lv_gpu_stm32_dma2d_copy, as I’ve been doing when I was sending manual commands?

The ‘working window’ is given via the parameters when calling lv_gpu_stm32_dma2d_copy:

/**
 * Copy a map (typically RGB image) to a buffer
 *
 * @param buf       a buffer where map should be copied
 * @param buf_w     width of the buffer in pixels
 * @param map       an "image" to copy
 * @param map_w     width of the map in pixels
 * @param copy_w    width of the area to copy in pixels (<= buf_w)
 * @param copy_h    height of the area to copy in pixels
 *
 * @note `map_w - copy_w` is offset to the next line after copy
 */
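To make the stride parameters concrete, here is a pure-C reference of what the copy does (my illustration; the real lv_gpu_stm32_dma2d_copy uses the DMA2D peripheral for this): a copy_w x copy_h window is read from map (line stride map_w) and written to buf (line stride buf_w), so map_w - copy_w source pixels are skipped after each line.

```c
#include <stdint.h>

/* Pure-C reference of the copy semantics (illustration only; the real
 * lv_gpu_stm32_dma2d_copy offloads this to the DMA2D peripheral). */
void copy_map_sw(uint16_t *buf, int buf_w,
                 const uint16_t *map, int map_w,
                 int copy_w, int copy_h)
{
    for (int y = 0; y < copy_h; y++) {
        for (int x = 0; x < copy_w; x++) {
            buf[y * buf_w + x] = map[y * map_w + x];
        }
    }
}
```

Passing `&fbp16[(area->y1 * 480) + area->x1]` as buf and the display width as buf_w is what positions the window inside the framebuffer.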

But before we get totally confused…
It seems that your display is a ‘smart’ one, and you use the display controller’s internal frame buffer!?

In this case you can’t use lv_gpu_stm32_dma2d_copy.
As far as I see (and understand), it is only meant for copying from one buffer to another buffer.
That is, for when your working buffer is in SRAM and the frame buffer is in SDRAM (directly accessible by the microcontroller).

In your case you use a general-purpose DMA channel instead.
That is, doing a transfer from a memory buffer to a single fixed address or I/O register.

As far as I understand, you mapped your display controller to memory and now write to the display controller via memory accesses.
In this case you do everything as before, but you can speed up the writing to that memory address by using a DMA channel (not the DMA2D).

So your code (see below) can be sped up by using a DMA channel (copying the memory buffer to a single memory address).

for (int y = area->y1; y <= area->y2; y++) {
    for (int x = area->x1; x <= area->x2; x++) {
        *(volatile uint16_t *)0xC0080000 = color_p->full; // DataAccess; volatile so writes aren't optimised away
        color_p++;
    }
}
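The fixed-destination idea can be modelled on the host like this (my sketch; on hardware you would instead configure a DMA stream in memory-to-memory mode with the destination address increment disabled, pointing at the FMC data-access address):

```c
#include <stdint.h>
#include <stddef.h>

/* Host model of a 'buffer to single address' transfer (illustration
 * only): every item of `src` is written to the same destination
 * register, exactly what a DMA stream does when its destination
 * address increment is disabled. `volatile` mirrors how the real
 * FMC data-access address must be treated. */
void transfer_to_fixed_address(volatile uint16_t *dst_reg,
                               const uint16_t *src, size_t items)
{
    for (size_t i = 0; i < items; i++) {
        *dst_reg = src[i];   /* on hardware: done by the DMA, not the CPU */
    }
}
```

The win over the nested loop above is that the DMA performs these writes while the CPU is free to render the next area.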