Change to external SDRAM, with lv_port_stm32f746

epikao · June 2, 2022, 3:30pm

Hello,

I am using lv_port_stm32f746_disco, but with a 1024x600 display instead of a 480x272.
Now I need to adjust the buffer from 480 x 48 to 1024 x 100 (1024 / 10).

    static lv_disp_draw_buf_t disp_buf_1;
    static lv_color_t buf1_1[TFT_HOR_RES * 61]; //68,61
    static lv_color_t buf1_2[TFT_HOR_RES * 61]; //68,61
    lv_disp_draw_buf_init(&disp_buf_1, buf1_1, buf1_2, TFT_HOR_RES * 61);   /*68,61 Initialize the display buffer*/

Unfortunately this does not fit into the (internal?) RAM; the largest size that still fits is 1024 * 61.
Is there a tutorial that covers everything I have to change in this lv_port_stm32f746 to move to the external SDRAM?
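
A rough size check makes the limit plausible. This is a minimal sketch, assuming a 16-bit colour depth (2 bytes per `lv_color_t`) and taking the 320 KByte RAM figure from the linker script header quoted later in this thread; the helper names `draw_buf_bytes` and `fits_internal_ram` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_PIXEL    2u              /* assumption: LV_COLOR_DEPTH 16 */
#define TFT_HOR_RES        1024u
#define INTERNAL_RAM_BYTES (320u * 1024u)  /* figure from the linker script header */

/* Bytes needed for `count` draw buffers of `lines` full-width lines each. */
static size_t draw_buf_bytes(unsigned lines, unsigned count)
{
    assert(count == 1u || count == 2u);    /* one or two draw buffers */
    return (size_t)TFT_HOR_RES * lines * BYTES_PER_PIXEL * count;
}

/* Non-zero if the buffers would fit even if nothing else lived in RAM. */
static int fits_internal_ram(unsigned lines, unsigned count)
{
    return draw_buf_bytes(lines, count) <= INTERNAL_RAM_BYTES;
}
```

Under these assumptions two buffers of 1024 x 61 need 249,856 bytes, while two buffers of 1024 x 100 would need 409,600 bytes, more than the whole 327,680 bytes of internal RAM. The stack, heap and all other variables also live in that RAM, so the real limit is lower than the raw total.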

Thank you

spider_vc · June 2, 2022, 4:27pm

I think this question is not really for this forum, but…
Your situation depends on the compiler you use for the STM32.
For GCC-based toolchains you need to place the buffers in an SDRAM section:

static lv_color_t buf1_1[TFT_HOR_RES * 61] __attribute__ ((section (".sdram")));
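
A slightly fuller sketch of the same idea follows. It is not code from the port: `color_t` stands in for `lv_color_t`, the `IN_SDRAM` macro and `ram_selftest` are hypothetical helpers, and the section name only has an effect if the linker script has a matching input-section pattern that is routed to the SDRAM memory region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only GCC/Clang on ELF targets understand this form of the attribute. */
#if defined(__GNUC__) && defined(__ELF__)
#define IN_SDRAM __attribute__((section(".sdram")))
#else
#define IN_SDRAM
#endif

#define TFT_HOR_RES 1024
#define DRAW_LINES  61

typedef uint16_t color_t;   /* stand-in for lv_color_t at 16-bit colour depth */

/* The section name must match an input-section pattern in the linker script. */
static color_t buf1_1[TFT_HOR_RES * DRAW_LINES] IN_SDRAM;
static color_t buf1_2[TFT_HOR_RES * DRAW_LINES] IN_SDRAM;

/* Write a pattern and read it back; returns 1 if the memory behaves like RAM.
 * On the target this is only meaningful after the SDRAM controller is set up. */
static int ram_selftest(volatile color_t *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    for (size_t i = 0; i < n; i++)
        buf[i] = (color_t)(i ^ 0xA5A5u);
    for (size_t i = 0; i < n; i++)
        if (buf[i] != (color_t)(i ^ 0xA5A5u))
            return 0;
    return 1;
}
```

Calling `ram_selftest(buf1_1, TFT_HOR_RES * DRAW_LINES)` once after the SDRAM init and before handing the buffers to LVGL is a cheap way to tell a mapping problem from a drawing problem. Typical startup code does not clear a section marked `NOLOAD`, so the buffers should not be assumed to start out zeroed.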

epikao · June 2, 2022, 7:08pm

Hmm, I use the standard STM32 IDE with no special setup, so it should be GCC…
In the linker script I added the following:

  /* User_heap_stack section, used to check that there is enough RAM left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >RAM

  .sdram (NOLOAD) :
  {
    . = ALIGN(4);
    _ssdram = .;
    *(.extram .extram.*);
    . = ALIGN(4);
    _esdram = .;
  } > SDRAM

Then in “tft.c” I tried different settings, as follows:

//static uintpixel_t my_fb[TFT_HOR_RES * TFT_VER_RES] __attribute__ ((section(".extram")));

//static __IO uintpixel_t * my_fb = (__IO uintpixel_t*) (0x60000000);
static __IO uintpixel_t * my_fb = (__IO uintpixel_t*) (0xC0000000);
...
static lv_disp_draw_buf_t disp_buf_1;
static lv_color_t buf1_1[TFT_HOR_RES * 61] __attribute__ ((section(".extram"))); //68,61
static lv_color_t buf1_2[TFT_HOR_RES * 61] __attribute__ ((section(".extram"))); //68,61
lv_disp_draw_buf_init(&disp_buf_1, buf1_1, buf1_2, TFT_HOR_RES * 61);   /*68,61 Initialize the display buffer*/

Unfortunately I always get a black or white screen…

Do I have to set anything else? In the tft.c file, the LCD init contains the following two lines by default:

    BSP_SDRAM_Init();
    HAL_EnableFMCMemorySwapping();

Thank you
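
The two candidate addresses tried above (0x60000000 and 0xC0000000) are tied to the swapping call. The sketch below assumes, based on my reading of the STM32F7 reference manual, that SDRAM bank 1 sits at 0xC0000000 by default and appears at 0x60000000 once `HAL_EnableFMCMemorySwapping()` has run; `sdram_base` and `fb_pixel_addr` are hypothetical helpers, not part of the HAL or the port.

```c
#include <assert.h>
#include <stdint.h>

#define SDRAM_BANK1_DEFAULT 0xC0000000u  /* assumed: FMC SDRAM bank 1, no swapping */
#define SDRAM_BANK1_SWAPPED 0x60000000u  /* assumed: after HAL_EnableFMCMemorySwapping() */

/* Base address the framebuffer pointer has to use for a given swap setting. */
static uint32_t sdram_base(int fmc_swapped)
{
    return fmc_swapped ? SDRAM_BANK1_SWAPPED : SDRAM_BANK1_DEFAULT;
}

/* Address of pixel (x, y) in a framebuffer that starts at the SDRAM base. */
static uint32_t fb_pixel_addr(int fmc_swapped, uint32_t x, uint32_t y,
                              uint32_t hor_res, uint32_t bytes_per_pixel)
{
    assert(x < hor_res);
    return sdram_base(fmc_swapped) + (y * hor_res + x) * bytes_per_pixel;
}
```

If that assumption holds, a framebuffer pointer at 0xC0000000 combined with swapping enabled points outside the SDRAM, and so does a linker `SDRAM` region whose origin does not match the chosen setting.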

epikao · June 2, 2022, 7:18pm

If I comment out the swapping call, I see something on the screen, but then it hangs…

The debugger ends up in HardFault_Handler.

What could be wrong?

spider_vc · June 3, 2022, 4:56am

Can you show your full linker script?

epikao · June 3, 2022, 6:09am

Here it is, with the addition posted above:

https://github.com/lvgl/lv_port_stm32f746_disco/blob/master/LinkerScript.ld

/*
*****************************************************************************
**

**  File        : LinkerScript.ld
**
**  Abstract    : Linker script for STM32F746NGHx Device with
**                1024KByte FLASH, 320KByte RAM
**
**                Set heap size, stack size and stack location according
**                to application requirements.
**
**                Set memory bank area and size if external memory is used.
**
**  Target      : STMicroelectronics STM32
**
**
**  Distribution: The file is distributed as is, without any warranty
**                of any kind.
**

This file has been truncated.

I found the following info:
https://developer.arm.com/documentation/ka002886/latest

Following this, I added the code shown further down before the SDRAM init (in tft.c), and now it works.
BUT: since my buffer did not reach 1/10 of the horizontal resolution before, I hoped that lv_task_handler would do the job much faster now with the full-size buffer in external SDRAM… but unfortunately that does not seem to be the case…

With this test code:

while (1)
{
    ////////////////// TESTCODE ///////////////////////////

    if ((millis() - lastMillis2) > 4) {
        lv_task_handler();
        lastMillis2 = millis();
    }

    for (i = 0; i < 1000; i++) {
        lv_meter_set_indicator_value(guider_ui.screen_meter_1, screen_meter_1_scale_1_ndimg_0, lv_rand(0, 1000));
        HAL_GPIO_TogglePin(GPIOI, GPIO_PIN_8); //LED 101
    }
}

The frame rate is about 5 FPS with a 1024 x 10 buffer and about 9 FPS with 1024 x 61… and now, with external RAM, it is back at 5 FPS. I do not understand this…

Here is the code mentioned above, placed before the SDRAM init in “tft.c”:

/* Assert backlight LCD_BL_CTRL pin */
HAL_GPIO_WritePin(LCD_BL_CTRL_GPIO_PORT, LCD_BL_CTRL_PIN, GPIO_PIN_SET);
MPU_Region_InitTypeDef MPU_InitStruct;

/* Disable the MPU */
HAL_MPU_Disable();

/* Configure the MPU attributes for SDRAM */
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.BaseAddress = 0xC0000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_4MB;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER0;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.SubRegionDisable = 0x00;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;

HAL_MPU_ConfigRegion(&MPU_InitStruct);

/* Enable the MPU */
HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);

BSP_SDRAM_Init();
//HAL_EnableFMCMemorySwapping();

geert-KLA-BE · June 3, 2022, 2:55pm

Putting the draw buffer in SDRAM is too slow because of the high access latency. I had the same experience on the STM32F469I. I would suggest reducing your buffers so they fit in internal RAM (which has almost no latency). Making them smaller will not affect performance that much.
Though with the F7 you might be able to use caching on the external RAM.

epikao · June 3, 2022, 3:04pm

OK, interesting, thanks for the tip.
I have the feeling that the lv_port I use writes directly to the SDRAM even without my changes, because the SDRAM initialization is enabled…

OK, so now I have to see what I need to change to write to internal RAM instead…

geert-KLA-BE · June 7, 2022, 9:53am

I think you need to read the display configuration documentation of LVGL to get a better understanding.
LVGL has 2 kinds of buffers in use.

First, the actual framebuffer, which contains the whole screen and is transferred 30-60 times per second to the display:
static __IO uintpixel_t * my_fb = (__IO uintpixel_t*) (0x60000000);

The LTDC hardware copies that memory buffer to the display. It is placed in external RAM because the whole frame does not fit in the internal RAM. Here the latency is not an issue because the data is copied in large blocks directly, without CPU intervention.

Second, next to the framebuffer you also have one or two LVGL draw buffers. LVGL uses these for the actual graphics drawing calculations, which are done on blocks the size of your draw buffers.

    static lv_disp_draw_buf_t disp_buf_1;
    static lv_color_t buf1_1[TFT_HOR_RES * 68];
    static lv_color_t buf1_2[TFT_HOR_RES * 68];

The CPU performs a lot of read/write operations on these buffers while drawing. This is why RAM latency has a huge effect on performance, so place them in internal RAM.
When the calculations for a buffer are done, it gets transferred to the framebuffer. This happens in the flush callback function. You can use DMA2D to offload the CPU and do the actual copying for you, which lets LVGL continue calculating the next drawing block.

So just change the draw buffer sizes so they fit your memory, and check whether the implementation uses DMA2D for transferring the buffers.

PS. DMA2D can handle rectangular buffer areas. Regular DMA needs a contiguous memory block. With DMA2D you can therefore use draw buffers narrower than a full line width.
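
To make the rectangular copy concrete, here is a plain CPU model of what the flush step has to do. It is only an illustration of the addressing, not DMA2D driver code and not the port's flush callback; `color_t`, `area_t` and `copy_rect` are hypothetical names. The `dst_skip` value plays the role that, as far as I understand it, the DMA2D output line offset plays in hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t color_t;                       /* stand-in for lv_color_t, 16 bit */
typedef struct { int x1, y1, x2, y2; } area_t;  /* inclusive corners, like lv_area_t */

/* Copy a tightly packed draw buffer `src` into the rectangle `a` of a
 * framebuffer that is `fb_width` pixels wide. */
static void copy_rect(color_t *fb, int fb_width, const area_t *a, const color_t *src)
{
    assert(a->x1 >= 0 && a->x1 <= a->x2 && a->x2 < fb_width);
    assert(a->y1 >= 0 && a->y1 <= a->y2);

    int w = a->x2 - a->x1 + 1;      /* pixels per line of the rectangle */
    int h = a->y2 - a->y1 + 1;      /* number of lines */
    int dst_skip = fb_width - w;    /* pixels to skip in the framebuffer after each line */

    color_t *dst = fb + (size_t)a->y1 * (size_t)fb_width + (size_t)a->x1;
    for (int row = 0; row < h; row++) {
        memcpy(dst, src, (size_t)w * sizeof *dst);
        src += w;                   /* source lines are packed back to back */
        dst += w + dst_skip;        /* destination lines are fb_width apart */
    }
}
```

A regular memory-to-memory DMA transfer corresponds to a single iteration of that loop, which is why it can only move the whole area in one go when `dst_skip` is zero, that is, when the area spans the full line width.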

epikao · June 7, 2022, 11:34am

Thank you very much for this explanation. So 0x60000000 is automatically the external RAM region…

OK, judging by the following code in my tft.c file, I think DMA2 is already in use.

#define CPY_BUF_DMA_STREAM               DMA2_Stream0
#define CPY_BUF_DMA_CHANNEL              DMA_CHANNEL_0
#define CPY_BUF_DMA_STREAM_IRQ           DMA2_Stream0_IRQn
#define CPY_BUF_DMA_STREAM_IRQHANDLER    DMA2_Stream0_IRQHandler

geert-KLA-BE · June 7, 2022, 11:41am

Looking at the lv_port code, it uses regular DMA, not DMA2D (DMA2 is not the same thing).
So in this case you need to keep full line widths. Otherwise you will need to change the code to use the DMA2D block,
which would be a good learning exercise.
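
The full-line-width condition can be written down as a small check. This is a sketch of the reasoning only, with hypothetical names (`area_t`, `area_is_linear`, `linear_len_px`); it does not describe how the port's flush callback is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x1, y1, x2, y2; } area_t;  /* inclusive corners, like lv_area_t */

/* Non-zero if the area occupies one contiguous run of pixels in a framebuffer
 * that is `hor_res` pixels wide, so one linear transfer can move it. */
static int area_is_linear(const area_t *a, int hor_res)
{
    assert(a->x1 <= a->x2 && a->y1 <= a->y2);
    int full_width  = (a->x1 == 0 && a->x2 == hor_res - 1);
    int single_line = (a->y1 == a->y2);
    return full_width || single_line;
}

/* Number of pixels such a linear transfer has to move. */
static size_t linear_len_px(const area_t *a)
{
    return (size_t)(a->x2 - a->x1 + 1) * (size_t)(a->y2 - a->y1 + 1);
}
```

For a 1024-pixel-wide screen, an area from x = 0 to x = 1023 passes the check for any number of lines, while an area from x = 100 to x = 200 spanning several lines does not and would need either one transfer per line or a rectangular copy.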

epikao · June 7, 2022, 12:24pm

Isn’t this simply done with the conf file of lvgl?

/*Use STM32's DMA2D (aka Chrom Art) GPU*/
#define LV_USE_GPU_STM32_DMA2D  1

#define LV_GPU_DMA2D_CMSIS_INCLUDE "stm32f746xx.h"

geert-KLA-BE · June 7, 2022, 12:41pm

Nope, that setting is internal to LVGL, which uses the engine to accelerate drawing. In my experience it did not have any influence on speed, but that might depend on your use case.

epikao · June 7, 2022, 12:50pm

OK, however, according to the following link, it seems DMA2D does not really help much:

geert-KLA-BE · June 8, 2022, 7:45am

Correct, that is what I stated in my message. That applies to using it for drawing internally in LVGL.
But for copying rectangular blocks to SDRAM it really helps if you don’t draw full lines.

kommlabs · January 3, 2023, 12:54pm

Hello,
I’m also looking to boost performance. Could you share code with DMA2D configured for display flushing?

Regards,
Keshav