Improving CPU usage / Screen tear

What MCU/Processor/Board and compiler are you using?

STM32F767ZI - STM32DUINO PlatformIO

What LVGL version are you using?

V8.3

What do you want to achieve?

I am using the FMC to drive a 7" SSD1963 display (800 x 480), but it suffers from severe screen tearing, for instance when scrolling or displaying images.
Would DMA2D improve performance, and is there any guidance on using it? The documentation for it is currently marked TODO.

What have you tried so far?

I am unsure how to use LVGL's built-in DMA2D support. Once it is enabled, LVGL appears to initialise it, but how do I incorporate it into my disp_flush? I have looked at lv_gpu_stm32_dma2d.c, and it looks to me as though this is all handled inside LVGL.

I also tried plain DMA, following the examples and documentation as I did for the FMC, but I get display corruption in which the horizontal lines are misaligned.

I am using Bank1 FMC SDRAM and the region is:
CommandAccess
*(uint16_t*)((uint32_t)0xC0000000)

DataAccess
*(uint16_t*)((uint32_t)0xC0080000)
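
The two addresses above can be given names and accessed through volatile pointers, so that the compiler is not allowed to merge or drop the repeated writes to the same address. This is a minimal sketch, assuming the addresses from this post; lcd_bus_t, lcd_write_range and lcd_set_window are hypothetical names, not part of LVGL or the STM32 HAL. The 0x2A, 0x2B and 0x2C command values are the ones already used in my_disp_flush.

```c
#include <stdint.h>

/* Hypothetical wrapper around the two FMC-mapped LCD addresses. */
typedef struct {
    volatile uint16_t *cmd;   /* on the target: (volatile uint16_t *)0xC0000000 */
    volatile uint16_t *data;  /* on the target: (volatile uint16_t *)0xC0080000 */
} lcd_bus_t;

/* Send one command followed by a start/end pair, high byte first. */
static void lcd_write_range(const lcd_bus_t *bus, uint16_t cmd,
                            uint16_t start, uint16_t end)
{
    *bus->cmd  = cmd;
    *bus->data = (uint16_t)(start >> 8);
    *bus->data = (uint16_t)(start & 0xFF);
    *bus->data = (uint16_t)(end >> 8);
    *bus->data = (uint16_t)(end & 0xFF);
}

/* Select the window and leave the controller ready for pixel data. */
static void lcd_set_window(const lcd_bus_t *bus, uint16_t x1, uint16_t y1,
                           uint16_t x2, uint16_t y2)
{
    lcd_write_range(bus, 0x2A, x1, x2);  /* column range */
    lcd_write_range(bus, 0x2B, y1, y2);  /* row range */
    *bus->cmd = 0x2C;                    /* start memory write */
}
```

On the target, the struct would be filled once with the two addresses listed above, and lcd_set_window(&bus, area->x1, area->y1, area->x2, area->y2) would replace the eleven raw writes at the top of the flush function.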

Code to reproduce

DMA2D

lv_conf.h

/*Use STM32's DMA2D (aka Chrom Art) GPU*/
#define LV_USE_GPU_STM32_DMA2D 1
#if LV_USE_GPU_STM32_DMA2D
    /*Must be defined to include path of CMSIS header of target processor
    e.g. "stm32f7xx.h" or "stm32f4xx.h"*/
    #define LV_GPU_DMA2D_CMSIS_INCLUDE "stm32f7xx.h"
#endif

my_disp_flush

static void my_disp_flush(lv_disp_drv_t *disp, const lv_area_t *area, lv_color_t *color_p)
{
  /***create a working window***/
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2a);            //CommandAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x1 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x1 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x2 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x2 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2b);            //CommandAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y1 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y1 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y2 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y2 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2c);            //CommandAccess
  /******************************/

  for (int y = area->y1; y <= area->y2; y++) {
    for (int x = area->x1; x <= area->x2; x++) {
      *(uint16_t*)((uint32_t)0xC0080000) = color_p->full; //DataAccess
      //*(uint16_t*)((uint32_t)0xC0080000) = (*color_p); //DataAccess
      color_p++;
    }
  }

  lv_disp_flush_ready(disp); // tell lvgl that flushing is done 
}

Manually trying DMA

init_dma

void init_dma(void) {
  __HAL_RCC_DMA2_CLK_ENABLE();

  hdma_memtomem.Instance = DMA2_Stream0;
  hdma_memtomem.Init.Channel = DMA_CHANNEL_0;
  hdma_memtomem.Init.Direction = DMA_MEMORY_TO_MEMORY;
  hdma_memtomem.Init.PeriphInc = DMA_PINC_ENABLE;
  hdma_memtomem.Init.MemInc = DMA_MINC_ENABLE;
  hdma_memtomem.Init.PeriphDataAlignment = DMA_PDATAALIGN_HALFWORD;
  hdma_memtomem.Init.MemDataAlignment = DMA_MDATAALIGN_HALFWORD;
  hdma_memtomem.Init.Mode = DMA_NORMAL;
  hdma_memtomem.Init.Priority = DMA_PRIORITY_LOW;
  hdma_memtomem.Init.FIFOMode = DMA_FIFOMODE_DISABLE;

  HAL_DMA_Init(&hdma_memtomem);
}

transfer_data_dma

void transfer_data_dma(uint16_t *src, uint16_t *dest, uint32_t size) {
  // Configure DMA for memory-to-memory transfer
  hdma_memtomem.Init.Mode = DMA_CIRCULAR;

  // Set source and destination addresses and size
  hdma_memtomem.Instance->PAR = (uint32_t)src;
  hdma_memtomem.Instance->M0AR = (uint32_t)dest;
  hdma_memtomem.Instance->NDTR = size;

  // Enable DMA transfer
  HAL_DMA_Init(&hdma_memtomem);
  HAL_DMA_Start(&hdma_memtomem, (uint32_t)src, (uint32_t)dest, size);
}
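
A note on the transfer size passed to transfer_data_dma: the NDTR register of an STM32F7 DMA stream is a 16-bit counter, so one transfer can move at most 65535 items. A full 800 x 480 area is 384000 pixels, so a large flush would have to be split into several transfers. The arithmetic can be sketched as follows; DMA_MAX_ITEMS, dma_chunk_len and dma_chunk_count are hypothetical names, not HAL symbols.

```c
#include <stdint.h>

#define DMA_MAX_ITEMS 65535u  /* NDTR is a 16-bit counter */

/* Number of items to program into NDTR for the next transfer. */
static uint32_t dma_chunk_len(uint32_t remaining)
{
    return remaining > DMA_MAX_ITEMS ? DMA_MAX_ITEMS : remaining;
}

/* How many separate DMA transfers a flush of pixel_count pixels needs. */
static uint32_t dma_chunk_count(uint32_t pixel_count)
{
    return (pixel_count + DMA_MAX_ITEMS - 1u) / DMA_MAX_ITEMS;
}
```

Whether this limit is hit depends on the size of the LVGL draw buffer, since LVGL never flushes more pixels at once than the buffer holds.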

my_disp_flush_dma

static void my_disp_flush_dma(lv_disp_drv_t *disp, const lv_area_t *area, lv_color_t *color_p)
{

  SCB_CleanInvalidateDCache();
  // Set the draw area (if needed)
  /***create a working window***/
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2a);            //CommandAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x1 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x1 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x2 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->x2 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2b);            //CommandAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y1 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y1 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y2 >> 8);   //DataAccess
  *(uint16_t*)((uint32_t)0xC0080000) = (area->y2 & 0xFF); //DataAccess
  *(uint16_t*)((uint32_t)0xC0000000) = (0x2c);            //CommandAccess
  /******************************/

  // Use DMA to transfer pixel data to the display
  //transfer_data_dma((uint16_t *)color_p, (uint16_t*)((uint32_t)0xC0080000), (area->x2 - area->x1 + 1) * (area->y2 - area->y1 + 1));

  uint32_t pixel_count = (area->x2 - area->x1 + 1) * (area->y2 - area->y1 + 1);
  transfer_data_dma((uint16_t *)color_p, (uint16_t*)((uint32_t)0xC0080000), pixel_count);
  color_p += pixel_count;

  lv_disp_flush_ready(disp); // tell lvgl that flushing is done 
}

Video

See attached Video.zip
Video.zip (3.3 MB)

Thanks in advance

I can't see the video, but in your flush function only the first line is more or less OK.
The += line does nothing useful, and the final lv_disp_flush_ready call needs to move into the DMA transfer-complete callback.
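
The suggestion above can be sketched as a small piece of bookkeeping: the flush function only starts the first transfer, and the transfer-complete callback either starts the next chunk or reports completion to LVGL. This is a hardware-independent sketch, not the poster's code; flush_job_t, flush_job_begin and flush_job_next are hypothetical names, and the actual DMA start call is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping for one in-progress flush. */
typedef struct {
    const uint16_t *next;   /* next pixel to send */
    uint32_t remaining;     /* pixels still to send */
    uint32_t max_items;     /* per-transfer limit, must be non-zero */
} flush_job_t;

/* Call from the flush callback once the window has been set. */
static void flush_job_begin(flush_job_t *job, const uint16_t *pixels,
                            uint32_t pixel_count, uint32_t max_items)
{
    job->next = pixels;
    job->remaining = pixel_count;
    job->max_items = max_items;
}

/* Returns the item count of the next transfer and stores its source in *src.
 * A return value of 0 means every pixel has been handed to the DMA, which is
 * the point at which lv_disp_flush_ready should be called. */
static uint32_t flush_job_next(flush_job_t *job, const uint16_t **src)
{
    uint32_t count = job->remaining < job->max_items ? job->remaining
                                                     : job->max_items;
    *src = job->next;
    job->next += count;
    job->remaining -= count;
    return count;
}
```

On the target, the flush function would call flush_job_begin, then flush_job_next, and start that chunk with an interrupt-driven call such as HAL_DMA_Start_IT. The transfer-complete callback would call flush_job_next again, start the next chunk while the result is non-zero, and call lv_disp_flush_ready once it returns 0.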