V8 display driver (double buffer) low FPS & high CPU

As previously mentioned, the full double-buffered approach is unusable due to low FPS and high CPU usage. The alternative single-buffer approach does work (i.e. low CPU at 33 FPS) but falls down when refreshing the whole screen, which takes nearly 1 second and is very obvious visually.

I have based the single-buffer approach on the one used in your attached file, lvgl_support.c.
I would really like to get this resolved as it is preventing me from upgrading to V8.

So for a full screen refresh, 1 framebuffer is slower than 2 framebuffers?

With 1 framebuffer, what is the size of the draw buffer? It should be about 1/10 of the screen size and placed in internal RAM.
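
As a rough illustration of that sizing rule, here is a minimal sketch. The 480x272 resolution, the 16-bit colour type and all names (MY_HOR_RES, my_color_t, draw_buf_pixels, draw_buf) are assumptions for the example, not values from this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical panel geometry, for illustration only. */
#define MY_HOR_RES 480u
#define MY_VER_RES 272u

typedef uint16_t my_color_t;   /* stand-in for a 16-bit lv_color_t */

/* Number of pixels in a draw buffer covering 1/divisor of the screen,
 * rounded up to whole rows so that each flush is a full-width band. */
static size_t draw_buf_pixels(uint32_t hor_res, uint32_t ver_res, uint32_t divisor)
{
    assert(divisor != 0u);
    uint32_t rows = (ver_res + divisor - 1u) / divisor;
    return (size_t)hor_res * rows;
}

/* About 1/10 of the screen. Placing it in internal RAM is done with a
 * linker section attribute, which is toolchain specific and omitted here. */
static my_color_t draw_buf[MY_HOR_RES * ((MY_VER_RES + 9u) / 10u)];
```

The pixel count is what would be passed as the size argument when registering the buffer; the byte size is that count multiplied by the colour size.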

Could you send a video of the full screen redraw with 1 framebuffer?

So the 2 buffer situation is more expensive overall - slow and unresponsive due to high CPU (not the case on V7 but I understand you’ve made some changes to post flush synchronisation).
The 1 buffer is good up until a full screen refresh is required, at which point it becomes visually slow (bands the size of the smaller display buffer, which is 1/5 of the screen).

I’ll capture a video of both so you can see what I’m dealing with.

I’m finding the same thing: the double-buffer frame rate I had in v7 has fallen dramatically in v8. I had to set full_refresh = 1 to force the whole buffer to be dumped as well. So I changed the display handling from double-buffer frame switching to a single buffer (full_refresh = 0).

Here are my flush callback functions.

The provided software copy worked for me at about 7 FPS.

uint32_t w = lv_area_get_width(area);
uint32_t y;
for(y = area->y1; y <= area->y2 && y < disp_drv->ver_res; y++) 
{
  lv_memcpy(&vram[2][y * LV_HOR_RES_MAX + area->x1], color_p, w * sizeof(lv_color_t));  
  color_p += w;
}
lv_disp_flush_ready(disp_drv);

Switching to the (unused) GPU code works at about 8 FPS

lv_gpu_stm32_dma2d_copy((lv_color_t *)&vram[2][(area->y1 * LV_HOR_RES_MAX) + area->x1], LV_HOR_RES_MAX, color_p, lv_area_get_width(area), lv_area_get_width(area), lv_area_get_height(area));  	
lv_disp_flush_ready(disp_drv);

Changing the code so that lv_disp_flush_ready is not called until the DMA completes should have sped things up, but this only gave me 9 FPS.

static lv_disp_drv_t * dma_disp_drv = NULL;

void DMA2D_IRQHandler(void)
{
	HAL_DMA2D_IRQHandler(&hdma2d);
}

static void TxComplete(DMA2D_HandleTypeDef * hdma2d)
{
	lv_disp_flush_ready(dma_disp_drv);
}

static void tft_flush_cb(lv_disp_drv_t * disp_drv, const lv_area_t * area, lv_color_t * color_p)
{
  uint32_t w = lv_area_get_width(area);
  uint32_t h = lv_area_get_height(area);

  /* Remember the driver so TxComplete() can signal the right display */
  dma_disp_drv = disp_drv;

  hdma2d.Init.Mode = DMA2D_M2M;
  hdma2d.Init.OutputOffset = LV_HOR_RES_MAX - w;

  if(HAL_DMA2D_Init(&hdma2d) == HAL_OK &&
     HAL_DMA2D_ConfigLayer(&hdma2d, 1) == HAL_OK)
  {
    HAL_DMA2D_RegisterCallback(&hdma2d, HAL_DMA2D_TRANSFERCOMPLETE_CB_ID, TxComplete);
    lv_color_t * d = &vram[2][(area->y1 * LV_HOR_RES_MAX) + area->x1];
    if(HAL_DMA2D_Start_IT(&hdma2d, (uint32_t)color_p, (uint32_t)d, w, h) == HAL_OK)
    {
      return;   /* TxComplete() calls lv_disp_flush_ready() when the DMA finishes */
    }
  }

  /* DMA2D setup or start failed: release the buffer so LVGL does not stall */
  lv_disp_flush_ready(disp_drv);
}

The next thing to try would be to maintain the two display buffers outside the LVGL framework and update each buffer from the flush callback.

Hi @kisvegabor,

Hope you are well!

I have been absent for a long period now due to massive work commitments which are still ongoing!

I would like to add to this post: I can confirm that my Zynq implementation using standard WXGA+ (1440x900) is also only managing a few frames per second with version 8. I have had to revert to version 7 for the time being as I have no time to look into this right now. My current driver uses double buffering with high-speed DMA transfers, so at first glance the bottleneck is the copying of the entire buffer to the second buffer in version 8, versus the selective updating of the second buffer (only the parts that have changed) in version 7. I am unsure of the reasons for the change of methodology in version 8, but I am sure there must be some good reason?

Also, I have written a basic VNC server that runs on my platform, based on full screen refreshes at this stage. This puts a high load on the CPU even on version 7, and it has absolutely no chance of working on version 8. There may be a way to hook into the drawing engine to help this along: having access to the rectangles drawn for each screen refresh, to pass on to the VNC server, would be good, but I haven’t had a chance to dig into your code to see if this is feasible.

If you have any suggestions on how to improve this situation, which appears to be affecting more people as time goes on, they would be much appreciated. If it can’t be resolved I think I will have to freeze my GUI at version 7 for my current projects :slight_smile:

Kind Regards,

Pete

Hi @pete-pjb and @xennex,

To be sure we are on the same page I’d like to clarify the difference between v7 and v8 in this regard.

v7
If you set 2 screen-sized buffers, v7 worked in a special “true double buffered mode”. It worked like this:

  1. Render invalid areas to the inactive buffer
  2. Swap the buffers
  3. Copy the redrawn areas to the new inactive buffer so that both have the same content.

I considered it wasteful, especially with screen-sized animations, because there the whole screen was copied and then overwritten with new content.

v8
There are 2 modes that can be set explicitly:

  • full_refresh: LVGL always redraws the whole screen. So no copying is required.
  • direct mode: LVGL redraws only the dirty areas in a screen sized buffer. So NO full-screen refresh, but the buffers are not synchronized. It’s useful, if you can send full frames to a GPU to display.

IMO the performance issue is largest if you set full_refresh but only a small area changes. In v7 it was fast (small rendering + small copy) but in v8 it’s slower (large rendering).
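
To put rough numbers on that, here is a back-of-the-envelope sketch that counts pixel writes only, which is a crude proxy for cost. The function names are hypothetical; the 1440x900 figure used in the note below is the resolution mentioned earlier in this thread, and the 100x50 dirty area is made up:

```c
#include <assert.h>
#include <stdint.h>

/* Pixel writes per frame for a v7-style update: render the dirty area,
 * then copy the same area into the other buffer. */
static uint32_t v7_style_pixel_writes(uint32_t dirty_w, uint32_t dirty_h)
{
    return 2u * dirty_w * dirty_h;
}

/* Pixel writes per frame with full_refresh: the whole screen is rendered. */
static uint32_t full_refresh_pixel_writes(uint32_t hor_res, uint32_t ver_res)
{
    return hor_res * ver_res;
}

/* How many times more pixels full_refresh touches (integer division). */
static uint32_t write_ratio(uint32_t hor_res, uint32_t ver_res,
                            uint32_t dirty_w, uint32_t dirty_h)
{
    assert(dirty_w > 0u && dirty_h > 0u);
    return full_refresh_pixel_writes(hor_res, ver_res) /
           v7_style_pixel_writes(dirty_w, dirty_h);
}
```

For a 100x50 dirty area on a 1440x900 screen this gives 10,000 writes against 1,296,000, i.e. roughly 129 times more work with full_refresh.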

I was thinking about adding a v7-like mode to v8, but in v8 we started to think more abstractly. I.e. the draw buffer can be anything like an lv_color_t array, SDL Texture in GPU, specially packed bitmap, or any internal non accessible buffer. So we can’t simply memcpy it.

However, in the flush_cb you can mimic v7’s behavior like this:

  1. Set 2 full-screen buffers with full_refresh = 0, direct_mode = 1
  2. In flush_cb check if lv_disp_flush_is_last(drv) == true
  3. If so, copy the areas:
my_set_active_buffers(color_p);
lv_disp_t * disp = _lv_refr_get_disp_refreshing();
uint32_t i;
for(i = 0; i < disp->inv_p; i++) {
   if(disp->inv_area_joined[i]) continue;   /*Skip areas already merged into another area*/

   my_copy_area(disp->inv_areas[i]);
}

I haven’t tested it but I’d be happy to help with it further if you can help with testing on embedded HW.
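
The copy step above can be modelled without LVGL at all. The following is only a sketch under stated assumptions: px_t and area_t are stand-ins for lv_color_t and lv_area_t, and copy_area and sync_buffers are hypothetical helpers, not LVGL functions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t px_t;                               /* stand-in for lv_color_t (RGB565) */
typedef struct { int16_t x1, y1, x2, y2; } area_t;   /* inclusive corners, like lv_area_t */

/* Copy one rectangle from src to dst. Both are full-screen buffers
 * with hor_res pixels per row. */
static void copy_area(px_t *dst, const px_t *src, int32_t hor_res, const area_t *a)
{
    assert(a->x1 <= a->x2 && a->y1 <= a->y2);
    size_t row_bytes = (size_t)(a->x2 - a->x1 + 1) * sizeof(px_t);
    for (int32_t y = a->y1; y <= a->y2; y++) {
        size_t off = (size_t)y * (size_t)hor_res + (size_t)a->x1;
        memcpy(dst + off, src + off, row_bytes);
    }
}

/* Bring the idle buffer up to date after the last flush of a frame. */
static void sync_buffers(px_t *idle, const px_t *shown, int32_t hor_res,
                         const area_t *areas, const uint8_t *joined, uint16_t count)
{
    for (uint16_t i = 0; i < count; i++) {
        if (joined[i]) continue;   /* already covered by a merged area */
        copy_area(idle, shown, hor_res, &areas[i]);
    }
}
```

Only the rows and columns inside each dirty rectangle are touched, which is what keeps the copy cheap when a small part of the screen changes.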

Hi @kisvegabor,

Many thanks for your explanation and suggested solution. I will take a look at this when I can and report back here. (This also looks like a good way to reduce the CPU load for my VNC server implementation, as I can pass the data to my network buffers as small rectangles as opposed to doing entire screen refreshes all the time!)
Currently I am in the middle of a fairly demanding bare-metal USB hardware/driver exercise and it is sapping all of my time and energy, so it may be a while before I come back with the results. I will also keep an eye on this topic to see if @xennex or @jupeos have any comments.

Kind Regards,

Pete

Hi @kisvegabor,

Hope you are well!

I can confirm I have managed to implement a method based on your post which closely emulates V7 methodology and the performance is very similar to before. Thank you for the suggestion.

I won’t mark it as the solution as yet as it would be interesting to see if @xennex and @jupeos have had similar results…

Kind Regards,

Pete

Thanks for the feedback, Pete!

I’m waiting for this to be resolved before trying this out. Will post my results when I’ve done so.

I need some more clarity regarding the implementation.

I’ve read that in direct mode LVGL first redraws all the dirty areas in the screen-sized buffer and then calls the flush function. In that case, what is the point of checking lv_disp_flush_is_last(drv) == true, since only 1 chunk is being flushed? (Q1)

Let’s say we have FB1 and FB2, FB1 is currently active and flush_cb is called. We then have to do 2 tasks: first, copy the invalidated areas from FB1 to FB2; second, flush the union of the invalidated areas of FB1 to the screen.

Q2. Can you please comment on the ordering of the 2 tasks?
Q3. How will FB2 become active for the next rendering? Will LVGL take care of it internally, or do I have to do it somehow in flush_cb()? If the latter, please suggest how.

Q4. Suppose a rendering request comes in while we are busy with the first task in flush_cb. How will it be handled?

Q1: By accident flush_cb was called for all areas. So it’s a bug, and lv_disp_flush_is_last(drv) == true is the workaround.

Q2: You should switch first. Otherwise you will copy to the active frame buffer.

Q3: It’s managed by LVGL if you pass 2 buffers to lv_disp_draw_buf_t.

Q4: LVGL won’t render anything until you free the buffer by calling lv_disp_flush_ready().
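
The ordering described in these answers can be written down as a tiny state model. This is a sketch only: fb_state_t, on_last_flush, on_sync_done and may_render are hypothetical names, and the model stands in for what the driver and LVGL do between them rather than reproducing any LVGL code:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int  scanout;        /* index (0 or 1) of the buffer the panel reads */
    bool flush_pending;  /* set until the driver reports flush ready */
} fb_state_t;

/* Last flush of a frame: switch first, then return the index of the
 * now-idle buffer, which is the only safe copy destination. */
static int on_last_flush(fb_state_t *s, int rendered)
{
    assert(rendered == 0 || rendered == 1);
    s->scanout = rendered;
    s->flush_pending = true;
    return 1 - rendered;
}

/* Call once the idle buffer has been synchronised; in a real driver this
 * is where lv_disp_flush_ready() would be called. */
static void on_sync_done(fb_state_t *s)
{
    s->flush_pending = false;
}

/* Rendering of the next frame is held off while a flush is pending. */
static bool may_render(const fb_state_t *s)
{
    return !s->flush_pending;
}
```

Copying before the switch would write into the buffer that is still being scanned out, which is the situation the answer to Q2 warns about.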

Hi @kisvegabor,
I’m developing on the NXP i.MX RT1176 eval board and have managed to compile the demo project (lv_demo_widgets) with the latest LVGL revision from GitHub. I set up double buffering (2 framebuffers) with full_refresh=0 and direct_mode=1. So far it seems to work, but the rendering behaves oddly, as you can see in the attached images, which show the screen after pressing the “logout” button and after moving the slider. My screen is 720x1280 RGB565.

Am I missing something in the flush display callback? Here is the code I used:

static void DEMO_FlushDisplay(lv_disp_drv_t *disp_drv, const lv_area_t *area, lv_color_t *color_p)
{
    g_dc.ops->setFrameBuffer(&g_dc, 0, (void*) color_p);

	lv_disp_t *disp = _lv_refr_get_disp_refreshing();

    if(lv_disp_flush_is_last(disp->driver))
    {
		uint8_t *fb;
		//fb is the framebuffer to synchronize
		if((uint8_t*)color_p==(uint8_t*)&s_frameBuffer[0])
			fb = (uint8_t*)&s_frameBuffer[1];
		else
			fb = (uint8_t*)&s_frameBuffer[0];

		for (int i = 0; i < disp->inv_p; i++)
		{
			if(disp->inv_area_joined[i])
				continue;

			//copy area from color_p to fb
			copy_area_to_fb(fb, &(disp->inv_areas[i]), color_p);
		}
    }

    lv_disp_flush_ready(disp->driver);
}	

After pressing “logout”: look at the “invite” button, which is truncated.

After moving the slider: look at the “hardworking” switch, which is now all grey (like “team player”).

thank you

You need full_refresh=1 and direct_mode=0 if you have “normal” double buffering.

Hi @kisvegabor,
what I am trying to achieve is full_refresh=0 and direct_mode=1, as you suggested in a previous post.

many thanks for the reply,
gianluca

Hi @gianlucacornacchia ,

My hardware will be different to yours but I am doing the same thing from a hardware/software functional point of view.

You haven’t shared all of your relevant code, but here is my approach; hopefully it will show something that helps:

void gui_update_thread(void *p) {

	cpu0_globals->spawn_stat |= GUI_RUN;
	// Initialise VGA Hardware
	set_vga_prams( VGA_1440X900_60HZ_CVTR );
	// Initialise GUI
	lv_init();
	lv_theme_default_init(cpu0_globals->gui.disp, shmem_p->personality == OTG_IDU ? confp->sys.IDU_gui_colour :
	  confp->sys.ODU_gui_colour, lv_palette_main(LV_PALETTE_PURPLE), (((shmem_p->personality == OTG_IDU) ? confp->sys.IDU_style :
	  confp->sys.ODU_style) ? 0 : 1), LV_FONT_DEFAULT);
	lv_disp_drv_init((lv_disp_drv_t*)&cpu0_globals->gui.disp_drv);
	lv_disp_draw_buf_init(&cpu0_globals->gui.disp_buf, (void*)LV_VDB_ADR, (void*)LV_VDB2_ADR, (LV_HOR_RES_MAX*LV_VER_RES_MAX));
	cpu0_globals->gui.disp_drv.flush_cb = vga_disp_flush;
	cpu0_globals->gui.disp_drv.hor_res = LV_HOR_RES_MAX;                 /*Set the horizontal resolution in pixels*/
	cpu0_globals->gui.disp_drv.ver_res = LV_VER_RES_MAX;                 /*Set the vertical resolution in pixels*/
	cpu0_globals->gui.disp_drv.draw_buf = &cpu0_globals->gui.disp_buf;
	cpu0_globals->gui.disp_drv.full_refresh = pdFALSE;
	cpu0_globals->gui.disp_drv.direct_mode = pdTRUE;
	cpu0_globals->gui.disp = lv_disp_drv_register((lv_disp_drv_t*)&cpu0_globals->gui.disp_drv);
	lv_disp_set_bg_opa(NULL, LV_OPA_TRANSP);
	startup_gui_create();
	lv_timer_create((lv_timer_cb_t)process_msg_q, 10, NULL);	// Check for GUI thread messages every 10ms
	while(1) {
		lv_task_handler();
		vTaskDelay(pdMS_TO_TICKS(4));
	}
}

void vga_irq_handler( void *p ) {

	vga->vga_fbuf_addr = cpu0_globals->gui.dma_src;
	cpu0_globals->gui.buf_switched = pdTRUE;
	XScuGic_Disable(&xInterruptController, XPAR_FABRIC_VGA_IP_0_FRM_CPT_IRQ_INTR);
	lv_disp_flush_ready((lv_disp_drv_t*)&cpu0_globals->gui.disp_drv);
}

static void update_dual_buf( lv_disp_drv_t *disp_drv, const lv_area_t *area, lv_color_t *colour_p ) {

	lv_disp_t*	disp = _lv_refr_get_disp_refreshing();
	lv_coord_t	y, hres = lv_disp_get_hor_res(disp);
    uint16_t	a;
    lv_color_t	*buf_cpy;

    if( colour_p == disp_drv->draw_buf->buf1)
        buf_cpy = disp_drv->draw_buf->buf2;
    else
        buf_cpy = disp_drv->draw_buf->buf1;

    for(a = 0; a < disp->inv_p; a++) {
    	if(disp->inv_area_joined[a]) continue;
        lv_coord_t w = lv_area_get_width(&disp->inv_areas[a]);
        for(y = disp->inv_areas[a].y1; y <= disp->inv_areas[a].y2 && y < disp_drv->ver_res; y++) {
            memcpy(buf_cpy+(y * hres + disp->inv_areas[a].x1), colour_p+(y * hres + disp->inv_areas[a].x1), w * sizeof(lv_color_t));
        }
    }
}

void vga_disp_flush(lv_disp_drv_t *disp_drv, const lv_area_t *area, lv_color_t *colour_p) {

	static uint8_t	first_call = 1;

	if( first_call ) {
		cpu0_globals->gui.dma_src = (uint32_t)colour_p;
		first_call =  0;
		vga->total_pixels &= ~DMA_FIFO_RST; 	// Release Reset
		vga->total_pixels |= DMA_FRAME_READY;	// Start Proceedings
		vga->irq_reg = VGA_IRQ_EN;				// Enable DMA complete interrupt
	}
	if( lv_disp_flush_is_last( disp_drv ) ) {
		cpu0_globals->gui.dma_src = (uint32_t)colour_p;
		cpu0_globals->gui.buf_switched = pdFALSE;
		XScuGic_Enable(&xInterruptController, XPAR_FABRIC_VGA_IP_0_FRM_CPT_IRQ_INTR);
		while(!cpu0_globals->gui.buf_switched);// vTaskDelay(1);
		update_dual_buf(disp_drv, area, colour_p);
	}

	lv_disp_flush_ready( disp_drv );
}

Also it may be worth checking your artefacts aren’t being caused by an unflushed cache.
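
One common pitfall with cache maintenance by address is that the range has to be aligned to whole cache lines. The sketch below assumes a 32-byte line (the Cortex-M7 D-cache line size; check your own core) and uses hypothetical names (CACHE_LINE, cache_range_t, cache_align_range). It only computes the range and leaves the actual clean call to your platform’s cache API:

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 32u   /* assumed D-cache line size in bytes */

typedef struct { uintptr_t addr; uint32_t size; } cache_range_t;

/* Expand [addr, addr + size) outwards to whole cache lines so that a
 * clean-by-address call covers every line the dirty area touches. */
static cache_range_t cache_align_range(uintptr_t addr, uint32_t size)
{
    assert(size > 0u);
    uintptr_t mask  = ~(uintptr_t)(CACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + size + CACHE_LINE - 1u) & mask;
    cache_range_t r = { start, (uint32_t)(end - start) };
    return r;
}
```

The resulting address and size would then be handed to the clean routine before the display controller or DMA reads the framebuffer.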

Kind Regards,

Pete

Hi @pete-pjb,
sorry for the late answer, but I’ve been busy over the last few days. Thank you for sharing your code, but as far as I can see the update code works the same as mine. I suspect the issue may be in the semaphore handling. In the next few days I will upload my code to show the complete implementation.

regards,
Luca

Hi Luca ( @gianlucacornacchia ) ,

Thanks for letting me know.

I have also been helping another developer here with this issue, so it might be worth keeping an eye on that thread as well. I agree you should definitely engineer the semaphore out of the driver. It is not good to have blocking code in the LVGL flush mechanism; in my opinion this is definitely a source of performance issues. :slight_smile:

Kind Regards,

Pete

Hi @pete-pjb,
I’ve implemented the code following your suggestions in the other answer. Here is the code:

void lv_port_disp_init(void)
{
    memset(s_frameBuffer, 0, sizeof(s_frameBuffer));
    lv_disp_draw_buf_init(&disp_buf, s_frameBuffer[0], s_frameBuffer[1], LCD_WIDTH * LCD_HEIGHT);

    status_t status;
    dc_fb_info_t fbInfo;

#if LV_USE_GPU_NXP_VG_LITE
    /* Initialize GPU. */
    BOARD_PrepareVGLiteController();
#endif

    /*-------------------------
     * Initialize your display
     * -----------------------*/
    BOARD_PrepareDisplayController();

    status = g_dc.ops->init(&g_dc);
    if (kStatus_Success != status)
    {
        assert(0);
    }

    g_dc.ops->getLayerDefaultConfig(&g_dc, 0, &fbInfo);
    fbInfo.pixelFormat = DEMO_BUFFER_PIXEL_FORMAT;
    fbInfo.width       = DEMO_BUFFER_WIDTH;
    fbInfo.height      = DEMO_BUFFER_HEIGHT;
    fbInfo.startX      = DEMO_BUFFER_START_X;
    fbInfo.startY      = DEMO_BUFFER_START_Y;
    fbInfo.strideBytes = DEMO_BUFFER_STRIDE_BYTE;
    g_dc.ops->setLayerConfig(&g_dc, 0, &fbInfo);
    g_dc.ops->setCallback(&g_dc, 0, DEMO_BufferSwitchOffCallback, &disp_drv);

#if defined(SDK_OS_FREE_RTOS)
    s_transferDone = xSemaphoreCreateBinary();
    if (NULL == s_transferDone)
    {
        PRINTF("Frame semaphore create failed\r\n");
        assert(0);
    }
#else
    s_transferDone = false;
#endif

    /* lvgl starts render in frame buffer 0, so show frame buffer 1 first. */
    g_dc.ops->setFrameBuffer(&g_dc, 0, (void *)s_frameBuffer[1]);

    /* Wait for frame buffer sent to display controller video memory. */
    if ((g_dc.ops->getProperty(&g_dc) & kDC_FB_ReserveFrameBuffer) == 0)
    {
#if defined(SDK_OS_FREE_RTOS)
        if (xSemaphoreTake(s_transferDone, portMAX_DELAY) != pdTRUE)
        {
            PRINTF("Wait semaphore error: s_transferDone\r\n");
            assert(0);
        }
#else
        while (false == s_transferDone)
        {
        }
#endif
    }

    g_dc.ops->enableLayer(&g_dc, 0);

    /*-----------------------------------
     * Register the display in LittlevGL
     *----------------------------------*/

    lv_disp_drv_init(&disp_drv); /*Basic initialization*/

    /*Set up the functions to access to your display*/

    /*Set the resolution of the display*/
    disp_drv.hor_res = LCD_WIDTH;
    disp_drv.ver_res = LCD_HEIGHT;

    /*Used to copy the buffer's content to the display*/
    disp_drv.flush_cb = DEMO_FlushDisplay;
    disp_drv.wait_cb = DEMO_WaitFlush;

#if (LV_USE_GPU_NXP_VG_LITE || LV_USE_GPU_NXP_PXP)
    disp_drv.clean_dcache_cb = DEMO_CleanInvalidateCache;
#endif

    /*Set a display buffer*/
    disp_drv.draw_buf = &disp_buf;

    disp_drv.full_refresh = 0;
    disp_drv.direct_mode = 1;

    /*Finally register the driver*/
    lv_disp_drv_register(&disp_drv);

#if LV_USE_GPU_NXP_VG_LITE
    if (vg_lite_init(64, 64) != VG_LITE_SUCCESS)
    {
        PRINTF("VGLite init error. STOP.");
        vg_lite_close();
        assert(0);
    }
#endif
}

static void DEMO_BufferSwitchOffCallback(void *param, void *switchOffBuffer)
{
    lv_disp_drv_t *disp_drv = (lv_disp_drv_t *)param;

    /* IMPORTANT!!!
     * Inform the graphics library that you are ready with the flushing*/
    lv_disp_flush_ready(disp_drv);

#if defined(SDK_OS_FREE_RTOS)
    BaseType_t taskAwake = pdFALSE;

    xSemaphoreGiveFromISR(s_transferDone, &taskAwake);
    portYIELD_FROM_ISR(taskAwake);
#else
    s_transferDone = true;
#endif
}

#if (LV_USE_GPU_NXP_VG_LITE || LV_USE_GPU_NXP_PXP)
static void DEMO_CleanInvalidateCache(lv_disp_drv_t *disp_drv)
{
#if __CORTEX_M == 4
    L1CACHE_CleanInvalidateSystemCache();
#else
    SCB_CleanInvalidateDCache();
#endif
}
#endif

static void DEMO_WaitFlush(lv_disp_drv_t *disp_drv)
{
    if (xSemaphoreTake(s_transferDone, portMAX_DELAY) != pdTRUE)
    {
        PRINTF("Display flush failed\r\n");
        assert(0);
    }
}

static void DEMO_UpdateDualBuffer( lv_disp_drv_t *disp_drv, const lv_area_t *area, lv_color_t *colour_p )
{

	lv_disp_t*	disp = _lv_refr_get_disp_refreshing();
	lv_coord_t	y, hres = lv_disp_get_hor_res(disp);
    uint16_t	a;
    lv_color_t	*buf_cpy;

    if( colour_p == disp_drv->draw_buf->buf1)
        buf_cpy = disp_drv->draw_buf->buf2;
    else
        buf_cpy = disp_drv->draw_buf->buf1;

    for(a = 0; a < disp->inv_p; a++) {
    	if(disp->inv_area_joined[a]) continue;  /* Only copy areas which aren't part of another area */
        lv_coord_t w = lv_area_get_width(&disp->inv_areas[a]);
        for(y = disp->inv_areas[a].y1; y <= disp->inv_areas[a].y2 && y < disp_drv->ver_res; y++) {
            memcpy(buf_cpy+(y * hres + disp->inv_areas[a].x1), colour_p+(y * hres + disp->inv_areas[a].x1), w * sizeof(lv_color_t));
        }
    }
}

static void DEMO_FlushDisplay(lv_disp_drv_t *disp_drv, const lv_area_t *area, lv_color_t *color_p)
{
    /*
     * Before new frame flushing, clear previous frame flush done status.
     */
    (void)xSemaphoreTake(s_transferDone, 0);

/*CHANGE 4*/
	if( lv_disp_flush_is_last( disp_drv ) ) {
		DCACHE_CleanInvalidateByRange((uint32_t)color_p, DEMO_FB_SIZE);
		g_dc.ops->setFrameBuffer(&g_dc, 0, (void *)color_p);
		DEMO_UpdateDualBuffer(disp_drv, area, color_p);
	    //s_framePending = true;
	}
	else
		lv_disp_flush_ready( disp_drv );
}

Without the lv_disp_flush_ready() call in change 4, the application locks up.

The following video shows the results; as you can see, the rendering has many defects:
video (valid for 1 week)

What can be the cause?

thank you,
gianluca

Hi @gianlucacornacchia ,

It looks like you are using a slightly different CPU to the one in the other thread. Can you confirm exactly which hardware/dev board you are using? Alternatively, log into your NXP account, grab the link to the SDK you created for your NXP IDE, and post it so I can download it.

Kind Regards,

Pete