Black flickering boxes intermittently appearing on display when scrolling

Hi all,

I have been trying to get to the bottom of an issue for a very long time. My hardware is the AMD Zynq UltraScale+ MPSoC, on which I have implemented a DisplayPort driver and use XHCI USB for the touch screen. I am running LVGL on a home-grown SMP FreeRTOS port on the application processor, which has 2 x 1.2 GHz A53 cores, and two separate instances of FreeRTOS on the two R5 cores for some control processes.

The display interface uses double-buffered (full-screen-sized buffers) direct mode with DMA, and the hardware automatically changes DMA buffer addresses only during V-sync. I am also using a full-HD 1080p, ARGB8888 configuration, which I think might be a bit of a corner case.

Since I first brought up my code on the hardware just over a year ago I have had an issue where black rectangles appear when the display is being scrolled by dragging on the touch screen. I have tried implementing the driver in many ways and it always ends up with the same result, so I am now wondering if there may be some problem with the LVGL library.
I have attached a video of the problem; sorry it is not great quality, as I had to compress it a lot to make it uploadable here. It is slightly worse than the video shows because my camera’s frame rate is not great, so it doesn’t capture a true representation of how frequently the rectangles appear, but I think it’s enough to see the issue. The rectangles are also always confined to the area of the screen that is actually scrolling, never anywhere else.

If anyone has any ideas about how to go about debugging this issue I would be most appreciative, as I am pretty much out of ideas right now. I find I can’t reproduce the problem when stepping through the code, for example, so it’s all very difficult to deal with. Any suggestions please? :woozy_face:

(Video attachment: Tabview_flicker)

Thank you,

Kind Regards,

Pete

My guess:

It seems that LVGL invalidates areas of the scrolling region so it can render them again and, in the process, writes black pixels into those rectangles.

It should simply render the new image without writing black pixels onto the invalidated rectangle background areas.

Maybe some sync issues too.

Couple of questions…

What is the render mode set to in LVGL?

How are your buffers arranged, what size are they, etc.?

From the look of it, you are experiencing what is called image tearing. This is typically caused by a buffer being written to at the same time it is being read from.

Where are you calling lv_display_flush_ready from? This should be done from the V-sync callback, but it should only be called a single time each time the buffer is replaced, and you need to make sure that it is only called after the new buffer has been rendered. Depending on how the callback is set up, you may have to put a kind of hack in place to work out which buffer just finished rendering. I had to do this kind of thing with the RGB driver for the ESP32 MCUs, because there is no way to know from inside the callback which buffer has just finished being sent to the display; there is no public API for that. You might be facing the same kind of setup.
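
To make that concrete, here is a minimal sketch (not Pete’s driver; apart from the LVGL calls, every name is made up) of deferring lv_display_flush_ready() to a V-sync handler so it is called exactly once per buffer swap:

static lv_display_t	*pending_disp = NULL;

void my_flush_cb( lv_display_t *disp, const lv_area_t *area, uint8_t *px_map ) {
	(void)area;
	pending_disp = disp;	/* remember which display is waiting for a swap            */
	/* px_map is the buffer LVGL just finished rendering - exactly the information  */
	/* a V-sync callback often cannot give you.                                      */
	/* ...program the scan-out hardware to switch to px_map at the next V-sync...   */
	/* Do NOT call lv_display_flush_ready() here.                                    */
}

/* Called once per V-sync, after the hardware has actually latched the new buffer */
void on_vsync( void ) {
	if( pending_disp != NULL ) {
		lv_display_t *d = pending_disp;
		pending_disp = NULL;	/* ensure it is reported only once per swap */
		lv_display_flush_ready( d );
	}
}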

The reason you only see it when you scroll is because of how much is being updated and how long it takes LVGL to render. If it takes LVGL longer to render a section than it takes to send the data to the display, that is when you will usually end up with an overlap. It took me a while to perfect a driver that allows LVGL to render fast and swap out the buffers properly.

Since you are running a processor that has multiple cores, here is what I recommend.

Here is a flow chart that shows running 2 tasks, 2 DMA full frame buffers, 2 partial frame buffers in internal RAM, a couple of binary semaphores and an event group. The flow chart shows what needs to be done and when; it only shows the loops in those tasks and the VSYNC ISR callback.
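
In case the flow chart doesn’t come across here, this is a rough sketch of the two task loops it describes (all task and variable names are illustrative, not taken from the chart or Pete’s code; the semaphore plumbing is shown in later snippets):

/* Task 0: LVGL task - renders into the two partial buffers */
void lvgl_task( void *arg ) {
	(void)arg;
	for( ;; ) {
		uint32_t wait_ms = lv_timer_handler();	/* render; the flush cb runs once per dirty area */
		vTaskDelay( pdMS_TO_TICKS( wait_ms ) );
	}
}

/* Task 1: copy task - keeps the two full DMA buffers up to date */
void copy_task( void *arg ) {
	(void)arg;
	for( ;; ) {
		/* 1. block until the flush callback deposits a partial buffer               */
		/* 2. copy that partial area into the back (not displayed) full buffer       */
		/* 3. call lv_display_flush_ready() so LVGL can render the next partial      */
		/* 4. if it was the last partial of the frame, flag the VSYNC ISR to swap    */
		/*    the full buffers, wait for the swap, then copy the newly displayed     */
		/*    buffer into the now idle one so both stay in sync                      */
	}
}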

I have code to copy the partial buffer data to the full buffers if you need it. It will also do rotation at the same time, and if you are using RGB565, dithering can be tossed in there as well.

The 2 full buffers still get created in DMA memory. The partial buffers do not; internal RAM is going to be faster, so I recommend placing the partial buffers in internal memory if they will fit. You want to size the partial buffers so that it takes no more than 2-3 partial buffers for an update run to be finished (see the sizing example below).
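
As a back-of-the-envelope example for this panel (1080p ARGB8888), reading the guideline as roughly a third of the screen per partial buffer for a worst-case full-screen update; the exact split is a judgement call:

#define HOR_RES			1920
#define VER_RES			1080
#define BYTES_PER_PX		4						/* ARGB8888 */
#define FULL_BUF_SIZE		( HOR_RES * VER_RES * BYTES_PER_PX )		/* 8,294,400 bytes, ~7.9 MiB  */
#define PARTIAL_LINES		( VER_RES / 3 )					/* 360 lines per partial       */
#define PARTIAL_BUF_SIZE	( HOR_RES * PARTIAL_LINES * BYTES_PER_PX )	/* 2,764,800 bytes, ~2.6 MiB  */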

A frame is considered to be a full buffer. When dealing with partial buffers it may take more than one partial buffer to update all of the data that makes up a frame. To find out whether the partial buffer being flushed is the last one in the run, you call lv_display_flush_is_last; when this returns true, you know you can swap the full buffers. The trick here is that the full buffers need to contain ALL of the data being seen, while with partial buffers LVGL only sends the sections that were updated to the flush callback, so we need to copy the data from one full buffer to the other before we write any new data to it. We are allowed to access the same areas of memory if we are only reading the data, and that is not going to cause any corruption. So while a full buffer is being transmitted by DMA we can also copy from that buffer to the other full buffer. We do this right after the buffers get swapped in the VSYNC ISR. We don’t need to worry about LVGL rendering to them, because LVGL only knows about the partial buffers.
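
In code, that post-swap synchronisation boils down to something like this sketch (front, back and FULL_BUF_SIZE are illustrative names; copying only the areas that changed in the previous frame would be a further optimisation over the plain memcpy):

#include <string.h>	/* memcpy */

/* runs in the copy task, right after the VSYNC ISR has switched scan-out to 'front' */
static void sync_full_buffers( uint8_t *back, const uint8_t *front ) {
	/* reading 'front' while the DMA engine is scanning it out is safe: nothing writes to it */
	memcpy( back, front, FULL_BUF_SIZE );	/* 'back' now matches everything on screen */
	/* the partial copies for the next frame are then written into 'back' */
}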

That copy of the full buffers is the only place that has the potential to cause a stall in task 0, and if it does happen it would be for an exceedingly short amount of time. If you look at how the locks are done, LVGL is able to deposit a buffer to be copied and then go about its merry way, but if it circles back around to have the second partial buffer copied before the first copy has finished, that is when it will get stalled. So LVGL is able to render into 2 partial buffers before any stall could occur. If you have the partial buffers sized too small it could happen, but the stall would only be for a very small amount of time.

Primarily, this can’t be automatic; it has to be based on the next buffer being fully rendered and ready.

Hi @Baldhead ,

Thanks for the comments. I think I agree with your statements about the rendering process and may try to see what is happening with the writing of black pixels… Although I have also considered that some kind of memory/cache issue may be responsible for this; the system has 8 GB of DDR4 RAM where all the buffers reside, so that could be a source of the issue as well.

Kind Regards,

Pete

Hi @kdschlosser ,

Thank you for your detailed response; you must have spent considerable time contemplating my dilemma, and it is much appreciated. I think you have helped to cement my understanding of the way things operate. I’m using direct mode with 2 full-size screen buffers, and my system doesn’t have different types of memory like an ESP32 does… it’s a multicore 64-bit ARM device with 8 GB of DDR4 RAM. Here are excerpts from my code:

Firstly my flush callback…

void dp_flush( lv_display_t *disp_drv, const lv_area_t *area, uint8_t *data ) {

	uint32_t	event = 0;
	if( lv_display_flush_is_last( disp_drv ) ) {  // Only update the buffer when it is the last part
		if( ( (uintptr_t)data == cpu0_globals->dp_data.dma_fb1.Address ) ) {
			cpu0_globals->dp_data.dma_next_buf = &cpu0_globals->dp_data.dma_fb1;
		} else {
			cpu0_globals->dp_data.dma_next_buf = &cpu0_globals->dp_data.dma_fb2;
		}
		if( cpu0_globals->dp_data.trained ) {    // Only bother if display port interface is trained up
			vPortEnableInterrupt(DPDMA_INTR_ID);	// Enable V-Synch interrupt 
			xTaskNotifyWait( pdTRUE, 0xFFFFFFFF, &event, portMAX_DELAY );	// Wait for V-Synch
			lv_display_flush_ready(disp_drv);
		} else	lv_display_flush_ready(disp_drv);
	} else lv_display_flush_ready(disp_drv);
}

Here’s the interrupt handler…

static void OTG_dma_irq_handler( XDpDma *InstancePtr ) {

	BaseType_t			xHigherPriorityTaskWoken = pdFALSE;
	uint32_t 			RegVal = XDpDma_ReadReg( InstancePtr->Config.BaseAddr, XDPDMA_ISR );

	if( ( RegVal & XDPDMA_ISR_VSYNC_INT_MASK ) != 0U ) {	// V-Sync Handling
		XDpDma_WriteReg( InstancePtr->Config.BaseAddr, XDPDMA_ISR, XDPDMA_ISR_VSYNC_INT_MASK );
		if( InstancePtr->Gfx.FrameBuffer != cpu0_globals->dp_data.dma_next_buf ) {	// We only switch buffers during V-sync 
			XDpDma_DisplayGfxFrameBuffer( InstancePtr, cpu0_globals->dp_data.dma_next_buf );
		}
		if( InstancePtr->Gfx.TriggerStatus == XDPDMA_RETRIGGER_EN) {  // Retrigger if running already
			XDpDma_SetupChannel(InstancePtr, GraphicsChan );
			XDpDma_ReTrigger(InstancePtr, GraphicsChan);
		} else if( InstancePtr->Gfx.TriggerStatus == XDPDMA_TRIGGER_EN ) {  // Setup and start if not running
			XDpDma_SetupChannel( InstancePtr, GraphicsChan );
			XDpDma_SetChannelState( InstancePtr, GraphicsChan, XDPDMA_ENABLE );
			XDpDma_Trigger( InstancePtr, GraphicsChan );
		}
		vPortDisableInterrupt( DPDMA_INTR_ID );	// Disable interrupt until flush enables again
		xTaskNotifyFromISR( cpu0_globals->gui.gui_task, HPD_V_SYNC, eSetBits, &xHigherPriorityTaskWoken );    // Unblock flush task
		portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
	}
}

Maybe this approach is wrong; could it be that the buffer is being written to whilst it is being transferred? The more I think about this as I write it up, maybe I should try a different approach…

Thinking about the code as it currently stands… LVGL renders the buffer it has chosen and then calls the flush callback, specifying the buffer which needs to be assigned to the DMA transfers. My code enables the V-sync interrupt from the flush callback and waits until the buffer has been selected and enabled; once the interrupt has switched the buffer and kicked the flush callback, the flush callback then calls lv_display_flush_ready(disp_drv);. Maybe on some occasions the flush callback is calling lv_display_flush_ready(disp_drv); and LVGL is switching buffers and rendering before the DMA has finished? (The CPU runs at 1.2 GHz, so it is pretty fast!) I think I need to waggle some hardware pins and get a better sense of the timing with an oscilloscope to see if things are somehow out of sequence here. What do you think?

Kind Regards,

Pete

Hi @Marian_M ,

Thank you for your reply. I have not described the way the system works in enough detail. When I said that the hardware automatically changes the DMA buffer addresses only during V-sync, I meant this: the hardware has a DMA engine which continuously copies the contents of the screen buffers to the display at 30 fps Full HD. It uses descriptors to describe the buffers, and there are two predefined descriptors in my configuration which describe the location and size of the two buffers that LVGL draws to (I am using 2 full-screen-size buffers in direct mode). To switch the DMA buffer you tell the DMA engine which buffer to use by specifying the descriptor to use at any point in time, and the DMA engine only switches descriptors the first time a V-sync occurs after the request is made, so in theory there should be no screen tearing etc. I hope that better explains the way the system works.

Kind Regards,

Pete

The problem is that you are calling flush ready from the flush callback function, so the buffers are not staying in sync. The other issue is the stall you have in the flush callback. This is causing a slowdown in the code which may end up being seen as a split second of an area that has not been rendered.

You have enough memory to use the 4-buffer approach. You are going to move the responsibility of keeping the buffers in sync to a completely different core. By doing that you will improve the rendering speeds greatly. Right now LVGL is keeping the buffers in sync, and that increases the amount of time it takes LVGL to render.

Write the code exactly like what is seen in the flow chart, and you will be able to use 4 buffers: 2 full buffers and 2 partial buffers. LVGL will be set to partial mode using the 2 partial buffers, and the partials get copied to the full buffer in a different task. The thing that ties it all together is the function you call to check whether a partial buffer is the last of a frame update. If it is, that is when you set the flag to have the buffers swapped out in the VSYNC ISR.
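
For reference, the LVGL v9 side of that setup is only a couple of calls. This is a sketch of an assumed wiring, reusing the buffer names from the sizing example and Pete’s dp_flush, not a drop-in change:

void setup_lvgl_partial( void *partial_buf1, void *partial_buf2 ) {
	lv_display_t *disp = lv_display_create( 1920, 1080 );
	lv_display_set_color_format( disp, LV_COLOR_FORMAT_ARGB8888 );
	lv_display_set_buffers( disp, partial_buf1, partial_buf2, PARTIAL_BUF_SIZE,
				LV_DISPLAY_RENDER_MODE_PARTIAL );	/* 2 partial buffers, partial mode */
	lv_display_set_flush_cb( disp, dp_flush );	/* dp_flush now only hands the rendered area to the copy task */
}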

Below are links to how I went about doing this very same thing for the ESP32. The code is written for MicroPython but it will give you a really good idea of what I am doing.

One of the really important things that I have done in the code I wrote is that I moved all of the driver initialization code to that second task. It may seem strange to do that, but it’s actually not if you know the mechanics of the driver. The VSYNC ISR takes place at the end of each transmission of the buffer, and this ISR happens on the core that the drivers were initialized from. Having that take place on the same core that LVGL is rendering on is going to do what? It is going to slow down the rendering. By moving that ISR to another core, another performance gain can be had.
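
As a sketch only: with mainline FreeRTOS SMP (configUSE_CORE_AFFINITY enabled, vTaskCoreAffinitySet available) that split could look like the snippet below. A home-grown SMP port may expose a different affinity mechanism, and on a GIC-based part like the Zynq the interrupt’s target core is set in the GIC distributor rather than implicitly following the core that installed it, as it does on the ESP32.

void start_copy_task( void ) {
	TaskHandle_t copy_task_h;

	xTaskCreate( copy_task, "copy", 4096, NULL, tskIDLE_PRIORITY + 3, &copy_task_h );
	vTaskCoreAffinitySet( copy_task_h, ( 1 << 1 ) );	/* pin to core 1, away from the LVGL task on core 0 */
	/* copy_task then performs the DisplayPort/DPDMA initialisation; route the V-sync */
	/* interrupt to core 1 as well so its handling never steals rendering time        */
}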

Here is the task function that handles copying the partial buffer to the full buffer and also handles the syncing of the 2 full buffers…

Here is the link to the function that gets called from inside of the flush callback.
It handles acquiring and releasing the locks and setting the data needed to copy the partial to the full buffer.

Here is the VSYNC ISR callback function. You can see how I am handling the test to see whether the buffer actually needs to be swapped out. I don’t actually make the call to the driver to swap the buffers, because I do not know if any memory allocation occurs in the driver, so I let the copy task handle calling that function from outside the context of the ISR. I just stall the copy task until the VSYNC ISR occurs, which lets the copy task know it is good to go to sync the full buffers.

There is a main lock that guards the copy task so it doesn’t end up spinning its wheels if there is no work to be done. It also prevents the copy task from accessing data structures that the LVGL task might be populating. That guard gets released once the data has been populated. There is a second lock that gets acquired before the data gets populated; this lock only gets released once the copy task has finished collecting the last data that was placed. While I do not believe that LVGL would be able to render faster than the copy task can collect the data, it is simply an assurance that it cannot happen. You could technically replace these locks with a queue and pass the data through the queue; I never set that up because of the additional complexity involved.
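
A minimal sketch of that two-lock hand-off, assuming FreeRTOS binary semaphores (every name here is illustrative; this is not the linked MicroPython/ESP32 code):

#include <stdbool.h>
#include "FreeRTOS.h"
#include "semphr.h"
#include "lvgl.h"

static SemaphoreHandle_t	work_ready;	/* "main" lock: the copy task sleeps on this              */
static SemaphoreHandle_t	slot_free;	/* "second" lock: guards the shared descriptor below;     */
						/* both binary semaphores; slot_free is given once at startup */
static struct {
	lv_area_t	area;			/* copied by value: the pointer LVGL passes is not valid later */
	uint8_t		*px_map;
	bool		last;			/* true when lv_display_flush_is_last() said so            */
} slot;

/* LVGL side, called from inside the flush callback */
void deposit_flush( lv_display_t *disp, const lv_area_t *area, uint8_t *px_map ) {
	xSemaphoreTake( slot_free, portMAX_DELAY );	/* stalls only if the copy task is behind */
	slot.area   = *area;
	slot.px_map = px_map;
	slot.last   = lv_display_flush_is_last( disp );
	xSemaphoreGive( work_ready );			/* wake the copy task and return to LVGL  */
}

/* copy-task side, called from the copy task loop */
void collect_flush( void ) {
	xSemaphoreTake( work_ready, portMAX_DELAY );	/* sleep until there is work              */
	/* take a local copy of 'slot' here...                                                   */
	xSemaphoreGive( slot_free );			/* LVGL may now deposit the next partial  */
	/* ...then copy the partial into the back full buffer and call                           */
	/* lv_display_flush_ready() so LVGL can carry on rendering                               */
}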

You have to take a leap of faith and trust me on this. I know it is going to be a large investment of time to make the changes needed, but I can tell you with 100% certainty that doing this WILL fix your issue, and you will also get a performance boost as a side effect.

Hi @kdschlosser ,

Many thanks again for your time, I really appreciate your input. This is an interesting approach; the only downside I see is the rendering of the buffer followed by an extra memory copy, but the net result might still be better than what is going on right now… I will definitely take a serious look at your proposal and explore its feasibility and possible implementation within my system. I have FreeRTOS running, so task sync, mutual exclusion, queues etc. are all easy to achieve.
I am also wondering if there is an opportunity to change the way LVGL handles flushing etc. to find a better way to enable the creation of more efficient drivers… more hooks, or better control over when and where things happen, especially in the multi-core department. Since my early adoption of LVGL back in the days of version 6 (note: I have updated my projects to version 9 now and use the master branch for my ongoing development to keep the latest projects bang up to date), LVGL has introduced events generated at the various stages of rendering and flushing, which I may be able to use as hooks to alter the timing/triggering of everything. I am also aware you can delete the rendering timer and call the rendering function manually from your own code, but I haven’t experimented with that much yet either. I will probably take a look at this approach as well.
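
For what it’s worth, a sketch of both routes in v9, assuming the display events and refresh-timer calls present on current master (on_render_start/on_render_ready are just placeholder callbacks, e.g. to toggle a GPIO for the oscilloscope idea above):

static void on_render_start( lv_event_t *e ) { (void)e; /* e.g. set a GPIO high for the scope */ }
static void on_render_ready( lv_event_t *e ) { (void)e; /* all dirty areas rendered: GPIO low */ }

void install_render_hooks( lv_display_t *disp ) {
	lv_display_add_event_cb( disp, on_render_start, LV_EVENT_RENDER_START, NULL );
	lv_display_add_event_cb( disp, on_render_ready, LV_EVENT_RENDER_READY, NULL );
}

/* alternatively, delete the periodic refresh timer and drive rendering yourself */
void take_over_refresh( lv_display_t *disp ) {
	lv_display_delete_refr_timer( disp );
	/* ...later, from your own task, whenever it is time to render: */
	lv_refr_now( disp );
}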

I will keep you posted on how things go…

Kindest Regards,

Pete :smile:

It’s faster, trust me. It’s faster because you are able to move the VSYNC to a core that LVGL is not running on, and LVGL doesn’t need to keep the buffers in sync at all. You don’t need to render the entire screen each time an update occurs; you are able to render only the small area that updated. And the most important thing is that the buffers are kept synchronized properly, so you don’t end up with the issue you are having.

There are some added benefits as well. You can add rotation with only a very minimal impact on performance. If you are using RGB565 you can also add in dithering, once again with next to no impact on performance. This is because those things can be done at the same time that the partial buffer is being copied to a full buffer.
