Different rendering and flush strategy that could benefit serial display use cases

Hello community,

I’ve found at least one common use case where the current rendering and flush strategy is creating an overall performance penalty in terms of framerate.

Assume:

  1. You’re using a mid-low range microcontroller with support for average/slow serial display interface
  2. You’re using a serial based display controller, where serial interface may be single/dual/quad SPI, MIPI-DBI (SPI, 6800, 8080) with dedicated onboard framebuffer RAM. This controller supports TE signal output to allow frame synchronization for avoiding tearing
  3. Requirement of the project is to enhance user experience as much as possible, thus frame tearing must be avoided
  4. The firmware project is RTOS based with preemptive scheduling
  5. A custom TE sync stage sits after the rendering stage but before the display driver flush stage. This stage is able to calculate the optimal edge and delay to avoid tearing of a specific framebuffer area.

In this situation the refresh pipeline gets intrinsically synchronized with the display TE signal, thus with its panel scan rate.

The benefit of this strategy is mostly reduced or null frame tearing (depending on display bus bandwidth and display scan rate).

The cons, when this strategy is paired with current LVGL v9 are:

  1. A refresh operation split into multiple sub-areas drastically reduces the framerate, because each sub-area must be synced with the TE signal, which adds a large amount of latency to each iteration
  2. To avoid the framerate loss it’s possible to sync only the first sub-area of a refresh operation and flush the remaining sub-areas immediately after. But this strategy cannot guarantee that each sub-area will be tearing free, as it could fall outside the “no-tearing window”
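To put a rough number on point 1, here is a minimal sketch. It assumes one TE pulse per panel scan and that at most one sub-area can be flushed per TE period; both are simplifications, not measurements, and max_refresh_hz() is a hypothetical helper, not an LVGL function.

```c
/* Upper bound on the refresh rate when every sub-area waits for its own TE edge.
 * Assumptions: one TE pulse per panel scan, at most one sub-area flushed per TE period. */
static double max_refresh_hz(double te_hz, unsigned sub_areas)
{
    if(sub_areas == 0) return te_hz;   /* nothing to flush: no extra wait */
    return te_hz / (double)sub_areas;
}
```

Under these assumptions a 60 Hz panel refreshed in 4 sub-areas cannot exceed 15 refreshes per second, no matter how fast the rendering is.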

From the point of view of tearing optimization, the locally optimal solution would be to always flush a single sub-area per refresh operation, where sub-area = merge of all dirty areas at once.

But this approach is not globally optimal due to reduced rendering efficiency: more pixels have to be rendered even though they were not part of dirty areas.

All of this also becomes even worse when dealing with partial framebuffers I think.

Solutions that come to my mind for now are:

Change join algorithm
Give the user an option to make lv_refr_join_area() merge all dirty areas into at most 1 sub-area per partial buffer, so that the framerate penalty is lower.
If CPU is faster than flush rate this can be an acceptable solution.
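A minimal sketch of what "merge everything" costs in rendered pixels. area_t is a hypothetical stand-in that mirrors the inclusive-corner layout of lv_area_t; merge_all() and merge_overhead() are illustrative names, not existing LVGL functions, and the overhead figure assumes the dirty areas do not overlap.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for lv_area_t: inclusive corner coordinates. */
typedef struct {
    int32_t x1, y1, x2, y2;
} area_t;

static int64_t area_size(const area_t * a)
{
    return (int64_t)(a->x2 - a->x1 + 1) * (a->y2 - a->y1 + 1);
}

/* Merge all dirty areas into one bounding rectangle. Requires count >= 1. */
static area_t merge_all(const area_t * areas, size_t count)
{
    area_t out = areas[0];
    size_t i;
    for(i = 1; i < count; i++) {
        if(areas[i].x1 < out.x1) out.x1 = areas[i].x1;
        if(areas[i].y1 < out.y1) out.y1 = areas[i].y1;
        if(areas[i].x2 > out.x2) out.x2 = areas[i].x2;
        if(areas[i].y2 > out.y2) out.y2 = areas[i].y2;
    }
    return out;
}

/* Extra pixels rendered because of the merge (assumes non-overlapping inputs). */
static int64_t merge_overhead(const area_t * areas, size_t count)
{
    area_t merged = merge_all(areas, count);
    int64_t dirty = 0;
    size_t i;
    for(i = 0; i < count; i++) dirty += area_size(&areas[i]);
    return area_size(&merged) - dirty;
}
```

For example, two 100x20 areas in opposite corners of a 320x240 screen merge into the full screen: 76800 pixels rendered for 4000 dirty ones, an overhead of 72800 pixels. That is the worst case this option has to accept.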

Rework input data for TE sync stage

  1. Rendering stage uses the current lv_refr_join_area() join algorithm as always.
  2. Each sub-area needs to be rendered in sequence with no flush operation, thus the framebuffer must be able to store the sum of the sub-area sizes.
  3. Each sub-area is packed in a list and passed to the TE sync stage. An element of the list should contain: x,y,w,h of the area, pointer to the sub-area in framebuffer, size in bytes of the sub-area content in framebuffer.
  4. The list could be accompanied with the x,y,w,h values of the overall rectangle containing all the sub-areas.

This additional data passed to the flush stage allows the TE sync stage to calculate exactly how to sync the provided set of sub-areas, and then flush them one after another in an order defined by the TE sync stage.
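A rough sketch of the list described in steps 3 and 4. All names here (sub_area_t, sub_area_list_t, sub_area_list_add, SUB_AREA_MAX) are hypothetical, and the layout is simplified: sub-areas are packed back to back with no stride or alignment padding.

```c
#include <stdint.h>
#include <stddef.h>

#define SUB_AREA_MAX 8      /* hypothetical list capacity */

typedef struct {
    int32_t x, y, w, h;     /* position and size on screen */
    uint8_t * data;         /* start of the rendered pixels in the framebuffer */
    size_t size;            /* size in bytes of the rendered content */
} sub_area_t;

typedef struct {
    sub_area_t items[SUB_AREA_MAX];
    size_t count;
    int32_t x, y, w, h;     /* overall rectangle containing all sub-areas */
    size_t used;            /* framebuffer bytes consumed so far */
} sub_area_list_t;

/* Append a sub-area, placing its pixels right after the previous one.
 * Returns 0 on success, -1 if the list or the framebuffer is full. */
static int sub_area_list_add(sub_area_list_t * l, uint8_t * fb, size_t fb_size,
                             int32_t x, int32_t y, int32_t w, int32_t h, size_t px_size)
{
    size_t size = (size_t)w * (size_t)h * px_size;
    if(l->count >= SUB_AREA_MAX || size > fb_size - l->used) return -1;

    sub_area_t * it = &l->items[l->count];
    it->x = x; it->y = y; it->w = w; it->h = h;
    it->data = fb + l->used;
    it->size = size;

    if(l->count == 0) {
        l->x = x; l->y = y; l->w = w; l->h = h;
    }
    else {
        /* Grow the overall rectangle; right/bottom edges are exclusive here. */
        int32_t x2 = (l->x + l->w > x + w) ? l->x + l->w : x + w;
        int32_t y2 = (l->y + l->h > y + h) ? l->y + l->h : y + h;
        if(x < l->x) l->x = x;
        if(y < l->y) l->y = y;
        l->w = x2 - l->x;
        l->h = y2 - l->y;
    }
    l->used += size;
    l->count++;
    return 0;
}
```

The TE sync stage would receive a pointer to the filled sub_area_list_t, use the overall rectangle to pick the TE edge and delay, and then walk items[] in whatever order it considers safest.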

Hi,

I see the problem but something is not clear to me.

Let’s see these scenarios as examples:

  1. If the whole screen is updated you need a screen sized buffer to render and flush it at once
  2. If two labels are updated at opposite corners of the screen, the whole screen will be updated if you combine the areas.

It seems you need a screen sized buffer in both cases. If so you can use DIRECT rendering mode and when the last area is rendered too, you can flush all areas from the buffer.

If it can work for you I can show an example for that.

Hello @kisvegabor

My specific use case requires usage of partial buffers due to memory constraints.

Let’s assume 50% sized framebuffers:

1. If the whole screen is updated

The refresh will be performed in 2 steps and there’s not much we can do in this case probably. With double buffering we can flush first portion of screen while rendering the other.
Each screen portion flush will be synchronized with TE signal without taking into account the other.

2. If two labels are updated at the opposite corners

If the total refresh area size is less than size of a partial buffer, we can render both on the same buffer at sequential addresses

buf0
begin _________________________ end
|________________________________|
[Area A][Area B]---------------------------------

After the rendering both areas are passed to the TE sync stage to adjust TE sync strategy and then flush them in sequence

Got it.
What if we added a new display event called LV_EVENT_RENDER_AREA_START.
It’d be called when LVGL already knows which area to render but hasn’t done anything with it yet. In this event you could manually change the size and pointer of the buffer LVGL will render into. As you know the area that will be rendered (and the memory size it needs), you can also set the offset into the real buffer. You can just skip flushing except when:

  1. The next area can’t fit into the buffer
  2. It was the last rendered area
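A minimal sketch of the bookkeeping such an event handler could keep. accum_t and accum_place() are hypothetical names, nothing here is existing LVGL API, and it assumes a single area never exceeds the real buffer size.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accumulator state kept by the event handler. */
typedef struct {
    size_t buf_size;   /* size of the real draw buffer in bytes */
    size_t used;       /* bytes occupied by rendered but not yet flushed areas */
} accum_t;

/* Decide where the next area is rendered.
 * Sets *flush_first when the pending areas must be flushed before rendering,
 * because the next area would not fit behind them.
 * Returns the byte offset into the real buffer for the next area.
 * Assumption: next_size <= buf_size. */
static size_t accum_place(accum_t * a, size_t next_size, bool * flush_first)
{
    *flush_first = (a->used > 0 && next_size > a->buf_size - a->used);
    if(*flush_first) a->used = 0;   /* buffer is reused from the start after the flush */

    size_t offset = a->used;
    a->used += next_size;
    return offset;
}
```

The caller still has to flush once more after the last area of the refresh has been rendered, and with a single buffer it has to wait for a pending flush to finish before rendering at offset 0 again.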

What do you think?

Hi,

Yes this may be a good solution.

Step 1
The TE sync stage subscribes to LV_EVENT_RENDER_AREA_START.
It will accumulate data of the new sub-area every time LV_EVENT_RENDER_AREA_START is published.

Step 2
After the event has been handled, rendering starts using the address updated at step 1

Step 3
draw_buf_flush() is called conditionally only if:

  1. This is the last area

Looping step 1, step 2 and step 3 until all dirty areas are rendered

Questions:

  1. Can I pass the sub-area struct pointer in the event user data?
  2. What’s the best way of setting the framebuffer address at which the sub-area rendering will write to?

Hello @kisvegabor

For now I’m trying to create a POC, not necessarily the best solution, just to test the concept.
I want refr_area to render at a specific offset from the draw-buffer, is it enough to change the value of display->buf_act.data before refr_area is called?

Awesome!

You also need to change the data_size so that LVGL will know how many pixels it can fit into the buffer.

Hi @kisvegabor

In lv_refr.c -> refr_area()

/*Try to divide the area to smaller tiles*/
...

Could you clarify, or point me to some external info about, why this was implemented?

I’m asking because we already have a phase in which dirty areas are divided into sub-areas. From my understanding, that phase is needed when a dirty area is larger than the partial buffer size

In lv_refr.c -> refr_invalid_areas()

        if(disp_refr->render_mode == LV_DISPLAY_RENDER_MODE_PARTIAL) {
            /*Calculate the max row num*/
            int32_t w = lv_area_get_width(&inv_a);
            int32_t h = lv_area_get_height(&inv_a);

            int32_t max_row = get_max_row(disp_refr, w, h);

            int32_t row;
            int32_t row_last = 0;
            lv_area_t sub_area;
            sub_area.x1 = inv_a.x1;
            sub_area.x2 = inv_a.x2;

            for(row = inv_a.y1; row + max_row - 1 <= inv_a.y2; row += max_row) {
                ...

Thanks!