Changing arc value causes entire arc to be redrawn

Description

I’m using an arc in a digital gauge UI, and even small changes to the arc’s value cause the entire area occupied by the arc to be redrawn, even though I’m using direct render mode.

What MCU/Processor/Board and compiler are you using?

STM32F7

What LVGL version are you using?

9.2

Code to reproduce


void create_screen_main() {
    lv_obj_t *obj = lv_obj_create(0);
    objects.main = obj;
    lv_obj_set_pos(obj, 0, 0);
    lv_obj_set_size(obj, 480, 480);
    lv_obj_set_style_bg_color(obj, lv_color_hex(0xff000000), LV_PART_MAIN | LV_STATE_DEFAULT);
    {
        lv_obj_t *parent_obj = obj;
        {
            lv_obj_t *obj = lv_arc_create(parent_obj);
            objects.obj0 = obj;
            lv_obj_set_pos(obj, 103, 104);
            lv_obj_set_size(obj, 274, 273);
            lv_arc_set_value(obj, 25);
            lv_arc_set_bg_end_angle(obj, 60);
        }
    }
}

int main() {
    lv_init();
    lv_tick_set_cb(HAL_GetTick);

    lv_display_t* display = lv_display_create(480, 480);
    lv_display_set_buffers(display, (void*) LVGL_GRAM_ADDR, NULL, 480*480*2, LV_DISPLAY_RENDER_MODE_DIRECT);
    lv_display_set_flush_cb(display, my_flush_cb);

    ui_init();

    uint32_t tick = 0;
    uint16_t scale = 0;
    while (1) {
        if (HAL_GetTick() > tick + 20) {   /* step the arc every ~20 ms */
            if (scale > 100) {
                scale = 0;
            }
            lv_arc_set_value(objects.obj0, scale);
            scale += 1;
            tick = HAL_GetTick();
            lv_timer_handler();
        }
    }
}
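As an aside, my understanding of the LVGL 9 API is that for true address-swap double buffering in direct mode the library needs to know about both buffers so it can keep them in sync, i.e. something like this sketch (the second buffer address is a placeholder):

static uint8_t *buf1 = (uint8_t *) LVGL_GRAM_ADDR;                 /* frame buffer 0 */
static uint8_t *buf2 = (uint8_t *) (LVGL_GRAM_ADDR + 480*480*2);   /* frame buffer 1, placeholder address */
lv_display_set_buffers(display, buf1, buf2, 480*480*2, LV_DISPLAY_RENDER_MODE_DIRECT);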

Screenshot and/or video

[screenshot: the arc gauge UI]

For debugging purposes I’m using the simple UI above. Even with this one, when I hit a breakpoint in the flush callback I can see that the area being redrawn is the entire area taken up by the arc (not just the part of the arc that changed), which is significantly affecting the refresh rate.

In direct render mode I would expect that only the small new blue section of the arc would be drawn as it moves, but for some reason it’s redrawing the arc’s entire rectangular area and I don’t understand why. This is my flush callback function:

void my_flush_cb(lv_display_t* display, const lv_area_t* area, uint8_t* map) {
    if (!lv_display_flush_is_last(display)) {
        /* non-final chunks must be released too, otherwise rendering stalls */
        lv_display_flush_ready(display);
        return;
    }
    HAL_LTDC_SetAddress(&hltdc, (uint32_t) map, 0);
    lv_display_flush_ready(display);
}

I’m using double buffering with LTDC, so I’m just swapping the buffer start address of the LTDC peripheral. Whilst debugging I find that area->x1 = 90, area->x2 = 200, area->y1 = 90 and area->y2 = 200, when in reality only a small portion of that area is actually changing. Any help would be greatly appreciated!
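For what it’s worth, a more careful version of this swap would defer the address update to the vertical blanking period and only release LVGL once the reload has taken effect; a sketch (untested) using the stock HAL reload callback:

static lv_display_t *pending_disp;

void my_flush_cb(lv_display_t* display, const lv_area_t* area, uint8_t* map) {
    if (!lv_display_flush_is_last(display)) {
        lv_display_flush_ready(display);
        return;
    }
    pending_disp = display;
    /* program the new front buffer, but apply it only at the next vblank;
       assumes the LTDC interrupt is enabled and routed to HAL_LTDC_IRQHandler() */
    HAL_LTDC_SetAddress_NoReload(&hltdc, (uint32_t) map, 0);
    HAL_LTDC_Reload(&hltdc, LTDC_RELOAD_VERTICAL_BLANKING);
    /* lv_display_flush_ready() is called from the reload callback below */
}

void HAL_LTDC_ReloadEventCallback(LTDC_HandleTypeDef *hltdc) {
    if (pending_disp) {
        lv_display_flush_ready(pending_disp);
        pending_disp = NULL;
    }
}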

Thank you in advance.

fix(refr): lv_obj_invalidate_area invalidates whole obj by liamHowatt · Pull Request #7598 · lvgl/lvgl

Thanks for the response. I tried making the same modifications as in that pull request, but that actually stopped my code from working completely (my screen stopped updating altogether).

It also seems that the master LVGL branch doesn’t have the changes from that PR.

I’ve done some more debugging but still haven’t found an answer. When I step into the functions called by lv_arc_set_value(), it seems LVGL is invalidating the correct area; however, on the next flush call the area is not the same as what was invalidated.

If anyone could try this on their platform and see what area they get in the flush callback, that would be much appreciated.
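In case it helps, this is the kind of instrumentation I’m using to compare invalidated areas against flushed ones; a sketch assuming LVGL 9’s LV_EVENT_INVALIDATE_AREA display event and printf retargeted to a UART:

#include <stdio.h>

static void log_invalidate_cb(lv_event_t * e) {
    lv_area_t * a = lv_event_get_param(e);   /* the area being invalidated */
    printf("invalidated: (%d,%d) -> (%d,%d)\n",
           (int)a->x1, (int)a->y1, (int)a->x2, (int)a->y2);
}

/* after creating the display: */
lv_display_add_event_cb(display, log_invalidate_cb, LV_EVENT_INVALIDATE_AREA, NULL);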

I have also measured that lv_arc_set_value() takes very little time to execute; however, lv_timer_handler() takes on average 68 ms to execute… not sure why :frowning:
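For the measurement I’m just bracketing the call with HAL_GetTick():

uint32_t t0 = HAL_GetTick();
lv_timer_handler();
uint32_t elapsed_ms = HAL_GetTick() - t0;   /* averages ~68 ms here */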

Enabling LV_USE_REFR_DEBUG in lv_conf.h also showed that the entire arc area is being refreshed.

For some extra info: I’m using 2 MiB of external SDRAM at 108 MHz to store the two frame buffers. I’ve measured the transfer rate into SDRAM at about 17 MB/s, and it takes about 18 ms to write an entire frame’s worth of memory into it (about 500 KiB). I thought this hardware setup would be more than capable of achieving at least 30 fps even in full render mode, but it just seems that lv_timer_handler() is slowing things down?
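As a back-of-envelope check: one 480 × 480 frame at 16 bpp is 480 × 480 × 2 = 460,800 bytes ≈ 450 KiB, so an 18 ms full-frame write caps the raw write throughput at roughly 55 fps before any rendering or buffer-sync cost is counted. Full-area redraws eat that budget very quickly, which is why the invalidation behaviour matters so much here.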

Does the display IC have GRAM? What is the connection or “bus” being used? Is it I8080, RGB, SPI, etc.?

If it is an RGB display it is going to be slow. That’s because the entire framebuffer needs to be rendered, and when using 2 framebuffers they need to be kept in sync, so one has to get copied to the other. There is only so much bandwidth to the memory the frame buffers are stored in, and that slows things down: copying from one buffer to the other uses the same RAM that is simultaneously being read to transmit to the panel.

If you want better performance, get a display that has a 16-lane I8080 bus. Such a display has GRAM, which allows you to use a much smaller buffer, and you will not need to keep two full frame buffers in sync.
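With GRAM on the display side, the LVGL setup can be as small as this (a sketch using the LVGL 9 API; the 40-line buffer height is an arbitrary choice):

static uint8_t buf1[480 * 40 * 2];   /* 40 lines of RGB565, a fraction of the screen */
lv_display_set_buffers(display, buf1, NULL, sizeof(buf1), LV_DISPLAY_RENDER_MODE_PARTIAL);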

It’s an RGB565 interface with an ST7701 controller, which has no internal GRAM, so both buffers are in the external SDRAM. I’ve improved the performance considerably now by adding -O2 optimisation to gcc, so it’s all sorted now. Thanks for the help everyone.

Can you explain a bit more about what you did that made it work faster? Is it only specific to the arc widget? I also use an RGB565 display with an ST7701 and occasionally it is very slow (especially when scrolling; but if I change the colour of the whole screen, it works quite fast…).

When dealing with any display, the entire panel needs to be refreshed/redrawn all the time. With “RGB” displays that task falls onto the MCU that is connected to it. Some displays have an IC that sits between the MCU and the panel; in a lot of cases that IC has onboard memory and takes on the responsibility of keeping the display updated. When it is the MCU’s responsibility, it is a never-ending loop that continually feeds the display with data.

With the ESP32 this loop is done in the background using DMA transfers. However, only so much data can be sent in a single transaction, so an ISR fires so the CPU can create the next transaction. Because the data needs to be continually fed to the display, the frame buffer needs to be width * height * bytes-per-pixel in size, and two buffers of that size need to exist, because you cannot render to a buffer that is transmitting; it would corrupt the data. So as the data is being transmitted from one buffer, the second is being rendered to.

LVGL checks which pieces need to be updated and updates them. It is not going to identify small pieces of a widget that need to be updated; it is going to update the entire widget. If it tried to keep track of all the pieces of a widget, that would consume far too many resources.

Once the frame has been rendered, the flush function gets called and, behind the scenes, the pointer to the frame buffer is swapped. Now we have a problem: the newly idle buffer that is about to be rendered to doesn’t contain the data that was just written to the buffer that is now transmitting. So the buffer that is now transmitting gets copied to the buffer that is now idle.
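In code the sync step is essentially this (a sketch with made-up names; fb[] are the two full-size frame buffers):

#include <string.h>

static uint8_t *fb[2];     /* the two full-size frame buffers */
static int active;         /* index of the buffer currently being scanned out */

static void swap_and_sync(size_t fb_size) {
    active ^= 1;                                   /* panel now scans out fb[active] */
    memcpy(fb[active ^ 1], fb[active], fb_size);   /* bring the idle buffer up to date */
}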

All of the above work takes place on a single core of the ESP32. That’s a lot of work for one core, and it can really get bogged down with high pixel count displays like an 800 x 480 panel; that is a lot of data to deal with. The problem is further compounded by the frame buffers only fitting in PSRAM, which operates at a fraction of the speed of internal memory. The bandwidth to that RAM is also shared, so when you are transmitting from it and rendering to it at the same time, the bandwidth is effectively halved. And LVGL can only do one thing at a time, so while it is copying one buffer to the other no rendering takes place. That is another huge performance hit.

I wrote some code that handles this whole dance a little differently. It comes at the cost of increased memory use, but it is well worth it for the speed boost.

This is the setup…

I used FreeRTOS to create a task on core 1. This is the task the main application, including LVGL, runs on.

I then created a second task that runs on core 0. Its sole purpose is to copy the buffer data and manage the buffer being sent to the display.

The initialization of the RGB driver takes place in that second task, and only that task is able to do anything with the two full-sized frame buffers. I created two small frame buffers that are handed to LVGL, and I set the render mode in LVGL to partial, so it only updates the things that have changed.

When the flush function gets called, the buffer is passed from the app task to the other task. The app task is then free to render into the second partial buffer, and while that is happening the second task copies the data from the first partial buffer into the idle full-size buffer. LVGL has a marker that can be checked to see if a series of updates has completed; when it has, the second task tells the RGB driver to swap the buffer being transmitted and then copies the data from the now-active buffer to the now-idle buffer. A rough sketch of this flow is below.
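Roughly like this (a sketch of the idea, not the actual code: flush_job_t, gfx_task, copy_area_to_idle_fb and swap_fb_and_resync are all made-up names, and error handling is omitted):

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"
#include "lvgl.h"

typedef struct {
    lv_display_t *disp;   /* needed to release the buffer when done */
    lv_area_t     area;   /* region this partial buffer covers */
    uint8_t      *px;     /* pixels LVGL rendered */
    bool          last;   /* true when the refresh cycle is complete */
} flush_job_t;

static QueueHandle_t flush_queue;

static void copy_area_to_idle_fb(const lv_area_t *area, const uint8_t *px);  /* hypothetical helper */
static void swap_fb_and_resync(void);                                        /* hypothetical helper */

/* Core 1, called by LVGL: just hand the partial buffer to the gfx task. */
static void my_flush_cb(lv_display_t *disp, const lv_area_t *area, uint8_t *px)
{
    flush_job_t job = { disp, *area, px, lv_display_flush_is_last(disp) };
    xQueueSend(flush_queue, &job, portMAX_DELAY);
}

/* Core 0: copy into the idle full frame buffer, swap once a cycle completes. */
static void gfx_task(void *arg)
{
    flush_job_t job;
    for (;;) {
        xQueueReceive(flush_queue, &job, portMAX_DELAY);
        copy_area_to_idle_fb(&job.area, job.px);
        lv_display_flush_ready(job.disp);   /* LVGL may now reuse this partial buffer */
        if (job.last)
            swap_fb_and_resync();           /* swap scan-out buffer, re-sync the idle one */
    }
}

void gfx_start(void)
{
    flush_queue = xQueueCreate(4, sizeof(flush_job_t));
    xTaskCreatePinnedToCore(gfx_task, "gfx", 4096, NULL, 5, NULL, 0);
}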

Currently the code is not written so it can simply be dropped into your program and used. I am working on that part, so it can be added to a different application without having to jump through hoops. It is going to be a little while longer until I am finished, and when I am I will most likely create an IDF component for it so others will be able to use it.

I simply added the -O2 (optimise more) flag to gcc when compiling my project; it also removed about 100 kB from the flash usage. You could even try -O3 (optimise most) for perhaps even better performance.
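For anyone wondering where the flag goes: in a Makefile-based project it’s typically a one-line change, e.g.

CFLAGS += -O2   # or -O3; standard GCC optimisation flags

In STM32CubeIDE the same setting lives under the MCU GCC Compiler → Optimization options, if I remember correctly.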

What I described is above and beyond what compiler optimization is going to be able to do.