FreeRTOS + lvgl - performance problem

Hi folks,
I’m working on a project using LVGL (9.2.2) on an ATSAME54 microcontroller. My setup includes a 320x240 pixel display with an ST7789V driver (I’m using the driver from lvgl), which communicates via SPI using DMA (baudrate set to 18MHz, even though that is much more than ST7789V accepts). The system runs FreeRTOS, and I’ve integrated LVGL as follows:

  1. lv_tick_inc is called in vApplicationTickHook.
  2. There are two FreeRTOS tasks:
  • LVGL_Task: Responsible for running lv_timer_handler (file lvgl.c)
  • ChartDisplay_Task: Handles rendering audio bars (current power + moving average), files display_ui.c and display_init.c.

display_init.c

lv_color_t lcdChartBuffer[LCD_H_RES * LCD_V_RES / 10];
lv_color_t lcdChartBuffer1[LCD_H_RES * LCD_V_RES / 10];

void ChartDisplay_Init(void)
{
    xTaskCreate(ChartDisplay_Task, "chart", 700, NULL, tskIDLE_PRIORITY + 2, NULL);
}

static void ChartDisplay_Task(void *p)
{
    (void)p;

    int32_t result;
    TickType_t xLastWakeTime;

    /* Initialize LCD I/O */
    result = LCD_Init();
    if (result != ERR_NONE) {
        vTaskDelete(NULL);
        return;
    }

    LVGL_Lock();
    /* Create the LVGL display object and the LCD display driver */
    lcdChartDisplay = lv_st7789_create(LCD_V_RES, LCD_H_RES, LV_LCD_FLAG_NONE, LCD_SendCmd, LCD_SendColor);

    LCD_ChangeOrientation(lcdChartDisplay);
    lv_lcd_generic_mipi_set_invert(lcdChartDisplay, true);
    lv_display_set_buffers(lcdChartDisplay, lcdChartBuffer, lcdChartBuffer1, sizeof(lcdChartBuffer), LV_DISPLAY_RENDER_MODE_PARTIAL);

    ChartDisplay_InitUi(lcdChartDisplay);
    LVGL_Unlock();

    xLastWakeTime = xTaskGetTickCount();
    while (true) {

        ChartDisplay_Update();
        xTaskDelayUntil(&xLastWakeTime, pdMS_TO_TICKS(20));
    }
}

display_ui.c

void ChartDisplay_Update(void)
{
    LVGL_Lock();
    display_demo();
    LVGL_Unlock();
}

static void display_demo(void)
{
    // ...
    // logic which updates `var` and `average` values...
    // ...

    for (int i = 0; i < 8; i++) {
        ChartDisplay_SetCurrentValue(i, var);
        ChartDisplay_SetAverageValue(i, average);
    }
}

void ChartDisplay_SetCurrentValue(uint8_t audio_channel, int32_t value)
{
    lv_obj_t *bar;

    if (audio_channel >= AUDIO_CHANNELS_COUNT) {
        return;
    }

    if (value > AUDIO_POWER_MAX_DB) {
        value = AUDIO_POWER_MAX_DB;
    } else if (value < AUDIO_POWER_MIN_DB) {
        value = AUDIO_POWER_MIN_DB;
    }

    bar = audio_channel_bar[audio_channel];

    lv_bar_set_value(bar, value, LV_ANIM_ON);
}

static void ChartDisplay_SetAverageValue(uint8_t audio_channel, int32_t value)
{
    lv_obj_t *average_line;
    lv_coord_t y_max;
    lv_coord_t y_min;
    lv_coord_t y_offset;

    if (audio_channel >= AUDIO_CHANNELS_COUNT) {
        return;
    }

    if (value > AUDIO_POWER_MAX_DB) {
        value = AUDIO_POWER_MAX_DB;
    } else if (value < AUDIO_POWER_MIN_DB) {
        value = AUDIO_POWER_MIN_DB;
    }

    average_line = audio_average_line_array[audio_channel].obj;
    y_max = audio_average_line_array[audio_channel].y_max;
    y_min = audio_average_line_array[audio_channel].y_min;

    /* Calculate the 'y' offset of the audio moving average for a given channel */
    y_offset = ((y_max - y_min) * value / (AUDIO_POWER_MAX_DB - AUDIO_POWER_MIN_DB)) + (y_max - (((y_max - y_min) * AUDIO_POWER_MAX_DB) / (AUDIO_POWER_MAX_DB - AUDIO_POWER_MIN_DB)));

    lv_obj_align(average_line, LV_ALIGN_BOTTOM_LEFT, 0, y_offset + 3);
}
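
The `y_offset` expression above is a linear map from the dB range onto the pixel range `[y_min, y_max]`. A minimal sketch of the same mapping in a more compact form follows. `DEMO_POWER_MIN_DB`, `DEMO_POWER_MAX_DB`, `clamp_i32` and `db_to_y_offset` are hypothetical names, and the limits are borrowed from the demo values rather than from the real `AUDIO_POWER_*` macros, which the post does not show.

```c
#include <stdint.h>

/* Hypothetical stand-ins for AUDIO_POWER_MIN_DB / AUDIO_POWER_MAX_DB. */
#define DEMO_POWER_MIN_DB (-30)
#define DEMO_POWER_MAX_DB (15)

/* Clamp value to the closed range [lo, hi]. */
int32_t clamp_i32(int32_t value, int32_t lo, int32_t hi)
{
    if (value < lo) {
        return lo;
    }
    if (value > hi) {
        return hi;
    }
    return value;
}

/* Map db linearly so that MIN_DB lands on y_min and MAX_DB lands on y_max. */
int32_t db_to_y_offset(int32_t db, int32_t y_min, int32_t y_max)
{
    db = clamp_i32(db, DEMO_POWER_MIN_DB, DEMO_POWER_MAX_DB);
    return y_min + ((y_max - y_min) * (db - DEMO_POWER_MIN_DB)) /
                   (DEMO_POWER_MAX_DB - DEMO_POWER_MIN_DB);
}
```

Both forms agree at the end points. Because C integer division truncates toward zero, intermediate values may differ from the original expression by one pixel.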

lvgl.c

void LVGL_Init(void)
{
    xTaskCreate(LVGL_Task, "lvgl", 2048, NULL, tskIDLE_PRIORITY + 2, NULL);
}

static void LVGL_Task(void *pvParameters)
{
    (void)pvParameters;

    LVGL_Lock();
    lv_init();
    lv_tick_set_cb(xTaskGetTickCount);
    LVGL_Unlock();

    while (true) {
        /* The task running lv_timer_handler should have lower priority than that running `lv_tick_inc` */
        LVGL_Lock();
        lv_timer_handler();
        LVGL_Unlock();
        vTaskDelay(pdMS_TO_TICKS(20));
    }
}

Here is how it works now:

[Video: display-video-ezgif.com-rotate]

Could you recommend what I can do to achieve a satisfactory refresh rate? The current one is not acceptable, especially since this is one of the key features. Thanks in advance!

From what I can see, you are not updating the tick correctly. The task that calls lv_timer_handler should have the highest priority, and the SPI speed is really low; it should be upwards of 80 MHz. Your frame buffers are too small, and I don’t know what color depth LVGL is using; I am guessing RGB888. There should be two tasks: one to handle the tick and another to update the widgets and call lv_timer_handler. No locks are required with this kind of design.
Your code doesn’t show how the values the meters display are collected, so without seeing that it is hard to determine what could be causing the not-so-smooth updates.

Thanks @kdschlosser for your answer. Below I tried to address your thoughts/questions.

I will try changing it and see if there is any difference.

This is one of the bottlenecks: the maximum speed of the SPI interface on the MCU I use is 18 MHz (according to the ST7789V driver, the maximum allowed SPI speed is 62.5 MHz).
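
To put the two clock rates in perspective, here is a rough estimate of how long one full 320x240 RGB565 frame takes to clock out over single-lane SPI. It is a sketch only: the function names are hypothetical, and it ignores command bytes, chip-select gaps and CPU time, so real numbers will be somewhat worse.

```c
#include <stdint.h>

/* Microseconds needed to shift `pixels` pixels out over a 1-bit-wide SPI bus.
 * Assumes the bus is kept busy for the whole transfer (no gaps, no commands). */
uint32_t spi_transfer_time_us(uint32_t pixels, uint32_t bits_per_pixel,
                              uint32_t spi_clock_hz)
{
    uint64_t bits = (uint64_t)pixels * bits_per_pixel;
    return (uint32_t)((bits * 1000000u) / spi_clock_hz);
}

/* Upper bound on full-frame updates per second for a given transfer time. */
uint32_t max_full_frames_per_second(uint32_t frame_time_us)
{
    return 1000000u / frame_time_us;
}
```

With 320 * 240 pixels at 16 bits each, this gives about 68 ms per full frame at 18 MHz (at most 14 full frames per second) and about 20 ms at 62.5 MHz. With partial rendering only the changed areas are sent, so the achievable rate for a few narrow bars can be higher than this full-frame bound.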

I’m using the default one: RGB565. I will try with bigger buffers (but I haven’t noticed a difference between one and two buffers).

For now it’s only a demo (in the future the MCU will read data from the DSP over SPI and draw these bars based on the received values). It looks something like this:

static void display_demo(void)
{
    #define SAMPLES_COUNT 10
    static int array[SAMPLES_COUNT];
    static int var = 0;
    static int average = 0;
    static bool direction = true;
    static int count3 = 0;
    static uint32_t count2 = 0;
    static uint32_t count;

    if (count % 3 == 0) {
        // display_chart_set_current_value(0, 0);

        if (count3 > 0) {
            count3--;
        }

        if (count3 == 0) {

            if (direction) {
                var += 2;
            } else {
                var -= 2;
            }

        }

        if (var > 15) {
            direction = false;
            var = 15;
            count3 = 20;
        }

        if (var < -30) {
            direction = true;
            var = -30;
            count3 = 20;
        }

        array[count2 % SAMPLES_COUNT] = var;
        for (int i = 0; i < SAMPLES_COUNT; i++) {
            average += array[i];
        }
        average /= SAMPLES_COUNT;
        ChartDisplay_SetCurrentValue(0, var);
        ChartDisplay_SetAverageValue(0, average);

        average = 0;
        count2++;
    }

    count++;
}
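
The demo above recomputes the average by summing the whole sample array on every update. With only 10 samples this is unlikely to be the bottleneck, but a running-sum version is a little tidier. The sketch below uses hypothetical names (`moving_average_t`, `moving_average_push`, `MA_WINDOW`) and, like the demo, divides by the full window size even before the window has filled.

```c
#include <stdint.h>

#define MA_WINDOW 10

/* Ring buffer plus running sum; zero-initialize before first use. */
typedef struct {
    int32_t samples[MA_WINDOW];
    int32_t sum;
    uint32_t next; /* index of the slot that will be overwritten next */
} moving_average_t;

/* Replace the oldest sample with `sample` and return the new average. */
int32_t moving_average_push(moving_average_t *ma, int32_t sample)
{
    ma->sum += sample - ma->samples[ma->next];
    ma->samples[ma->next] = sample;
    ma->next = (ma->next + 1u) % MA_WINDOW;
    return ma->sum / MA_WINDOW;
}
```

A caller would keep one `moving_average_t` per audio channel and pass the returned value to `ChartDisplay_SetAverageValue`.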

IDK what the clock speed of the MCU you are using is, but considering the maximum rate its SPI can transfer at, I am going to guess it is a bit on the slow side. I am not sure of the reasoning behind your choice of MCU, but seeing that decently fast MCUs can be had for 3 USD or so, I cannot imagine that cost would be a factor. It may be, who knows.

LVGL’s default timers are set to 33 milliseconds for refreshing the display. Typically I would say to lower that number, but I think you may end up with an issue because of how much of the CPU LVGL will consume. The type of software you are writing should ideally run on a dual-core MCU like the ESP32, where two tasks can actually run in parallel. You have one core that handles reading the DSP while the other core takes care of the LVGL-related work. The ESP32-S3 can be bought with up to 8 MB of RAM and 32 MB of flash (program) storage. It has a decent number of GPIO pins, something around 40 that are usable by the user, and two user-usable hardware SPI buses that can run as high as 80 MHz. For what you are trying to do I recommend using a 16-lane I8080 display. This is going to be 16 times faster than an SPI display running at 80 MHz. The ESP32 also has more than enough DMA memory, so the transfers of data to the display are not blocking.

If you are not seeing any performance difference between running a single buffer and running two, then you are not using DMA memory for the SPI transfers. You should see a difference if the data is being sent that way. Typically it will be about 30% faster.

Currently you are using 2 buffers, and each buffer is 320 * 240 / 10 in size. A full frame buffer would be 320 * 240 * 2 in size; the 2 is because there are 2 bytes for each pixel.

The best size we have found for a partial buffer is 10% of a full frame buffer, that is 320 * 240 * 2 / 10. You are running buffers that are 1/20th the size of a full buffer.
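
The arithmetic behind those sizes can be written out as a small sketch. The helper names are hypothetical, and the byte counts assume RGB565 (2 bytes per pixel) as stated earlier in the thread.

```c
#include <stddef.h>

/* Bytes needed to hold one full frame. */
size_t frame_buffer_bytes(size_t h_res, size_t v_res, size_t bytes_per_pixel)
{
    return h_res * v_res * bytes_per_pixel;
}

/* Bytes for a partial buffer that is 1/divisor of a full frame. */
size_t partial_buffer_bytes(size_t h_res, size_t v_res, size_t bytes_per_pixel,
                            size_t divisor)
{
    return frame_buffer_bytes(h_res, v_res, bytes_per_pixel) / divisor;
}
```

For a 320x240 display this gives 153600 bytes for a full frame, 15360 bytes for a 10% partial buffer and 7680 bytes for a 1/20th buffer. Declaring the buffers as byte arrays of that size keeps the number passed as the buffer size to `lv_display_set_buffers` unambiguous.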

You are trying to achieve smoothness along these lines, correct?