LVGL on FBDEV Linux - Soft cap at 30FPS?

Hi,

New LVGL user here.
I have been doing some performance tests of the library to assess suitability for a project.

Target hardware is an ARM Linux module: A7 quad core 1.2 GHz, 24-bit RGB 720x720 display, 512 MB RAM.

To start, I built my base Buildroot image (Linux 6.6.30) along with the SDK, then built the LVGL benchmark application following: Quick Setup - LVGL 9.4 documentation

I configured to use FBDEV as this seems to be the most performant and most developed display driver in LVGL - I wanted to try/compare DRM direct, however there are build/linker issues that I need to resolve for this to work.

After some tweaking of parameters, CPU resources etc, I was able to successfully run the benchmark:

# lvgl-app
Benchmark Summary (9.3.0 dev)
Name, Avg. CPU, Avg. FPS, Avg. time, render time, flush time
Empty screen, 16%, 23, 6, 3, 3
Moving wallpaper, 78%, 28, 29, 26, 3
Single rectangle, 24%, 26, 7, 4, 3
Multiple rectangles, 27%, 28, 9, 6, 3
Multiple RGB images, 57%, 26, 21, 18, 3
Multiple ARGB images, 58%, 27, 20, 17, 3
Rotated ARGB images, 96%, 10, 93, 90, 3
Multiple labels, 39%, 26, 11, 8, 3
Screen sized text, 11%, 27, 30, 27, 3
Multiple arcs, 26%, 27, 9, 6, 3
Containers, 55%, 27, 19, 16, 3
Containers with overlay, 96%, 17, 55, 52, 3
Containers with opa, 99%, 26, 36, 33, 3
Containers with opa_layer, 100%, 21, 43, 40, 3
Containers with scrolling, 74%, 27, 25, 22, 3
Widgets demo, 93%, 27, 27, 24, 3
All scenes avg., 59%, 24, 27, 24, 3

As you can see, where the single CPU thread is taxed, the frame render time and the frame flush time start to pull down the FPS (as expected). It’s clear that in some drawing operations I am CPU limited (software alpha blending - as expected).

However, there are several tests where the render and flush times should allow a SIGNIFICANTLY higher FPS - e.g. Multiple arcs has a total processing time of 9 ms (6 ms render + 3 ms flush), so it should reach over 100 FPS.

Just a sidebar here, I also enabled the on-screen perfmon:

    lv_display_t * disp = lv_display_get_default();
    lv_sysmon_show_performance(disp);

For these tests, it shows a more or less steady 30fps, so there is some variance against the reported benchmark results.

Regardless, it’s clear there is some artificial cap on the FPS, the cause of which I cannot determine.

The hardware is clearly capable - as:

  • The display/hardware RGB pixel clock sets a refresh rate of 60Hz
  • Running a different test (e.g. DirectFB2 df_andi) results in a flat 60FPS rate - basically capped at the hardware capability limit of the timing controller.

Things I have tried:

in lv_conf.h

/*Driver for /dev/fb*/
#define LV_USE_LINUX_FBDEV      1
#if LV_USE_LINUX_FBDEV
    #define LV_LINUX_FBDEV_BSD           0
    #define LV_LINUX_FBDEV_RENDER_MODE   LV_DISPLAY_RENDER_MODE_FULL
    #define LV_LINUX_FBDEV_BUFFER_COUNT  1
    #define LV_LINUX_FBDEV_BUFFER_SIZE   518400
#endif

Changing buffer size, count and render mode (various options) - no appreciable change in results.
My initial thought was that the two buffers were interacting through some strange bug, but it doesn’t seem to make any difference whether the count is 1 or 2.

in the core application (demo_app.c):

either:

lv_task_handler();
usleep(1000);

or

lv_timer_handler_run_in_period(5);

Changing the sleep or period in either of these calls also makes no difference, and both are already significantly shorter than the lowest draw+flush time in the demo as run.

It seems to me there is some other sort of interaction here which is capping performance. However, I cannot figure it out.

I have seen some other topics where people with “similar” hardware/cpu combos have achieved FPS > 30.

I can’t see/find any other user configurable areas which might impact this, hoping there is someone with experience that might be able to help me here.

some update:

Benchmark Summary (9.3.0 dev)
Name, Avg. CPU, Avg. FPS, Avg. time, render time, flush time
Empty screen, 36%, 51, 6, 3, 3
Moving wallpaper, 93%, 33, 29, 26, 3
Single rectangle, 46%, 58, 7, 4, 3
Multiple rectangles, 49%, 50, 9, 6, 3
Multiple RGB images, 96%, 42, 21, 18, 3
Multiple ARGB images, 100%, 44, 20, 17, 3
Rotated ARGB images, 100%, 10, 95, 92, 3
Multiple labels, 76%, 56, 11, 8, 3
Screen sized text, 13%, 47, 30, 27, 3
Multiple arcs, 59%, 57, 10, 7, 3
Containers, 93%, 47, 19, 16, 3
Containers with overlay, 99%, 17, 55, 52, 3
Containers with opa, 100%, 26, 36, 33, 3
Containers with opa_layer, 100%, 21, 44, 41, 3
Containers with scrolling, 99%, 37, 25, 22, 3
Widgets demo, 99%, 28, 27, 24, 3
All scenes avg., 78%, 39, 27, 24, 3

Decent performance increase by reducing the following in lv_conf.h

/*Default display refresh, input device read and animation step period.*/
#define LV_DEF_REFR_PERIOD  16 //33      /*[ms]*/

Next question is, what other optimisations exist to further improve performance, although based on the above results I am now CPU limited in all the tests.

Thanks

using double buffering with DMA memory will greatly improve performance.

double buffering e.g.:

#define LV_LINUX_FBDEV_BUFFER_COUNT 2?

Changing 1 to 2 made no difference, unless additional steps are needed in code to make use of this?

I believe in the current hardware, the memory write operations from the Frame Buffer memory area to the LCD controller are DMA. I’d need to dig more into the driver stack to verify.

Sorry, I spaced out there. You are running on Linux, which means you are either using SDL or writing to the Linux frame buffer. In either case there is not much you can do to speed up the performance unless you write your own driver for it.

You are using a quad core processor, have you tried turning on the Linux OS or the pthread OS in LVGL?

For a start, 720x720 isn't your buffer size, and render mode FULL isn't optimal… Color mode setup in lv_conf.h?

-Colour mode is 16bpp - technically the display hardware is configured in 18bit mode, so the bandwidth saving here makes sense for very little change in visual quality. Although the dithering in gradients does become apparent.

-changing buffer size (LV_LINUX_FBDEV_BUFFER_SIZE) seemed to make no impact to performance between tests. The documentation I read seemed to indicate this was the ram allocation in Linux for page buffer, so in this case with 518400, it was a full display worth. Is this incorrect? Happy to try other recommended values and report back.

-LV_DISPLAY_RENDER_MODE_FULL resulted in the best visual quality and reduced tearing (partial or direct gave no speed increase), and my assumption from above is that it pushes the full buffer to the fbdev device in a single operation.

OS config is PTHREAD:

/*=================
 * OPERATING SYSTEM
 *=================*/
/*Select an operating system to use. Possible options:
 * - LV_OS_NONE
 * - LV_OS_PTHREAD
 * - LV_OS_FREERTOS
 * - LV_OS_CMSIS_RTOS2
 * - LV_OS_RTTHREAD
 * - LV_OS_WINDOWS
 * - LV_OS_MQX
 * - LV_OS_CUSTOM */
#define LV_USE_OS   LV_OS_PTHREAD

OK, so you are using the OS. Have you instructed LVGL to use all 4 cores? I believe it defaults to only using 2 of the cores. I don’t remember off the top of my head how to do that; maybe someone else knows how and will chime in.
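
For reference, a sketch of the setting this most likely refers to. It assumes the LVGL 9 lv_conf.h template, where the number of software rendering threads is controlled by LV_DRAW_SW_DRAW_UNIT_CNT and values above 1 require LV_USE_OS to be enabled. The value 4 is only an example for a quad-core part, not a tested recommendation, and the default may differ between LVGL versions.

```c
/* lv_conf.h (fragment) - assumes the LVGL 9 configuration template */
#define LV_USE_OS                 LV_OS_PTHREAD  /* an OS is needed for more than one draw unit */
#define LV_DRAW_SW_DRAW_UNIT_CNT  4              /* example value: one software draw unit per core */
```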

I'm no expert on LV_LINUX, but if the buffer size is in bytes, then yours is only half the size required for 16bpp.

An update for anyone following on:

#define LV_DEF_REFR_PERIOD 16 (default 33)

in lv_conf.h fixed the soft cap (30FPS > 60FPS-ish)

Also, after switching my C compiler to -O3 I saw a 3x draw speed increase in some functions (software alpha blending, for example).

Compiler optimizations will definitely make an impact on speed, that’s for sure.