Best MCU for LVGL? Would the new Teensy 4.1 be enough?

Hello all, I’m brand new to LVGL (I just found out about it today), and I’m excited about its capabilities and flexibility. I was wondering what is considered the best MCU to power a ~5", 800x480 display. Has anybody done any benchmarks with side-by-side comparisons?

For my needs, the fastest, most responsive UI possible is the most important aspect of my project’s user experience. I’d like to see smooth animations, smooth full-screen text scrolling, opacity, and constant updating of multiple text/label objects. I’m looking for an Arduino-compatible MCU that will be responsible for driving a 5" TFT touch screen. All of my IO and peripherals will be controlled by a lower-cost, slower MCU (an Arduino Nano or Mini, or something similar like a custom board design) connected by serial to the display’s MCU (similar to how a Nextion display works).

With the recent announcement of the Teensy 4.1 (a Cortex-M7 processor at 600 MHz), that seems like more than enough power for a display. Is it too much power? Would a less expensive MCU be comparable, and would users even see a difference between the two? Would a Teensy 4.1, with all of its IO and speed, be enough to smoothly drive a display AND manage peripheral devices, letting me forgo my second peripheral MCU? I’m most familiar with ESP32, SAMD51 and standard Atmel/Arduino boards, so working with the Teensy would be new for me.

At this point, I’m not concerned with the cost of the MCU, I just want to know what Arduino compatible MCU would offer the smoothest experience. Any advice and guidance would be much appreciated.

Thanks,
Jason

PS: I’m more of a Technologist/Product Designer than a graphics engineer, so I may get lost in some of the “math” talk that this topic may produce. :stuck_out_tongue:

Hi, Jason.
I’m trying to use LVGL on an Atmel SAMA5D31 (498 MHz) with an 800x480, 16-bit screen, and so far I can’t say the performance is good. On a synthetic benchmark I get an average of 20-32 fps. Speed is highly dependent on the use of blending. I strongly suspect the bottleneck is the DDR2 SDRAM. There is evidence that on the Cortex-M7 platform, at a clock frequency 2.5 times lower, performance is many times higher. But even there, blending greatly reduces speed. As far as I understand, the IMXRT1062 (Teensy 4.1) does not contain a blending accelerator, so the general trend will be the same.


Thanks for this detailed information.
To be honest, I’m unfamiliar with a “blending accelerator”. Is it common on faster or more modern MCUs? Is it something I should look for in an MCU to drive LVGL? If so, can you suggest a few MCUs that have one (if you know of any)?

Would a better blending solution also be what I should look for to get faster page scrolling or full-screen redraws?

Again, thanks for taking the time to educate me and share your discoveries with me.

For example, an STM32 with Chrom-ART technology. This is a hardware module that mixes colors for different color modes (RGB888, RGB565, ARGB) without using the processor. In LVGL, when the screen is refreshed only the changed elements are redrawn, but when, for example, the whole screen is swiped or a pop-up dialog is collapsed, all of the content is redrawn.

Well, the Teensy 4.1 is really fast compared to many embedded CPUs! We’re talking about a 600 MHz ARM. The issue with the Teensy 4.1 is that the native LCD interface cannot be used, so the way to go would be DMA/SPI. With an average 40 MHz bit clock on SPI, divide the bit clock by (number of pixels * bit depth) and you get the theoretical maximum frame rate.
On the ESP32, people get really nice results at 320x240, and I will attempt 480x320.
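The math can be sketched as follows, assuming a single data line (1 bit per SPI clock), 16-bit color, and no protocol overhead. `spi_max_fps()` is a hypothetical helper, and real throughput will be lower once commands, chip-select gaps and DMA setup are counted.

```c
#include <assert.h>
#include <stdint.h>

/* Theoretical upper bound on full-frame refresh rate over a 1-bit serial link.
 * Ignores command/address bytes, chip-select gaps and CPU/DMA setup time. */
static double spi_max_fps(double clock_hz, uint32_t width, uint32_t height,
                          uint32_t bits_per_pixel)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0);
    double bits_per_frame = (double)width * height * bits_per_pixel;
    return clock_hz / bits_per_frame;
}
```

At a 40 MHz bit clock this gives roughly 32.6 fps for 320x240, 16.3 fps for 480x320 and 6.5 fps for 800x480, as upper bounds.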

Unfortunately, performance drops quadratically as the screen size grows. With a 480x272x16 screen, the AT91SAM9G45 (ARM9, 400 MHz) works great for me.
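As a rough first-order model (assuming render and transfer time scale with pixel count), the comparison between the two screen sizes works out like this; `pixel_ratio()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* How many times more pixels screen B (wb x hb) has than screen A (wa x ha).
 * Pixel count grows with the square of the linear screen size. */
static double pixel_ratio(uint32_t wa, uint32_t ha, uint32_t wb, uint32_t hb)
{
    assert(wa > 0 && ha > 0);
    return ((double)wb * hb) / ((double)wa * ha);
}
```

800x480 has about 2.9 times the pixels of 480x272, and doubling both dimensions (400x240 to 800x480) quadruples them.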

A powerful CPU can compensate for the lack of a blending accelerator, but the display interface also needs to be fast, otherwise it will limit the frame rate.

The complexity of the UI matters as well. Are you blending several translucent images together, or just scrolling a list of text and symbols?

As a real-world example, I tinker with LittlevGL on a 200 MHz STM32F746 board with a 480x272 screen. The CPU has a blending accelerator, but I don’t use it. I run it with a draw buffer 1/4 the size of the display’s actual framebuffer, which comes to about 65K (480 x 272 / 4 pixels at 2 bytes each). At the moment, I’m running the widgets demo from the site. Even with the processor cache disabled, frame rates are very reasonable, and when I enable the cache, the UI is always buttery smooth. From the sounds of it, the Teensy is roughly 3x more powerful than my current setup in terms of processing power. What would your UI look like in comparison to this?

If someone did end up getting one of these, it would be great if they could run this benchmark and share their results and hardware configuration.
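The 65K figure from the STM32F746 example works out as follows, assuming RGB565 (2 bytes per pixel); `draw_buf_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of a partial draw buffer covering 1/divisor of the screen. */
static size_t draw_buf_bytes(uint32_t width, uint32_t height,
                             uint32_t bytes_per_pixel, uint32_t divisor)
{
    assert(divisor > 0);
    return (size_t)width * height / divisor * bytes_per_pixel;
}
```

The same 1/4-screen buffer for an 800x480 panel would be 192,000 bytes. In LVGL v7 such a buffer is handed to `lv_disp_buf_init()`, whose size argument is a pixel count rather than a byte count.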


I just ordered a Teensy 4.1 and it should arrive in a week. Once I get it up and running (if I do), I’ll report back on my experience. I’ll also try to run the benchmark on it.

Thanks

Hi Jason.
Which LCD panel are you going to connect to this board? There is nothing suitable on this board except SPI. The processor itself has the necessary interfaces, but they are not broken out on the board. And as far as I know, SPI panels with 800x480 resolution do not exist.

I assume it’s because refreshing them at full speed using the relatively slow SPI interface is near-impossible, although I’m no display interface expert. :slightly_smiling_face:

I mean that smart displays with SPI and their own screen buffer top out at around 320x240. I just didn’t make it very clear to Jason that this particular board is not suitable, even though the controller is good.

@embig71
I think you lose most of the performance to VSYNC. Triple buffering can be used to mitigate this; otherwise you spend time waiting for VSYNC. The other issue is that you render directly into external RAM, where random writes are very slow compared to internal SRAM.

I added a new comment in the related topic to figure out what causes the issue.
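A minimal, hardware-independent sketch of triple buffering in plain C. All names (`triple_buffer_t`, `tb_frame_done()`, `tb_on_vsync()`, the tiny `FB_W`/`FB_H` size) are hypothetical and not part of LVGL or any vendor HAL. The idea is that the renderer never waits for VSYNC, because there is always a third buffer it can draw into while one is on screen and one is queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FB_W 8   /* tiny hypothetical framebuffer size, for illustration */
#define FB_H 4

typedef struct {
    uint16_t fb[3][FB_W * FB_H]; /* three RGB565 framebuffers */
    int front;        /* being scanned out by the LCD controller */
    int pending;      /* finished frame waiting for the next VSYNC */
    int back;         /* the renderer draws here */
    bool has_pending; /* true if 'pending' holds a frame newer than 'front' */
} triple_buffer_t;

static void tb_init(triple_buffer_t *tb)
{
    tb->front = 0;
    tb->pending = 1;
    tb->back = 2;
    tb->has_pending = false;
}

/* Buffer the renderer may draw into; never the one on screen. */
static uint16_t *tb_draw_buf(triple_buffer_t *tb)
{
    assert(tb->back != tb->front);
    return tb->fb[tb->back];
}

/* Renderer finished a frame: hand it over and keep drawing immediately
 * instead of blocking until VSYNC. An older, not-yet-shown frame is dropped. */
static void tb_frame_done(triple_buffer_t *tb)
{
    int tmp = tb->pending;
    tb->pending = tb->back;
    tb->back = tmp;
    tb->has_pending = true;
}

/* VSYNC handler: show the newest finished frame, if there is one.
 * On real hardware this is where the controller's framebuffer address
 * would be changed, and the index swaps would need protecting against
 * the renderer (e.g. by briefly disabling interrupts). */
static void tb_on_vsync(triple_buffer_t *tb)
{
    if (!tb->has_pending) {
        return;
    }
    int tmp = tb->front;
    tb->front = tb->pending;
    tb->pending = tmp;
    tb->has_pending = false;
}
```

A renderer loop would draw into `tb_draw_buf()`, call `tb_frame_done()`, and immediately start the next frame, while the VSYNC interrupt calls `tb_on_vsync()`. The cost is a third full framebuffer of RAM.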

@kisvegabor, are there any examples I can reference that show how to properly implement triple buffering?

Throughout this thread, I see the STM32F line mentioned a lot. From what I’ve read, those MCUs have a lower clock speed than the Teensy 4. What features do the STM chips have, if any, that make them a better option than the Teensy 4? I’m unsure, and this seems like a reasonable question to help me understand.

@embig71, I was looking at the following 800x480 boards. The first one might not be SPI, but I can also work with 8-bit parallel if it’s an option and/or if it offers better performance.

This is off the top of my head:

  • They have a graphics accelerator built in, meaning they can handle heavier use of blending, etc. at higher resolutions.
  • The Discovery boards also tend to ship with higher-speed display interfaces, but I don’t know if actual hardware designs end up utilizing those.
  • They’re extremely popular, which means that more people are familiar with them, thus they get more recommendations.

I’m planning to play with it soon, but not now. :frowning:
However, it would only really shine if the MCU had an LCD/TFT driver peripheral (to swap buffers with a single command), which is not the case for the Teensy 4.

I think that matters more than having a GPU. Sending the pixels to the display via SPI or a parallel port takes a lot of time for larger screens.

Is the “higher-speed display interface” something similar to SPI that most common MCUs do not offer? Is there a specific name for this display port option, so I know what to look for when exploring MCUs?

Thanks again for the support and information.

Another question for you all,

Does LVGL support the RA8875 chipset? I honestly know nothing about it other than that it appears to be a dedicated display IC that supports SPI. Would using a chip like this offer any improvements compared to the other standard display options?

Probably the best option is a dedicated TFT-LCD peripheral that directly sends the frames using HSYNC, VSYNC, and 16 or 24 color pins.

In case of STM it’s called LTDC. This PDF is specific to ST but the first few pages summarize the different options to drive a TFT.
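With such a peripheral, the pixel clock the controller must generate follows from the panel timing: every line and every frame includes sync and porch periods in addition to the visible area. A sketch of that calculation, where the porch and sync values used below are hypothetical example numbers (real ones come from the panel's datasheet) and the helper names are made up:

```c
#include <assert.h>
#include <stdint.h>

/* Timing of one axis of an RGB (HSYNC/VSYNC) panel, in pixels or lines. */
typedef struct {
    uint32_t active;      /* visible pixels (or lines) */
    uint32_t sync;        /* sync pulse width */
    uint32_t back_porch;
    uint32_t front_porch;
} axis_timing_t;

/* Total pixel clocks per line (or lines per frame), including blanking. */
static uint32_t axis_total(const axis_timing_t *t)
{
    return t->active + t->sync + t->back_porch + t->front_porch;
}

/* Pixel clock needed to scan the whole panel refresh_hz times per second. */
static double pixel_clock_hz(const axis_timing_t *h, const axis_timing_t *v,
                             double refresh_hz)
{
    assert(refresh_hz > 0.0);
    return (double)axis_total(h) * axis_total(v) * refresh_hz;
}
```

With horizontal timing {800, 48, 40, 40} and vertical timing {480, 3, 29, 13}, an 800x480 panel at 60 Hz needs about a 29.2 MHz pixel clock, i.e. roughly 29 million 16-bit pixels per second streamed from RAM by the controller, versus 2.5 million 16-bit pixels per second over the 40 MHz SPI link discussed above.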

Yes, but not all of its features. The RA8875 has a built-in drawing engine that LVGL can’t use directly. However, you can use it like any other display controller: select a rectangular area (called a window) and copy the LVGL-rendered image there.
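The "select a window and copy" approach can be sketched as a flush function. This is a simulation in plain C: `ctrl_set_window()` and `ctrl_write_pixel()` are hypothetical stand-ins that write to an in-memory array, whereas on a real RA8875 they would program its active-window registers and stream pixel data over SPI. `area_t` mirrors the inclusive x1/y1/x2/y2 fields of LVGL's `lv_area_t`.

```c
#include <assert.h>
#include <stdint.h>

#define GRAM_W 800   /* simulated controller memory, for illustration only */
#define GRAM_H 480

/* Stand-in for LVGL's lv_area_t (inclusive coordinates). */
typedef struct { int16_t x1, y1, x2, y2; } area_t;

/* Hypothetical model of a controller with its own display RAM. */
static uint16_t gram[GRAM_W * GRAM_H];
static area_t window;
static int32_t cursor;   /* index of the next pixel inside the window */

static void ctrl_set_window(const area_t *a)
{
    window = *a;
    cursor = 0;
}

/* Write one pixel; the controller advances through the window row by row. */
static void ctrl_write_pixel(uint16_t color)
{
    int32_t w = window.x2 - window.x1 + 1;
    int32_t x = window.x1 + cursor % w;
    int32_t y = window.y1 + cursor / w;
    gram[y * GRAM_W + x] = color;
    cursor++;
}

/* Flush-callback-style function: copy an LVGL-rendered area into the window. */
static void flush_area(const area_t *a, const uint16_t *pixels)
{
    assert(a->x1 <= a->x2 && a->y1 <= a->y2);
    assert(a->x2 < GRAM_W && a->y2 < GRAM_H);
    int32_t n = (int32_t)(a->x2 - a->x1 + 1) * (a->y2 - a->y1 + 1);
    ctrl_set_window(a);
    for (int32_t i = 0; i < n; i++) {
        ctrl_write_pixel(pixels[i]);
    }
}
```

In LVGL v7 the body of `flush_area()` would live in the display driver's `flush_cb`, which must finish by calling `lv_disp_flush_ready()`. The RA8875's drawing engine is not used at all here; only its memory and window addressing are.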

Does this mean the power of the RA8875 is wasted in LVGL’s case? Without that engine, does the RA8875 offer any performance benefit compared to not using it?

Also, thanks for the PDF on STM’s LTDC tech. It was very informative. I’ll be focusing some time on learning more about the STM32F7x family.

The STM32F4 also has an LTDC controller, but it lacks the Cortex-M7 core and cache that the F7 has. Typically the F4 is not used for large resolutions like 800x480, as they require a lot of RAM for the framebuffer and the processor ends up being a bottleneck. However, I believe the F4 is cheaper than the F7.
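The RAM point is easy to quantify. A sketch with a hypothetical helper, assuming uncompressed framebuffers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for n_buffers full framebuffers of the given size and depth. */
static size_t framebuffer_bytes(uint32_t width, uint32_t height,
                                uint32_t bits_per_pixel, uint32_t n_buffers)
{
    assert(bits_per_pixel % 8 == 0);
    return (size_t)width * height * (bits_per_pixel / 8) * n_buffers;
}
```

One 800x480 RGB565 framebuffer is 768,000 bytes, two are 1,536,000 bytes, and a single 24-bit buffer is 1,152,000 bytes. That is more than the on-chip SRAM of many Cortex-M4 parts, so such designs usually add external SDRAM.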

I’m not sure the RA8875 will work well for your use case, as it appears to interface through SPI, has its own memory buffer for the display (useful mainly when the host MCU’s RAM is already taken up by the application), and has its own drawing system, none of which LVGL can take advantage of at the moment. (Given that the RA8875 doesn’t seem to have any antialiasing or custom font support, this is unlikely to change, as you would lose most of the control and flexibility LVGL gives you.)