Best MCU for LVGL? Would the new Teensy 4.1 be enough?

Hi folks,

As I look into exactly the same type of application:

For an SPI interface, you need to run it at the maximum clock frequency, which means hardware SPI with DMA.

There are 5- and 7-inch screens with an SPI interface. The Surenoo store has 800x480 LCD “modules” with SPI. Crystalfontz has its 5-inch sunlight-readable EVE display.

I will actually use the 5-inch Surenoo with an ESP32. I only need a 1 Hz frame rate …
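
To put rough numbers on the SPI bottleneck, here is a back-of-the-envelope sketch. It assumes a full-screen refresh at 16 bits per pixel with no protocol overhead; the 40 MHz clock is only an illustrative value, not a figure for any of the displays mentioned here, and `spi_max_fps` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Upper bound on full-screen refreshes per second over a serial link.
// Ignores command bytes, chip-select gaps and rendering time.
double spi_max_fps(uint32_t width, uint32_t height,
                   uint32_t bits_per_pixel, double spi_clock_hz)
{
    assert(width > 0 && height > 0 && bits_per_pixel > 0);
    const double bits_per_frame =
        static_cast<double>(width) * height * bits_per_pixel;
    return spi_clock_hz / bits_per_frame;
}

void print_spi_estimate()
{
    // 800x480 at 16 bpp is 6,144,000 bits per frame.
    std::printf("800x480 @ 40 MHz: %.1f fps\n",
                spi_max_fps(800, 480, 16, 40e6));
}
```

Under these assumptions the ceiling is about 6.5 full frames per second: enough for a 1 Hz update, but far from smooth animation. Partial redraws help because only the changed areas are sent.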

For the Teensy 4.1 you might be able to use the FlexIO block to build an alternate 16-bit MCU interface. That means the real frame buffer has to remain in the screen, so the LCD module you buy needs to have internal RAM. RGB 16/24-bit LCDs with PCLK cannot be used unless you build a real LCD pixel interface with FlexIO. There is an app note on the NXP website about using FlexIO like that for Kinetis CPUs (the CPU is different but the FlexIO block is the same).

Hope it helps a bit.

I’m testing 5" 800x480 displays using an LT768 video processor/accelerator on an ESP32. It seems to be fast for playing video, and it also has panning, scrolling, PIP and other hardware acceleration.
It is this one: https://www.aliexpress.com/item/4000282154239.html?spm=a2g0o.productlist.0.0.2d745798XCI0r1&algo_pvid=46993936-b698-4ef2-9524-f50deb133ce8&algo_expid=46993936-b698-4ef2-9524-f50deb133ce8-0&btsid=0be3769015916778431468077e2d09&ws_ab_test=searchweb0_0,searchweb201602_,searchweb201603_
But it’s quite new and I don’t think there is LVGL support for it yet.
Before that, I worked with RA8875 5" 800x480 displays.
The LT768 display can be used in parallel or SPI mode, and there are resistive and capacitive touch versions available.

It seems very slow in the video on AliExpress. Have I missed something?

I’m using LVGL with the i.MX RT 1052 boards and I’m pleased with the performance.
This is for a 480x272 display with a touch controller. It has the advantage that the CPU is able to transmit the data directly to the display, which is fast compared to using an SPI connection. SPI works well for me on smaller displays too.
If selecting a display controller type: I like the SSD (Solomon Systech) ones which (at least some of them) have a ‘dual RAM’: you can write to the second RAM of the display and then switch the content: that way the update is ‘immediate’. You still will need time to write the data, but at least it is less visible to the human eye.

On a side note: if using the Teensy, keep in mind you won’t be able to debug it (at least the ones I have used).

I hope this helps.

Hello,

I am using the evkbimxrt1050 board, 480x272 resolution panel, 16-bit pixel. I tried the lvgl7 benchmark, the result is:

Weighted FPS: 68
Opa. speed: 92%

The panel is a dumb panel driven by the MCU’s LCDIF. If a smart panel is used, I guess it will be slower because of the SPI transfer speed.

It’s strange that, even with only I-Cache enabled, the weaker STM32F746NG still gets double the frame rate that you are getting, with a very similar display setup (480x272, 16bpp, LTDC interface). Do you have cache enabled? Does the i.MX RT have any graphics accelerator to take advantage of?

Hello, thanks for pointing this out. The i.MX RT port does not use the hardware acceleration feature.

After checking the STM32F746 LVGL display port layer, I see that it uses a different method from the i.MX RT1050 port. In short, it seems the STM32F746 port does not use VSYNC for frame synchronization: the content of the LVGL render buffers (buf1_1 and buf1_2) is copied directly to the LTDC display buffer (my_fb) using DMA, even when the LTDC is not in the VBLANK period. There may be tearing when the demo runs. For the i.MX RT, there are two frame buffers and VSYNC is used to remove the tearing effect.

I’ve also tried RT1050 without VSYNC and saw weighted FPS: 176.

The driver was very primitive: it just memcpys LVGL’s buffer to the frame buffer. With DMA it’d be even faster.
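
A minimal sketch of that kind of copy-based flush, written without the LVGL headers so it stands alone. `Area`, `flush_area` and the frame buffer layout (row-major, 16-bit pixels) are assumptions for illustration; a real port would do this inside `flush_cb` and then call `lv_disp_flush_ready`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Inclusive rectangle, mirroring how LVGL describes a dirty area.
struct Area { int x1, y1, x2, y2; };

// Copy a rendered rectangle into a row-major 16-bit frame buffer.
// `src` holds the rectangle's pixels packed row after row.
void flush_area(uint16_t *frame_buffer, int fb_width,
                const Area &area, const uint16_t *src)
{
    assert(frame_buffer != nullptr && src != nullptr);
    const int w = area.x2 - area.x1 + 1;
    for (int y = area.y1; y <= area.y2; ++y) {
        uint16_t *dst = frame_buffer
                      + static_cast<std::size_t>(y) * fb_width + area.x1;
        std::memcpy(dst, src, static_cast<std::size_t>(w) * sizeof(uint16_t));
        src += w;  // advance to the next source row
    }
}
```

Each row is copied separately because the rectangle is usually narrower than the frame buffer, so its rows are not contiguous in the destination.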

If you put both frame buffers in external RAM and make LVGL draw directly into them, you lose some performance because of the many accesses to the slower external RAM.

Double-buffered VSYNC also makes rendering slower because time is lost while you are waiting for VSYNC.
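
The cost of waiting can be estimated with a simple model: if a buffer swap may only happen on a VSYNC edge, each frame occupies a whole number of refresh periods. The sketch below assumes strict double buffering with the renderer blocked until the swap; the 60 Hz refresh rate and the render times are illustrative values, not measurements from this thread, and `vsync_limited_fps` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

// Effective frame rate when every frame must wait for the next VSYNC.
double vsync_limited_fps(double render_ms, double refresh_hz)
{
    assert(render_ms > 0.0 && refresh_hz > 0.0);
    const double period_ms = 1000.0 / refresh_hz;
    // A frame that overruns a period has to wait for the following one.
    const double periods = std::ceil(render_ms / period_ms);
    return refresh_hz / periods;
}
```

In this model a frame rendered in 10 ms runs at 60 fps, while one that takes 17 ms drops to 30 fps, because it just misses a refresh and waits for the next one.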

The ultimate solution would be triple buffering with 2 LVGL buffers, but there is no official example for it yet. :slightly_frowning_face:

Hello, is there any tearing if you ignore VSYNC?
I’m testing V7.02 on the RT1052EVKB with a 480*272 16bpp panel. Would you mind telling me how to improve the frame rate?
I ran the benchmark, and some test cases go as low as 15~17 FPS.

The simplest things you can do:

  • enable -O3 optimization
  • be sure MCU cache is enabled.
  • place LVGL’s display buffer into internal RAM and copy it into the frame buffer in flush_cb
  • use 2 display buffers for LVGL and use DMA to copy them into the frame buffer.

Thanks for your feedback.

  1. I’ll test with -O3. Currently I’m using IAR 8.50.1 with medium optimization; I’ll try a higher level.
  2. Cache is enabled.
  3. The i.MX RT1052 has only 512 KB of internal RAM, so if I put the buffer in internal RAM it won’t be a full-size buffer. The frame buffer will be in external DRAM. I’ll test copying in flush_cb without waiting for VSYNC, to see the improvement in frame rate and whether there is any tearing or flicker.
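
The arithmetic behind point 3, as a quick sketch. It takes the 512 KB figure at face value; `buffer_bytes` and `lines_that_fit` are hypothetical helpers, and the 64 KB budget used below is an arbitrary example, not a recommendation.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for a buffer of `lines` rows at the given width and depth.
std::size_t buffer_bytes(std::size_t width, std::size_t lines,
                         std::size_t bytes_per_pixel)
{
    return width * lines * bytes_per_pixel;
}

// How many complete rows fit in a given RAM budget.
std::size_t lines_that_fit(std::size_t budget_bytes, std::size_t width,
                           std::size_t bytes_per_pixel)
{
    assert(width > 0 && bytes_per_pixel > 0);
    return budget_bytes / (width * bytes_per_pixel);
}
```

A full 480x272 frame at 16 bpp is `buffer_bytes(480, 272, 2)` = 261,120 bytes, so two of them (522,240 bytes) would consume essentially all of a 512 KB (524,288-byte) RAM. A 64 KB budget, by contrast, gives `lines_that_fit(64 * 1024, 480, 2)` = 68 rows, a quarter of the screen, which is a workable size for a partial draw buffer.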

Regards.

On a Teensy 4.0 with a 3.5" 480x320 display I got the following results:

Weighted FPS: 75
Opa. speed: 112%

That is using the SPI interface.

@skpang, thanks for sharing this. If you don’t mind can you tell me what display you’re using?

  • What is the IC in the display (ILI9488, HX8357D, etc.)?
  • What TFT library are you using?
  • Are you using any form of DMA? Is it DMA in lvgl or in the tft library?
  • If you run the lv_demo_widgets example, how does the screen scrolling feel in the demo’s “Visuals” tab?

Your display looks identical to the ILI9488 I’m using:

Thanks!

Yes, I’m using the ER-TFTM035-6 from BuyDisplay. I’ve chosen the capacitive touch version.

I used the ILI9488_t3.h SPI library and changed the SPI clock to 60MHz.

The code for Teensy 4.0 is on my GitHub:

There is a video of demo_create(); on YouTube:

Thanks. That looks to be about the same scroll quality as I currently have using an HX8357D (480x320) on an STM32F767ZI with an 8-bit parallel connection.

But I will go back to my ER-TFTM035-6 SPI on a Teensy 4.1 now that I have more experience with this library and report back a comparison.

Hello all, I would like to know whether there is a way to use a serial type of memory with the PSoC5 MCU’s EMIF component. In one of my previous posts I managed to run the LVGL GUI on the PSoC5 platform, so now I would like to use external RAM for the display buffer, as that is supported by LVGL. From reading the EMIF datasheet I concluded that it is intended to be used with memories that have a parallel interface. If that is the case, is there a way to use serial RAM with the PSoC5 as a graphics buffer, or is that only possible on the PSoC6 series of MCUs with the SMIF component?

@skpang do you mind trying the following changes in your demo code? They enable the frame buffer plus a clipping rectangle so the screen updates a little faster. I wonder if it will improve the benchmark.

void setup(){
    display.begin();
    display.fillScreen(ILI9488_BLUE);
    display.setRotation(1);
    display.useFrameBuffer(true); // Enable DMA framebuffer
}


/* Display flushing */
void my_disp_flush(lv_disp_drv_t *disp, const lv_area_t *area, lv_color_t *color_p)
{
    uint16_t width = (area->x2 - area->x1 + 1);
    uint16_t height = (area->y2 - area->y1 + 1);

    display.writeRect(area->x1, area->y1, width, height, (uint16_t *)color_p);
    display.setClipRect(area->x1, area->y1, width, height); // Set the clipping rectangle of the area that needs updating
    display.updateScreen(); // Update the display
    display.setClipRect(); // Clear the clipping rectangle

    lv_disp_flush_ready(disp); /* tell lvgl that flushing is done */
}

Once I get my display adaptor in, I’ll try this demo out on the Teensy 4.1 with the external PSRAM. I have the same display in a bare module mode with direct FPC.
The ILI9488_t3 library supports usage of the external RAM that might improve performance even more!

It looks a bit slower with your settings:

Thanks for the quick test!
Try commenting out the following lines in my_disp_flush:

display.setClipRect(area->x1, area->y1, width, height); // Set the clipping rectangle of the area that needs updating
display.updateScreen(); // Update the display
display.setClipRect(); // Clear the clipping rectangle

This will use only the frame buffer with DMA, which theoretically should be faster.

Hi @LissieMayo
This topic is for Teensy. Please open a new topic for PSoC.