ST7701 with LVGL 9 (Waveshare 2.8” 480x640 ESP32-S3)

Description

Any tips and tricks on speeding up the frame rate on a 2.8” 480x640 SPI screen using the ST7701 display driver?

What MCU/Processor/Board and compiler are you using?

I’m using the Waveshare ESP32-S3-Touch-LCD-2.8B

What LVGL version are you using?

LVGL 9

What do you want to achieve?

Faster frame rate and rotation.

What have you tried so far?

Waveshare has written their own ST7701 driver, along with their own IO Expander code but it was all written for LVGL 8. The code I had already written used LVGL 9 and I was using a cheaper 240x320 “Cheap Yellow Display” ESP32-WROOM. The biggest issue with that is I had no great way of mounting it into a guitar pedal enclosure with a nice and professional finished look.

So, I came across the Waveshare dev boards because they already have a nice cover glass when you use one of their capacitive touch panels. 2.8” is the perfect size to fit into a standard 1590B or 125B guitar pedal enclosure. The 1590B is a little bit narrower and smaller so we are shooting to make it work with one of those ideally.

My application is that I am building a (potentially open source) chromatic guitar tuner pedal. I need 3 GPIOs available:

  1. A GPIO that supports ADC1 because I do continuous reading of the incoming guitar signal. I’ve got a separate PCB that includes a pre-amplifier to make sure the signal from the guitar is nice and strong before it reaches the ESP32. For the Waveshare 2.8B device, they only expose one ADC and it’s GPIO_4 (ADC1_CH3).
  2. A GPIO for reading the pedal’s momentary foot switch. I’m currently using GPIO_0 which is also convenient for debugging w/o it all connected because I can use the BOOT button on the dev board.
  3. A GPIO for controlling the external low signal DPDT relay (a TQ2-L2-5V). I’m using GPIO_43 (RXD) and that all works. Waveshare has GPIO 43 and 44 incorrectly swapped for TXD and RXD. Took me a while to figure that one out.

The code supports a few different styles of UI for the tuning interface. None of them run very quickly on this new device. I’m guessing it’s because the resolution is so much higher than it was on the ESP32-WROOM device I previously had.

I’ve tried using the ST7701 driver that’s provided by Espressif but haven’t been able to figure that all out yet because of the expanded IO this dev board seems to use.

I think that my general plan is to try to use an approach similar to this: Someone willing to hold my hand? - #12 by kdschlosser

But…if anyone has any suggestions in general, I’m all ears. I feel like I’ve been banging my head against the wall for a few days to make this work.

The stuff that does show up on the screen is extremely crisp and sharp, though! I just hope I can speed it up a little bit so you don’t actually see the frames (a sweep from top to bottom of the screen when refreshes happen).

I currently have three different RTOS tasks running…

  1. A gpio_task on Core 0 that reads the state of the momentary foot switch and dispatches a press/double press/long press to the software
  2. GUI/LVGL task also running on Core 0
  3. Pitch Detection task solely running on Core 1 doing continuous reading of the ADC pin

My code is a mess right now because it’s all torn apart but I’ll try to clean it up today and get something posted here.

Screenshot and/or video

Here’s an example of how it’s currently working. I used to have 3 arcs spinning around the note but that was making the fps drop down to about 4fps. I also used to have some fade animation when the note stopped but that was also making the refresh rate slowness extremely apparent.

The code I have for this is running here:

The ST7701 driver is here: q-tune/main/hardware/ST7701S.c at boyd/switch-to-waveshare-ebd4 · joulupukki/q-tune · GitHub

My main GUI task is here that has the initialization:

@kdschlosser I wonder if you or others have any other recommendations I might try to improve the frame rate?

I’m currently using two buffers that are set up like this:

#define DISPLAY_BUFFER_ROWS             32
#define DISPLAY_BUFFER_SIZE             (480 * DISPLAY_BUFFER_ROWS)

static void *buf1 = NULL;
static void *buf2 = NULL;

void lvgl_port_flush_cb(lv_display_t *display, const lv_area_t *area, uint8_t *px_map) {
    esp_lcd_panel_draw_bitmap(lcd_panel, area->x1, area->y1, area->x2 + 1, area->y2 + 1, px_map);
    lv_display_flush_ready(display);
}


esp_err_t lvgl_init(void) {
    buf1 = heap_caps_malloc(DISPLAY_BUFFER_SIZE, MALLOC_CAP_INTERNAL | MALLOC_CAP_DMA);
    assert(buf1);
    buf2 = heap_caps_malloc(DISPLAY_BUFFER_SIZE, MALLOC_CAP_INTERNAL | MALLOC_CAP_DMA);
    assert(buf2);

    lv_display_set_buffers(lvgl_display, buf1, buf2, DISPLAY_BUFFER_SIZE, LV_DISPLAY_RENDER_MODE_PARTIAL);

    lv_display_set_flush_cb(lvgl_display, lvgl_port_flush_cb);

    return ESP_OK;
}
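
One thing worth checking in the snippet above: in LVGL 9 the size passed to lv_display_set_buffers() is in bytes, so a buffer of 480 * DISPLAY_BUFFER_ROWS bytes holds fewer rows than the macro name suggests once each pixel takes more than one byte. A minimal sketch of that arithmetic, assuming RGB565 (2 bytes per pixel); rows_in_buffer is a made-up helper for illustration, not part of LVGL or ESP-IDF:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many full rows fit in a draw buffer of
 * buf_bytes bytes for a panel that is width pixels wide. */
size_t rows_in_buffer(size_t buf_bytes, size_t width, size_t bytes_per_pixel)
{
    assert(width > 0 && bytes_per_pixel > 0);
    return buf_bytes / (width * bytes_per_pixel);
}
```

With the values above, rows_in_buffer(480 * 32, 480, 2) comes out to 16 rows rather than 32. If 32 rows are intended, the allocation and the size passed to LVGL would both need to be multiplied by the bytes per pixel.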

I still haven’t yet figured out rotation or the touch panel, but the fps has improved slightly with a few changes.

Here’s how it’s looking tonight:

https://youtube.com/live/enoow886azY

what is DISPLAY_BUFFER_SIZE??

I am not able to find the function LCD_Init

This is the proper way to allocate a DMA buffer
buf1 = heap_caps_malloc(DISPLAY_BUFFER_SIZE, MALLOC_CAP_INTERNAL | MALLOC_CAP_DMA);

Is LCD_DRAWBUF_SIZE * sizeof(uint16_t) the same size as DISPLAY_BUFFER_SIZE??

The reason I ask is that in the spi_bus_config_t structure the max_transfer_sz field needs to be the exact same size as the frame buffer size. If it is smaller, then multiple transactions are going to be used to send the buffer data and that is going to slow things down.

Your callback function is wrong…

void lvgl_port_flush_cb(lv_display_t *display, const lv_area_t *area, uint8_t *px_map) {
    esp_lcd_panel_draw_bitmap(lcd_panel, area->x1, area->y1, area->x2 + 1, area->y2 + 1, px_map);

#if CONFIG_EXAMPLE_AVOID_TEAR_EFFECT_WITH_SEM
    xSemaphoreGive(sem_gui_ready);
    xSemaphoreTake(sem_vsync_end, portMAX_DELAY);
#endif
    lv_display_flush_ready(display);
}

It should be

void lvgl_port_flush_cb(lv_display_t *display, const lv_area_t *area, uint8_t *px_map) {
    esp_lcd_panel_draw_bitmap(lcd_panel, area->x1, area->y1, area->x2, area->y2, px_map);
}


bool spi_trans_done_cb(esp_lcd_panel_io_handle_t panel_io, esp_lcd_panel_io_event_data_t *edata, void *user_ctx) {
    lv_display_flush_ready((lv_display_t *)user_ctx);
    return false;  // no higher-priority task was woken by this callback
}


esp_lcd_panel_io_spi_config_t structure has 2 fields that need to be populated.
on_color_trans_done gets populated with the spi_trans_done_cb function
user_ctx gets populated with the lvgl display driver

on_color_trans_done = &spi_trans_done_cb
user_ctx = display

Doing the above will notify LVGL of the buffer transmission being completed when it actually completes.

OK so that is the DMA stuff that jumps right out at me that is incorrect…

Next up is your excessive use of tasks and locks. You don’t want to use locks as this is going to really slow things down. You need to register a hardware timer to handle calling lv_tick_inc.

static void tick_timer_cb(void *arg)
{
    /* Tell LVGL how many milliseconds have elapsed */
    lv_tick_inc(1);
}


const esp_timer_create_args_t tick_timer_args = {
    .callback = &tick_timer_cb,
    .name = "tick_timer"
};
esp_timer_handle_t tick_timer = NULL;
ESP_ERROR_CHECK(esp_timer_create(&tick_timer_args, &tick_timer));
ESP_ERROR_CHECK(esp_timer_start_periodic(tick_timer, 1 * 1000));  // period is in microseconds: 1000 us = 1 ms

That will take care of calling the lv_tick_inc. The timer will run every 1 millisecond.

Learn how to use FreeRTOS queues to pass data between tasks. This is an important thing to be able to do in order to get rid of the locks. Have the UI task check for messages and, if there are any, carry out the commands on the UI that the other tasks are asking for. This will remove the need for using locks. Locks have a pretty large performance hit, especially if they are called often.

If the UI only updates from input from other tasks, then you can have the UI task block while it waits for an incoming message from the queue. If the UI task needs to do other things, like reading touch input or something along those lines, then simply let the UI task run in an infinite loop checking the queue and calling lv_task_handler. Make sure to add a task yield in there so other tasks running on the same core will be able to run.
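
To make the idea concrete, here is a rough, host-side sketch of that message-passing pattern. It uses a plain ring buffer as a stand-in for a FreeRTOS queue so it can be compiled anywhere; it is not thread-safe. On the device you would use xQueueCreate(), xQueueSend() and xQueueReceive() instead. The message fields and all function names below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message: what another task wants the UI to show. */
typedef struct {
    float frequency_hz;  /* detected pitch */
    int cents_off;       /* offset from the nearest note */
} ui_msg_t;

#define UI_QUEUE_LEN 8

/* Ring buffer standing in for a FreeRTOS queue; NOT thread-safe. */
typedef struct {
    ui_msg_t items[UI_QUEUE_LEN];
    size_t head;   /* index of the oldest message */
    size_t count;  /* number of queued messages */
} ui_queue_t;

/* Producer side (e.g. the pitch detection task). Returns false if full. */
bool ui_queue_send(ui_queue_t *q, ui_msg_t msg)
{
    assert(q != NULL);
    if (q->count == UI_QUEUE_LEN) {
        return false;
    }
    q->items[(q->head + q->count) % UI_QUEUE_LEN] = msg;
    q->count++;
    return true;
}

/* Consumer side (the UI task). Returns false if nothing is waiting. */
bool ui_queue_receive(ui_queue_t *q, ui_msg_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % UI_QUEUE_LEN;
    q->count--;
    return true;
}

/* One pass of the UI loop: drain the queue and keep only the newest
 * message, since older readings are already stale. Returns how many
 * messages were consumed; the caller updates widgets if it is > 0. */
size_t ui_drain_latest(ui_queue_t *q, ui_msg_t *latest)
{
    size_t n = 0;
    ui_msg_t msg;
    while (ui_queue_receive(q, &msg)) {
        *latest = msg;
        n++;
    }
    return n;
}
```

The point of the design is that only the UI task ever touches LVGL objects. Other tasks just post small messages, so nothing needs a lock around LVGL calls.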


OH and last thing is to put all of the initialization code in one place. You have it spread out across multiple files and that makes it hard to follow what is happening.

The biggest recommendation I can make is to start off with a single source file. Create a UI task and initialize the LCD drivers and LVGL drivers in that task. Create the tick hardware timer and get that going. Then add a single widget that you can control and see if the refresh is better. Go from there: add in one piece at a time, see what the performance impacts are, and optimize the code as you go. It’s a lot easier to manage that way.


Watching your video, there is definitely something that is choking big time.

This is running using an ESP32 WROOM-1 with quad SPIRAM. The display is 320 x 480 x 16bit SPI. so a lot more pixels than the one you are using.

yours: 76,800 pixels
mine: 153,600 pixels

Now to top it off, Python code is what is being executed and that will run a lot slower than C code. The LVGL code is in C but all the code that handles the tabs and the calculation of values of the different widgets is done in Python. You are seeing whole display updates and not just a couple of small areas. So there is something that is a bottleneck in your UI.

IDK if you are familiar with Python or not, but if you are and you want to give it a try you can download the code from here. It already has drivers for your display all set up to give the best possible performance. It’s easy to compile and fast to get up and running.


Excellent. Thank you very much for the feedback. I will try to implement some of this tomorrow and hopefully make some good improvements.

When I used:

esp_lcd_panel_draw_bitmap(lcd_panel, area->x1, area->y1, area->x2, area->y2, px_map);

It would draw things odd on the screen, slanted. So I copied what I had seen in Waveshare’s demo app and it then started drawing things normally on the screen with:

esp_lcd_panel_draw_bitmap(lcd_panel, area->x1, area->y1, area->x2 + 1, area->y2 + 1, px_map);

I haven’t been able to figure out where or how to include the esp_lcd_panel_io_spi_config_t or how to initialize the lcd panel with this ST7701. I tried using the ST7701 from the ESP Component Registry but got confused every time when I tried to hook it up because I wasn’t sure how to use expanded IO that Waveshare shows in their demo code (which is, unfortunately only for LVGL 8).

I definitely don’t understand the tick_timer_cb stuff and/or the locks. Early on when I was tinkering with this months ago I had all sorts of crashes when I didn’t use locking so basically anywhere I do anything with LVGL I have been locking first, doing the changes, and then unlocking. I’ll try to brush up on my chops there. Thanks for that hint.

I thought I read somewhere that lv_task_handler() isn’t supposed to be used and that lv_timer_handler() should be used instead. But I could be remembering wrong.

I’ll try to get my code cleaned up. It’s been just a lot of code slinging just to see if I could get it all to function at all. As I’ve understood some things I’ve been trying to clean up as I go, but yeah, it needs a LOT more work.

Thanks!

My display is 480 x 640 @ 16bit: 307,200 pixels, if I’m looking at this right (4 times the pixel count of the lower-res screen I was using before this Waveshare at 240x320 @16 bit).
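
To put numbers on that, here is a small sketch of the bytes per full frame and the throughput a full-screen redraw would need. It assumes 2 bytes per pixel (16-bit color as stated above); the 30 fps target in the usage note is only an example figure, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold one full frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}

/* Bytes per second needed to redraw the whole screen fps times a second. */
uint32_t full_redraw_bytes_per_sec(uint32_t width, uint32_t height,
                                   uint32_t bytes_per_pixel, uint32_t fps)
{
    return frame_bytes(width, height, bytes_per_pixel) * fps;
}
```

frame_bytes(480, 640, 2) is 614,400 bytes against 153,600 bytes for the 240x320 screen. Redrawing the whole 480x640 screen at an example 30 fps would mean moving 18,432,000 bytes per second, which is why redrawing only small areas matters so much at this resolution.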

I was going off of what was in the source code and not what you had typed in your first post. I realized the boo boo with the resolution later on…

So here is the skinny as far as the frame buffer size is concerned…

width * height * bytes per pixel / 10

That is what you should be setting your buffer sizes to. That being said, you are not going to be able to fit 2 frame buffers of that size into internal RAM. You are going to need to use SPIRAM instead, so you will need to make that change as well. The max_transfer_sz field also needs to be set to the same size as the frame buffers.
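
As a worked example of the formula above for this display, assuming 2 bytes per pixel (16-bit color); draw_buffer_bytes is a hypothetical helper name, not an LVGL function:

```c
#include <assert.h>
#include <stdint.h>

/* width * height * bytes per pixel / divisor, as in the formula above. */
uint32_t draw_buffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel, uint32_t divisor)
{
    assert(divisor > 0);
    return (width * height * bytes_per_pixel) / divisor;
}
```

For 480 x 640 at 16 bit, draw_buffer_bytes(480, 640, 2, 10) gives 61,440 bytes per buffer, so two buffers need 122,880 bytes.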

As I said, start off small with only a single widget and only the one task for LVGL. See how the refresh rate is with that.

I was initially using SPIRAM and it was much slower than reducing the size of the buffers and using DMA. Also, when I tried width * height * bytes per pixel / 10 with the SPIRAM, the screen was very glitchy. It would display stuff but things would bounce around.

In terms of using a single task for LVGL do I set up one RTOS task running on 1 single core for just the tick stuff and then another RTOS task running on the same core for creating and manipulating the different LVGL objects?

Would this do the same thing for handling the lv_tick_inc()? Or does the lv_tick_inc() need to be in the same RTOS task that all the LVGL code is running in?

void lv_tick_task(void *arg) {
    while (1) {
        lv_tick_inc(1);  // Increment LVGL's tick counter by 1 ms
        vTaskDelay(pdMS_TO_TICKS(1)); // Wait 1 ms
    }
}

void start_lv_tick() {
    xTaskCreate(lv_tick_task, "lv_tick_task", 2048, NULL, 3, NULL);
}

You must call the tick from a function where you are absolutely certain it is going to run every x ms.
If your device has timers on it, I recommend using a timer interrupt.
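
One concrete reason a task-based tick is hard to trust: vTaskDelay() works in whole RTOS ticks, and pdMS_TO_TICKS() uses integer division. The sketch below reproduces that arithmetic on the host. It assumes a 100 Hz tick rate, which is the ESP-IDF default for CONFIG_FREERTOS_HZ as far as I know, so check your sdkconfig; ms_to_ticks is a stand-in name, not a FreeRTOS function.

```c
#include <assert.h>
#include <stdint.h>

/* Same integer arithmetic as FreeRTOS's pdMS_TO_TICKS():
 * ms * tick_rate_hz / 1000, truncated toward zero. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    assert(tick_rate_hz > 0);
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

At 100 Hz, ms_to_ticks(1, 100) is 0, so a vTaskDelay(pdMS_TO_TICKS(1)) in the loop does not actually block for 1 ms, and lv_tick_inc(1) no longer matches real elapsed time. At 1000 Hz the same call gives 1 tick, but the task can still be delayed by other tasks, which is why a timer callback is the safer choice.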


EXACTLY!!!

@joulupukki
I provided you with code for handling the tick. That is what should be used.

Then create a single FreeRTOS task running on core 1. Before your while loop, have all of your initialization code for the display drivers and LVGL drivers. If you need me to write you up a rough draft I can do that for you…


I’m a bit confused on how to use esp_lcd_panel_io_spi_config_t. The Waveshare demo didn’t use that and I’m not sure how to get that to fit in.

I’ve pulled out esp_lvgl_port and all of my lvgl_port_lock() calls are basically just returning true (so I wouldn’t have to change all the code everywhere). That’s all out of there.

I’ve got the tick timer working as well.

Give me the pin definitions for your display and I will hammer out a quick example for ya that you can test out. Does your display have a touch screen? If it does, I will need the pin definitions for that as well.

HOLD up. That display is an RGB display. It is NOT an SPI display.

That is a whole different animal to deal with, and it has its own set of really complex things that have to be overcome.

OH!? Dang.

Everything I know about it comes from Waveshare’s ESP-IDF demo, but they are using LVGL 8 and not LVGL 9. I’d prefer sticking to LVGL 9.

Their demo is available here: https://files.waveshare.com/wiki/ESP32-S3-Touch-LCD-2.8B/ESP32-S3-Touch-LCD-2.8B-Demo.zip

All the pin info is in ESP-IDF/ESP32-S3-LCD-2.8B-Test/main/LCD_Driver/ST7701S.c/h

I have their other 2.8" screen here and it uses a more common ST7789 display driver and the lower resolution. Maybe I should just switch to that and be done with this ST7701. From lots of other things I’ve read, RGB at that resolution isn’t really gonna be that fast on ESP32.

The RGB interface to the display is the reason why you are having a performance issue. It will require quite a bit of code in order to boost the performance.

I am going to make this suggestion again because I have managed to squeeze just about every drop of performance that is to be had out of the RGB interface displays. Try using that code I linked to earlier and see how it performs. You will be surprised at how well it does… It is also super simple to add your own code to without needing to compile and flash the firmware over and over again.