DMA Framebuffer

I ran experiments on a custom ESP32-S3 board with 16 MB of flash, clocked at 160 MHz. With two full-screen frame buffers, the frame rate consistently stayed at around 4 to 5 FPS. However, sizing each buffer at 0.99 times the screen size avoided the problem, even when the entire screen was being updated. This behavior may be linked to how LVGL interprets the buffer configuration passed to its display interface.

Various strategies exist for LVGL buffers:

  1. Single buffer: LVGL draws screen content into a buffer, which may be smaller than the screen. In this scenario, larger areas are redrawn in segments, and only the altered regions are refreshed during updates (e.g., button presses).
  2. Two non-screen-sized buffers: With two buffers, LVGL can draw into one while the other’s content is sent to the display in the background. Hardware mechanisms like DMA facilitate efficient data transfer, enabling the CPU to draw simultaneously. Similar to the single buffer approach, LVGL redraws display content in chunks if the buffer is smaller than the refresh area.
  3. Two screen-sized buffers: In contrast to non-screen-sized buffers, LVGL provides the entire screen’s content instead of chunks. This approach is most effective when the MCU has an LCD/TFT interface, and the frame buffer is a location in the RAM. The driver can easily change the frame buffer’s address to the one received from LVGL.

In summary, my method exploits the feature that redraws only the altered areas while still using two (nearly) screen-sized buffers. With it I consistently got around 33 FPS at 50% CPU usage. This suggests that LVGL could benefit from an additional option:

  1. Two screen-sized buffers with region refresh: LVGL can draw into one buffer while the other’s content is sent to the display in the background. Hardware mechanisms, such as DMA, enable efficient data transfer, allowing the CPU to draw simultaneously. Similar to the single buffer approach, LVGL redraws display content in chunks, refreshing only the altered areas.
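To make the chunking in these modes a bit more concrete, here is a rough model in C of how many flush calls one refreshed area costs for a given draw-buffer size. It assumes, as a simplification, that an area is rendered in horizontal strips and that each strip holds as many whole rows as fit in the buffer. The function names are made up for illustration and are not LVGL API, and the 480x320 panel used in the numbers below is only an assumed size.

```c
#include <stdint.h>

/* Rows of an area that fit into a draw buffer of buf_px pixels (hypothetical helper). */
uint32_t rows_per_chunk(uint32_t buf_px, uint32_t area_w, uint32_t area_h)
{
    uint32_t rows = buf_px / area_w;       /* whole rows only */
    return rows > area_h ? area_h : rows;  /* never more rows than the area has */
}

/* Flush calls needed to redraw an area_w x area_h region.
 * Returns 0 if not even a single row fits into the buffer. */
uint32_t chunks_needed(uint32_t buf_px, uint32_t area_w, uint32_t area_h)
{
    uint32_t rows = rows_per_chunk(buf_px, area_w, area_h);
    if (rows == 0) {
        return 0;
    }
    return (area_h + rows - 1) / rows;     /* ceiling division */
}
```

Under these assumptions, a full-screen redraw on a 480x320 panel takes 10 flushes with a 1/10 buffer, 2 flushes with a 99% buffer, and 1 flush with a full-size buffer, while a small 100x40 widget fits into a single flush in all three cases.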

Here is my code:

// initialize DMA draw buffers with almost the screen size 
buf1 = heap_caps_aligned_alloc(64, (LV_HOR_RES_MAX * LV_VER_RES_MAX)*0.99 * sizeof(lv_color_t), MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);
buf2 = heap_caps_aligned_alloc(64, (LV_HOR_RES_MAX * LV_VER_RES_MAX)*0.99 * sizeof(lv_color_t), MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT);

// initialize LVGL draw buffers
lv_disp_draw_buf_init(&disp_buf, buf1, buf2, (LV_HOR_RES_MAX * LV_VER_RES_MAX)*0.99 );

Using a float here is a slightly ugly technique.

What would be a better approach? @Marian_M

(LV_HOR_RES_MAX * LV_VER_RES_MAX)*0.99 * sizeof(lv_color_t)

This doesn’t make any sense. You are still basically redrawing the entire screen. You should be using something along these lines:

(LV_HOR_RES_MAX * LV_VER_RES_MAX) * 0.10 * sizeof(lv_color_t)

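There is another problem: you are not using DMA memory with an allocation like that (MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT does not request it).

DMA-capable memory is what you want to use, and if you allocate the frame buffers first thing when the program starts, then you want it to come from internal RAM (MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL, as in the example further down).

This will use internal RAM instead of SPI RAM. SPI RAM is incredibly slow in comparison to the internal memory.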

Double buffering without DMA memory is pointless; you get no improvement in speed at all.

I also suspect that you might not be using double buffering properly. If you tell me what kind of data bus you are using with the display, I8080 or SPI, I can give you an example of how to do it to get the best performance.

The best performance is going to be with two frame buffers, each around 10% of the size of the display and allocated from the internal DMA memory.
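For a sense of scale, the sketch below compares what two buffers cost against a RAM budget. The 480x320 RGB565 panel is an assumption (it is the size used in the example listing further down, not necessarily the board from the first post), and the helper names are hypothetical. The 512 KiB figure is the total internal SRAM of the ESP32-S3; the heap an application can actually allocate from is smaller than that.

```c
#include <stdbool.h>
#include <stddef.h>

#define BYTES_PER_PX      2u              /* RGB565 */
#define INTERNAL_SRAM_MAX (512u * 1024u)  /* ESP32-S3 total internal SRAM; usable heap is less */

/* Bytes needed for two draw buffers, each 1/divisor of the screen (hypothetical helper). */
size_t two_buffers_bytes(size_t hor_res, size_t ver_res, size_t divisor)
{
    return 2u * (hor_res * ver_res / divisor) * BYTES_PER_PX;
}

/* True if the request could fit into the given budget at all. */
bool fits_in_budget(size_t needed_bytes, size_t budget_bytes)
{
    return needed_bytes <= budget_bytes;
}
```

With these numbers, two 1/10 buffers come to 61,440 bytes, while two full-screen buffers come to 614,400 bytes. That is more than 512 KiB, so full-screen double buffering on a panel of this size would have to live in SPI RAM.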

This is kind of ugly.

(LV_HOR_RES_MAX * LV_VER_RES_MAX) * 0.10 * sizeof(lv_color_t)

It should be

LV_HOR_RES_MAX * LV_VER_RES_MAX / 10 * sizeof(lv_color_t)
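As one possible answer to the "what would be a better approach" question, the fraction can be written as an integer numerator and denominator, so no float is involved and the same helper covers both 1/10 and 99/100. This is only a sketch; the function names are hypothetical and not part of LVGL or ESP-IDF.

```c
#include <stddef.h>
#include <stdint.h>

/* Pixels in a draw buffer sized at num/den of the screen, integer math only (hypothetical helper). */
uint32_t draw_buf_pixels(uint32_t hor_res, uint32_t ver_res, uint32_t num, uint32_t den)
{
    /* widen to 64 bits so the intermediate product cannot overflow */
    return (uint32_t)((uint64_t)hor_res * ver_res * num / den);
}

/* Size of that buffer in bytes, e.g. bytes_per_px = 2 for RGB565. */
size_t draw_buf_bytes(uint32_t hor_res, uint32_t ver_res, uint32_t num, uint32_t den, size_t bytes_per_px)
{
    return (size_t)draw_buf_pixels(hor_res, ver_res, num, den) * bytes_per_px;
}
```

The byte count is what the allocator needs, and the pixel count is what lv_disp_draw_buf_init() expects as its last argument, so keeping the two apart avoids passing a float or a byte size where a pixel count is wanted.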

Thank you, @kdschlosser, for your clarification. I appreciate your insight, and I acknowledge my mistake. Upon further examination, I realized that I wasn’t employing DMA, even though my solution yielded significantly better results than exact full-sized frame buffers, without any loss of quality. As for my display setup, I’m using a parallel display with RGB565.

I would be grateful if you could provide an example as you mentioned, guiding me on how to achieve optimal performance for my display.

I can do that for ya. Give me a bit of time to get it together for ya. I have a bunch of things I have to do today so it will not be until later today or this evening before I will be able to get to it.

Sure, I’ll be awaiting your response. Additionally, I’d like to highlight that I need the display to incorporate a double full-screen buffer. This is crucial because a specific section of the application involves dynamically altering the entire screen’s colors, such as displaying a warning.

If you could supply me with the code you are currently using, I can put something together that would be as close to running as possible. I personally don’t like using TFT_eSPI, but if that is what you are using then I will set things up to work with that. If you are using the Arduino IDE to handle the compiling, I have not messed with that too much, and I know there are some differences from doing a direct compile using the ESP-IDF. I will have to hammer out what those differences are. Not a big deal to do; I just need to know how you are currently compiling.

Double buffering is not an issue, and setting it to use internal DMA memory is the goal. I also need to know what the display resolution and color depth are.

You are using the I8080 bus, correct?

I have not tested this code so there will be bugs in it. It is to give you a general idea. This is made for the ST7796 display. I did not include anything for touch drivers.

#include <stdio.h>
#include <stdlib.h>
#include "esp_heap_caps.h" // heap_caps_malloc() and the MALLOC_CAP_* flags
#include "esp_lcd_panel_io.h"
#include "esp_lcd_panel_ops.h"
#include "esp_lcd_panel_commands.h"
#include "driver/gpio.h"
#include "esp_err.h"
#include "lvgl.h"

#define DC 8 //  dc pin number
#define CS 9 // cs pin number if used, if not used then -1
#define WR 10 // wr pin number
#define PIXEL_CLOCK_HZ (10 * 1000 * 1000) // clock speed of the display

// supported values are 8, 16, 32 and 64.
// a higher value means it is going to have a faster transmit
// the catch is that if the size of the frame buffer is not evenly divisible
// by this number you will end up using additional memory.
// there is always a price for speed
#define SRAM_DATA_ALIGNMENT 64 // the value was missing from the post; 64 is an assumption

#define BACK_LIGHT -1 // backlight pin set to -1 if not used
#define BACKLIGHT_LEVEL 1 // 1 = high = on, 0 = low = on (the value was missing from the post; 1 is an assumption)

#define RESET -1 // reset pin set to -1 if not used
#define RESET_LEVEL 1 // 1 = high = reset, 0 = low = reset

// you may have to change these depending on the display
#define CMD_BITS 8 // how many bits a command is
#define PARAM_BITS 8 // how many bits a parameter is

// data pins
#define DATA0 11
#define DATA1 12
#define DATA2 13
#define DATA3 14
#define DATA4 15
#define DATA5 16
#define DATA6 17
#define DATA7 18

// set to your display width and height
#define DISPLAY_WIDTH 480
#define DISPLAY_HEIGHT 320

// 65536 colors RGB565 (16 bit)
#define PIXEL_FORMAT 0x55

// do not change
#define PORTRAIT -1
#define LANDSCAPE -2

// change this for your display orientation (seen above)

// do not change
#define COLOR_MODE_RGB 0x00
#define COLOR_MODE_BGR 0x08

//change this to the color mode (seen above)

// do not change anything below this line
#define BUFFER_SIZE (DISPLAY_HEIGHT * DISPLAY_WIDTH / 10 * sizeof(lv_color_t))

#define MADCTL_MH 0x04  // Refresh 0=Left to Right, 1=Right to Left
#define MADCTL_ML 0x10  // Refresh 0=Top to Bottom, 1=Bottom to Top
#define MADCTL_MV 0x20  // 0=Normal, 1=Row/column exchange
#define MADCTL_MX 0x40  // 0=Left to Right, 1=Right to Left
#define MADCTL_MY 0x80  // 0=Top to Bottom, 1=Bottom to Top

#define LCD_CMD_DFC 0xB6
#define LCD_CMD_DOCA 0xE8
#define LCD_CMD_PWR2 0xC1
#define LCD_CMD_PWR3 0xC2
#define LCD_CMD_VCMPCTL 0xC5
#define LCD_CMD_PGC 0xE0
#define LCD_CMD_NGC 0xE1
#define LCD_CMD_INVTR 0xB4
#define LCD_CMD_CSCON 0xF0

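static bool notify_lvgl_flush_ready(esp_lcd_panel_io_handle_t panel_io, esp_lcd_panel_io_event_data_t *edata, void *user_ctx)
{
    // called when a color transfer has finished: tell LVGL the buffer is free again
    lv_disp_drv_t *disp_driver = (lv_disp_drv_t *)user_ctx;
    lv_disp_flush_ready(disp_driver);
    return false;
}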

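static void lvgl_flush_cb(lv_disp_drv_t *drv, const lv_area_t *area, lv_color_t *color_map)
{
    esp_lcd_panel_io_handle_t panel_io_handle = (esp_lcd_panel_io_handle_t) drv->user_data;

    // LVGL areas are inclusive on both ends, and so are the CASET/RASET end addresses
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_CASET,
        (uint8_t[]) {(area->x1 >> 8) & 0xFF, area->x1 & 0xFF, (area->x2 >> 8) & 0xFF, area->x2 & 0xFF},
        4);

    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_RASET,
        (uint8_t[]) {(area->y1 >> 8) & 0xFF, area->y1 & 0xFF, (area->y2 >> 8) & 0xFF, area->y2 & 0xFF},
        4);

    // the transfer size is given in bytes, not pixels
    size_t px_count = (size_t)(area->x2 - area->x1 + 1) * (area->y2 - area->y1 + 1);
    esp_lcd_panel_io_tx_color(panel_io_handle, LCD_CMD_RAMWR, (uint8_t *)color_map, px_count * sizeof(lv_color_t));
}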

void init_i80_bus(esp_lcd_panel_io_handle_t *io_handle, void *user_ctx)
{
    esp_lcd_i80_bus_handle_t i80_bus = NULL;

    esp_lcd_i80_bus_config_t bus_config = {
        .clk_src = LCD_CLK_SRC_PLL160M,
        .dc_gpio_num = DC,
        .wr_gpio_num = WR,
        .data_gpio_nums = {
            DATA0, DATA1, DATA2, DATA3, DATA4, DATA5, DATA6, DATA7,
        },
        .bus_width = 8,
        .max_transfer_bytes = BUFFER_SIZE,
        .psram_trans_align = 4,
        .sram_trans_align = SRAM_DATA_ALIGNMENT,
    };

    ESP_ERROR_CHECK(esp_lcd_new_i80_bus(&bus_config, &i80_bus));

    esp_lcd_panel_io_i80_config_t io_config = {
        .cs_gpio_num = CS,
        .pclk_hz = PIXEL_CLOCK_HZ,
        .trans_queue_depth = 10,
        .dc_levels = {
            .dc_idle_level = 0,
            .dc_cmd_level = 0,
            .dc_dummy_level = 0,
            .dc_data_level = 1,
        },
        .flags = {
            .swap_color_bytes = 0,
        },
        .on_color_trans_done = notify_lvgl_flush_ready,
        .user_ctx = user_ctx,
        .lcd_cmd_bits = CMD_BITS,
        .lcd_param_bits = PARAM_BITS,
    };

    ESP_ERROR_CHECK(esp_lcd_new_panel_io_i80(i80_bus, &io_config, io_handle));
}

void init_lcd_panel(esp_lcd_panel_io_handle_t panel_io_handle)
{
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_SWRESET, NULL, 0);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_SLPOUT, NULL, 0);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_CSCON, (uint8_t []){0xC3}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_CSCON, (uint8_t []){0x96}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_MADCTL, (uint8_t []){MADCTL_VAL}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_COLMOD, (uint8_t []){PIXEL_FORMAT}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_INVTR, (uint8_t []){0x20}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_DFC, (uint8_t []){0x80, 0x02, 0x3B}, 3);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_DOCA, (uint8_t []){0x40, 0x8A, 0x00, 0x00, 0x29, 0x19, 0xA5, 0x33}, 8);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_PWR2, (uint8_t []){0x06}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_PWR3, (uint8_t []){0xA7}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_VCMPCTL, (uint8_t []){0x18}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_PGC, (uint8_t []){0xF0, 0x09, 0x0b, 0x06, 0x04, 0x15, 0x2F, 0x54, 0x42, 0x3C, 0x17, 0x14, 0x18, 0x1B}, 14);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_NGC, (uint8_t []){0xE0, 0x09, 0x0B, 0x06, 0x04, 0x03, 0x2B, 0x43, 0x42, 0x3B, 0x16, 0x14, 0x17, 0x1B}, 14);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_CSCON, (uint8_t []){0x3C}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_CSCON, (uint8_t []){0x69}, 1);
    esp_lcd_panel_io_tx_param(panel_io_handle, LCD_CMD_DISPON, NULL, 0);
}

void setup(void)
{
    // alloc draw buffers used by LVGL
    // it's recommended to choose the size of the draw buffer(s) to be at least 1/10 screen sized
    lv_color_t *buf1 = (lv_color_t *)heap_caps_malloc(BUFFER_SIZE, MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL);
    lv_color_t *buf2 = (lv_color_t *)heap_caps_malloc(BUFFER_SIZE, MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL);

    lv_init();

    static lv_disp_draw_buf_t disp_buf; // contains internal graphic buffer(s) called draw buffer(s)
    static lv_disp_drv_t disp_drv;      // contains callback functions

#if BACK_LIGHT > -1
    gpio_config_t bk_gpio_config = {
        .mode = GPIO_MODE_OUTPUT,
        .pin_bit_mask = 1ULL << BACK_LIGHT
    };
    ESP_ERROR_CHECK(gpio_config(&bk_gpio_config));
    gpio_set_level(BACK_LIGHT, !BACKLIGHT_LEVEL);
#endif // BACK_LIGHT > -1

#if RESET > -1
    gpio_config_t reset_gpio_config = {
        .mode = GPIO_MODE_OUTPUT,
        .pin_bit_mask = 1ULL << RESET
    };
    ESP_ERROR_CHECK(gpio_config(&reset_gpio_config));
    gpio_set_level(RESET, RESET_LEVEL);
    gpio_set_level(RESET, !RESET_LEVEL);
#endif // RESET > -1

    esp_lcd_panel_io_handle_t io_handle = NULL;
    init_i80_bus(&io_handle, &disp_drv);
    init_lcd_panel(io_handle);

#if BACK_LIGHT > -1
    gpio_set_level(BACK_LIGHT, BACKLIGHT_LEVEL);
#endif // BACK_LIGHT > -1

    // initialize LVGL draw buffers
    lv_disp_draw_buf_init(&disp_buf, buf1, buf2, BUFFER_SIZE / sizeof(lv_color_t));

    lv_disp_drv_init(&disp_drv);
    disp_drv.hor_res = DISPLAY_WIDTH;
    disp_drv.ver_res = DISPLAY_HEIGHT;
    disp_drv.flush_cb = lvgl_flush_cb;
    disp_drv.draw_buf = &disp_buf;
    disp_drv.user_data = io_handle;
    lv_disp_t *disp = lv_disp_drv_register(&disp_drv);
    (void)disp;

    // add additional code here for setup
}

void loop(void)
{
    // this may or may not be needed depending on how you have things set up

    // this MUST be here
    lv_timer_handler();
}

I am using ESP-IDF as well

I started my display driver code from this example

Interesting, I will apply most of those changes and run a performance test, but overall the code looks good.

How did you make out? Any performance improvements?

After implementing full-screen frame buffers with DMA, I observed inconsistent performance, with the frame rate staying at around 5 FPS. However, with buffers sized at 99% of the screen, the screen updates seamlessly, without visible transitions between buffers, at a stable 33 FPS.
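When comparing frame rates like these, it can help to know the ceiling the bus itself imposes on full-screen updates. The sketch below is a back-of-the-envelope estimate that assumes one bus-width transfer per write clock and ignores command overhead and rendering time. The function name is hypothetical, and the numbers in the usage note come from the example listing above (10 MHz clock, 8-bit bus, 480x320, RGB565), not from measurements on the board described in this thread.

```c
#include <stdint.h>

/* Upper bound on full-frame updates per second for a parallel bus (hypothetical helper).
 * Assumes one bus-width transfer per clock and no overhead of any kind. */
double max_full_frame_fps(uint32_t pclk_hz, uint32_t bus_width_bits,
                          uint32_t hor_res, uint32_t ver_res, uint32_t bits_per_px)
{
    double bits_per_second = (double)pclk_hz * bus_width_bits;
    double bits_per_frame = (double)hor_res * ver_res * bits_per_px;
    return bits_per_second / bits_per_frame;
}
```

For the values in the example listing this gives roughly 32.5 full frames per second, so on such a setup a higher frame rate is only reachable when less than the whole screen is sent per frame or the clock is raised.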
