Image drawing performance - in-flash lv_img_dsc_t vs. loading/decoding PNG

bobwolff68 · October 26, 2024, 7:41pm

Description

Slow painting performance using decoded PNGs vs pre-converted in-flash-image-data

What MCU/Processor/Board and compiler are you using?

ESP32

What LVGL version are you using?

8.3

What do you want to achieve?

On-par redraw performance when using the decoded PNG results from lv_img_set_src to that of using lv_img_set_src pointing to an lv_img_dsc_t produced from the online image converter tool.

What have you tried so far?

I’ve validated that I can get about 10fps of display drawing performance from my ESP32 + 320x240 ILI9341 display. However, I’m only able to get this performance if I use the lv_img_dsc_t which represents pre-processed image data using the LVGL image converter tool online. If, instead, I use the PNG decoder to decode the same images and display them on screen, the refresh of the screen takes about 5 seconds per frame. Ouch.

The test jig at the moment has a total of (4) 128x128 pixel images which get placed in a tiled pattern after loading. Then the loop() moves the position of those images by 5 pixels right/down or left/up to simply give “motion”. This works quite well on the pre-processed image data but is very poor performance in the case of the PNG decoded image.

Recently I noticed that the PNG files I was decoding actually had an alpha channel, and I thought this might be the source of the performance issue. So I re-compressed the PNGs with alpha off. To my surprise, it didn’t help. I also tried encoding the PNG as true color, RGB888, and RGB444, and none of these variants helped.

I suspect the decoded image ends up in a format that forces LVGL to do a LOT more processing on each redraw. I believe this because when I run the same image through the online image converter and use the resulting .h/.c code as the image source, the drawing speed is just fine at 10fps or so.

FWIW when I do the lv_img_set_src and point to the compressed PNG files, in my test jig I only load those files once (lv_img_set_src()) and so the decoding is completed only once. The rest is just the lvgl redrawing differences between the two methods.

Code to reproduce

No issue sharing code, but it’s a bit difficult to show the salient and usable bits. I’ll put two code blocks in here to show what I think are the most important parts.

Here is the common code for lv_task_handler(), the display setup, and the displayFlush() operations.

void displayFlush( lv_disp_drv_t *disp, const lv_area_t *area, lv_color_t *color_p )
{
    uint32_t w = ( area->x2 - area->x1 + 1 );
    uint32_t h = ( area->y2 - area->y1 + 1 );

    tft.startWrite();
    tft.setAddrWindow( area->x1, area->y1, w, h );
    tft.pushColors( ( uint16_t * )&color_p->full, w * h, true );
    tft.endWrite();
    lv_disp_flush_ready( disp );
}

#define BUFFER_DIVIDER_FACTOR 16

void mySetup()
{
  tft.begin();

  static lv_disp_draw_buf_t display_buffer;
  static lv_color_t image_buffer[SDL_HOR_RES  * SDL_VER_RES / BUFFER_DIVIDER_FACTOR];
  static lv_color_t image_buffer2[SDL_HOR_RES  * SDL_VER_RES / BUFFER_DIVIDER_FACTOR];
  lv_disp_draw_buf_init(&display_buffer, image_buffer, image_buffer2, SDL_HOR_RES  * SDL_VER_RES / BUFFER_DIVIDER_FACTOR);

  static lv_disp_drv_t disp_drv;
  lv_disp_drv_init(&disp_drv);  
  disp_drv.hor_res = SDL_HOR_RES;
  disp_drv.ver_res = SDL_VER_RES;
  disp_drv.flush_cb = displayFlush;
  disp_drv.draw_buf = &display_buffer;
  lv_disp_t *disp = lv_disp_drv_register(&disp_drv);

  static lv_indev_drv_t indev_drv;           /*Descriptor of a input device driver*/
  lv_indev_drv_init(&indev_drv);             /*Basic initialization*/
  indev_drv.type = LV_INDEV_TYPE_POINTER;    /*Touch pad is a pointer-like device*/
  indev_drv.read_cb = touchscreen_read;      /*Set your driver function*/
  lv_indev_drv_register(&indev_drv);         /*Finally register the driver*/

}

/*
  Note: My lv_conf.h is doing lv_tick_inc by using millis() to provide that timing.
*/
void loop() {
    if (plvimage)
      plvimage->moveTiles();
    lv_task_handler();
}

Here is the test jig using pre-converted image data

lv_img_dsc_t brc_z0_x0_y0 = {
  .data_size = 16384 * LV_COLOR_SIZE / 8,
  .data = brc_z0_x0_y0_map,
};
lv_img_dsc_t brc_z0_x1_y0 = {
  .data_size = 16384 * LV_COLOR_SIZE / 8,
  .data = brc_z0_x1_y0_map,
};
lv_img_dsc_t brc_z0_x0_y1 = {
  .data_size = 16384 * LV_COLOR_SIZE / 8,
  .data = brc_z0_x0_y1_map,
};
lv_img_dsc_t brc_z0_x1_y1 = {
  .data_size = 16384 * LV_COLOR_SIZE / 8,
  .data = brc_z0_x1_y1_map,
};

struct memimg {
    lv_img_dsc_t * dsc;
    int16_t x, y;
    lv_obj_t *img;
};

class LVImageTest {
public: 
    LVImageTest() { 
        brc_z0_x0_y0.header.cf = LV_IMG_CF_TRUE_COLOR;
        brc_z0_x0_y0.header.always_zero = 0;
        brc_z0_x0_y0.header.reserved = 0;
        brc_z0_x0_y0.header.w = 128;
        brc_z0_x0_y0.header.h = 128;
        brc_z0_x1_y0.header.cf = LV_IMG_CF_TRUE_COLOR;
        brc_z0_x1_y0.header.always_zero = 0;
        brc_z0_x1_y0.header.reserved = 0;
        brc_z0_x1_y0.header.w = 128;
        brc_z0_x1_y0.header.h = 128;
        brc_z0_x0_y1.header.cf = LV_IMG_CF_TRUE_COLOR;
        brc_z0_x0_y1.header.always_zero = 0;
        brc_z0_x0_y1.header.reserved = 0;
        brc_z0_x0_y1.header.w = 128;
        brc_z0_x0_y1.header.h = 128;
        brc_z0_x1_y1.header.cf = LV_IMG_CF_TRUE_COLOR;
        brc_z0_x1_y1.header.always_zero = 0;
        brc_z0_x1_y1.header.reserved = 0;
        brc_z0_x1_y1.header.w = 128;
        brc_z0_x1_y1.header.h = 128;
        images[0][0] = { .dsc=&brc_z0_x0_y0, .x=0, .y=0, .img=nullptr };
        images[1][0] = { .dsc=&brc_z0_x1_y0, .x=128, .y=0, .img=nullptr };
        images[0][1] = { .dsc=&brc_z0_x0_y1, .x=0, .y=128, .img=nullptr };
        images[1][1] = { .dsc=&brc_z0_x1_y1, .x=128, .y=128, .img=nullptr };
        load(); };
    ~LVImageTest() {};
    void moveTiles() {
        static int adv = 5;
            if (images[0][0].x > 128 || images[0][0].x < 0) {
                adv *= -1;
            }
            for (int x=0; x<2; x++) {
                for (int y=0; y<2; y++) {
                    images[y][x].x += adv;
                    images[y][x].y += adv;
                    lv_obj_set_pos(images[y][x].img, images[y][x].x, images[y][x].y);
                }
            }

    };
    void load() {
        for (int x=0; x<2; x++) {
            for (int y=0; y<2; y++) {
               images[y][x].img = lv_img_create(lv_scr_act());
                if (!images[y][x].img) {
                    printf("ERROR: Unable to load tile: %d,%d\n", x, y);
                    continue;  // do not call set_src/set_pos on a null object
                }
                lv_img_set_src(images[y][x].img, images[y][x].dsc);
                lv_obj_set_pos(images[y][x].img, images[y][x].x, images[y][x].y);
            }
        }
    };
protected:
    struct memimg images[2][2];
};

The next code block is essentially the same kind of image test class only this one utilizes filenames for the PNG files that are in a LittleFS partition.

lv_obj_t *load_and_place_tile(const char* full_path, lv_coord_t x, lv_coord_t y) {
    if (!full_path)
        return NULL;
    // Create an image object
    lv_obj_t* img_tile = lv_img_create(lv_scr_act());  // Create on the active screen

    if (!img_tile)
        return NULL;

    // Set the source of the image (full path to the image file)
    lv_img_set_src(img_tile, full_path);
    lv_obj_set_pos(img_tile, x, y);

    return img_tile;
}

struct img {
    const char *fn;
    int16_t x, y;
    lv_obj_t *img;
};

class ImageTest {
public: 
    ImageTest() { 
        images[0][0] = { .fn="S:brc_z0_x0_y0.png", .x=0, .y=0, .img=nullptr };
        images[1][0] = { .fn="S:brc_z0_x1_y0.png", .x=128, .y=0, .img=nullptr };
        images[0][1] = { .fn="S:brc_z0_x0_y1.png", .x=0, .y=128, .img=nullptr };
        images[1][1] = { .fn="S:brc_z0_x1_y1.png", .x=128, .y=128, .img=nullptr };
        load();
      };
    ~ImageTest() {};
    void moveTiles() {
        static int adv = 5;
            if (images[0][0].x > 128 || images[0][0].x < 0) {
                adv *= -1;
            }
            for (int x=0; x<2; x++) {
                for (int y=0; y<2; y++) {
                    images[y][x].x += adv;
                    images[y][x].y += adv;
                    lv_obj_set_pos(images[y][x].img, images[y][x].x, images[y][x].y);
                }
            }

    };
    void load() {
        for (int x=0; x<2; x++) {
            for (int y=0; y<2; y++) {
               images[y][x].img = load_and_place_tile(images[y][x].fn, images[y][x].x, images[y][x].y);
                if (!images[y][x].img) {
                    printf("ERROR: Unable to load tile: %s\n", images[y][x].fn);
                }

                if (x==0 && y==0) {
                    lv_img_header_t img_header;
                    lv_res_t res = lv_img_decoder_get_info(images[y][x].fn, &img_header);
                    if(res == LV_RES_OK) {
                        printf("****Image INFO successfully!\n");
                        printf("****Image size: %d x %d\n", img_header.w, img_header.h);
                        printf("****Color cf = %u\n", (unsigned)img_header.cf);
                    } else {
                        printf("Image failed to load.\n");
                    }

                }
            }
        }
    };
protected:
    struct img images[2][2];
};

ImageTest *pImageTest;

Any thoughts on why LVGL might spend so much time rendering the decoded image versus the pre-converted image bits in memory?

One final note - in the last code block, inside load(), I took ONE of the images and called lv_img_decoder_get_info() on it, and I see that img_header.cf is 5, which I believe is LV_IMG_CF_TRUE_COLOR_ALPHA (true color PLUS alpha). I could imagine this being the source of the issue, but the PNG file itself has no alpha. Is there something I need to do “better” or “different” so that the decoder does not produce its result in that format?
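For reference, a rough sketch of what that format difference means for memory, assuming LV_COLOR_DEPTH is 16 (the flush callback above pushes uint16_t pixels). In LVGL 8, cf value 4 is LV_IMG_CF_TRUE_COLOR and 5 is LV_IMG_CF_TRUE_COLOR_ALPHA; at 16-bit depth the alpha variant stores one extra byte per pixel. It appears (worth checking against lv_png.c in your tree) that the bundled PNG decoder reports the alpha format for file sources whether or not the PNG has an alpha channel. The enum and helper names below are made up for illustration and are not LVGL API.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative names, not LVGL API. Values match LVGL 8's lv_img_cf_t numbering.
enum class ToyColorFormat : std::uint8_t {
    TrueColor = 4,       // LV_IMG_CF_TRUE_COLOR
    TrueColorAlpha = 5   // LV_IMG_CF_TRUE_COLOR_ALPHA
};

// Bytes per pixel, assuming LV_COLOR_DEPTH == 16 (RGB565).
// The alpha variant adds one alpha byte after the two colour bytes.
constexpr std::size_t bytes_per_pixel_rgb565(ToyColorFormat cf) {
    return cf == ToyColorFormat::TrueColorAlpha ? 3 : 2;
}

// Size of one fully decoded image in bytes.
constexpr std::size_t decoded_bytes(std::size_t w, std::size_t h, ToyColorFormat cf) {
    return w * h * bytes_per_pixel_rgb565(cf);
}

// One 128x128 tile: matches the 16384 * LV_COLOR_SIZE / 8 used for the flash images.
static_assert(decoded_bytes(128, 128, ToyColorFormat::TrueColor) == 32768,
              "RGB565 tile without alpha");
static_assert(decoded_bytes(128, 128, ToyColorFormat::TrueColorAlpha) == 49152,
              "RGB565 tile with alpha");
```

With four tiles, that is 128 KiB without alpha versus 192 KiB with it, which matters if the decoded images are to be kept in RAM.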

bobwolff68 · October 27, 2024, 12:03am

After a few experiments of trying to enable caching and also doing a bit of logging in the decoder’s open() and info() functions, it appears the PNG file test is calling decoder_info() over and over again which causes a fs_open and fs_read over and over again. It seems this is a likely culprit but I’m not certain.

I expected the decoder to be called once and the resulting image would be in memory ready to use for changing a position without any other intense operations, but it appears the decoder is involved at every step. Does this make sense? It seems quite wasteful.

kdschlosser · October 28, 2024, 4:24am

The same kind of issue also happens with JPGs. It appears the problem is not decoder specific but something more global.

And yes, I have also noticed the very large number of calls made to read the same data over and over again. That is not going to give good performance, considering that reading from flash is one of the slowest things an MCU does. On the ESP32, the flash shares the same SPI bus as the PSRAM, so if anything running on the other core is accessing PSRAM, the speed at which data can be read from flash is cut in half. If DMA memory access is also in use, the speed can drop to one third. Reading the exact same data from flash over and over is anything but ideal, especially since that data could be kept in memory instead. I am referring to the header information specifically.

I do not believe the image is loaded into memory in its entirety. I think it loads only a chunk at a time, decodes the chunk and writes it, and this is repeated each time some part of the image gets invalidated. Images take up very large amounts of memory, which is why it was done this way, but for smaller images such as icons this is not ideal. In those cases it should load and decode only once.
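A toy model of the behavior described here, not LVGL code: if every redraw has to re-open and re-decode each source, the decode count grows with the number of frames, whereas keeping decoded results around caps it at the number of distinct images. The struct, its round-robin eviction rule and the integer tile IDs are all assumptions made for illustration; LVGL's real cache uses its own replacement policy. It is plain C++14 and is meant to be run on a PC.

```cpp
#include <cstddef>

constexpr std::size_t kMaxSlots = 8;  // upper bound for this toy model only

// Hypothetical stand-in for an image cache; counts decodes instead of doing them.
struct ToyImageCache {
    int slots[kMaxSlots] = {};     // source IDs currently held (0 = empty)
    std::size_t capacity;          // 0 means caching is disabled
    std::size_t used = 0;
    std::size_t next_victim = 0;   // round-robin eviction index
    unsigned decode_count = 0;

    constexpr explicit ToyImageCache(std::size_t cap)
        : capacity(cap < kMaxSlots ? cap : kMaxSlots) {}

    // Called once per image per redraw.
    constexpr void draw(int src_id) {
        for (std::size_t i = 0; i < used; ++i) {
            if (slots[i] == src_id) return;  // cache hit: no decode needed
        }
        ++decode_count;                      // cache miss: open + decode
        if (capacity == 0) return;           // nothing is retained
        if (used < capacity) {
            slots[used++] = src_id;          // fill a free slot
        } else {
            slots[next_victim] = src_id;     // overwrite the oldest-filled slot
            next_victim = (next_victim + 1) % capacity;
        }
    }
};

// Redraw four distinct tiles (IDs 1..4) for the given number of frames
// and report how many decodes were needed.
constexpr unsigned decodes_for(std::size_t cache_slots, unsigned frames) {
    ToyImageCache cache(cache_slots);
    for (unsigned f = 0; f < frames; ++f) {
        for (int tile = 1; tile <= 4; ++tile) {
            cache.draw(tile);
        }
    }
    return cache.decode_count;
}
```

In this model, 100 frames of the four-tile test cost 400 decodes with no cache and 4 with a four-slot cache. A cache smaller than the working set (two slots here) thrashes and also ends up at 400.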

I have already mentioned this to the maintainers. Perhaps this thread will add a little more weight to the case that a problem does exist, and someone will look into it.

@kisvegabor

kisvegabor · October 28, 2024, 6:38am

Hi,

Probably the problem is either:

- a slow file system
- very slow decoding (less likely)

Note that by default the image is decoded on every draw, not only once when you set it.

I suggest enabling the image cache. It will save the image after the first decoding. Just make sure that LV_MEM_SIZE is large enough for the images.
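A minimal lv_conf.h sketch of that suggestion for LVGL 8.3. The numbers are illustrative assumptions, not tuned values: four cache slots for the four tiles, and a heap sized for four decoded 128x128 tiles at 3 bytes per pixel (about 192 KiB) plus some headroom. LV_MEM_SIZE only applies when LV_MEM_CUSTOM is 0, and a static pool this large may not fit in the ESP32's internal RAM, so a custom allocator backed by PSRAM may be needed instead.

```c
/* lv_conf.h (LVGL 8.3): illustrative values only */

/* Number of decoded images kept open; 0 (the default) disables the cache. */
#define LV_IMG_CACHE_DEF_SIZE 4

/* Heap for LVGL's built-in allocator; used only when LV_MEM_CUSTOM is 0. */
#define LV_MEM_SIZE (256U * 1024U)
```

The slot count can also be changed at run time with lv_img_cache_set_size(4).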

bobwolff68 · November 1, 2024, 4:05am

Yeah I’m pretty sure the lack of image cache is causing the re-decode of the header and portions of the image again and again. I’ve come up with a test-bench for doing framerate tests for flash-based 565 images vs in-ram 565 images vs png etc.

I just migrated my setup to LVGL 9 so I can be up to date and also try RLE binary encoding, I8 and the like as I learn each over the next week or so.