Importing fonts using lv_font_load() takes too long

jlkfdg · June 26, 2023, 11:05am

What MCU/Processor/Board and compiler are you using?

SWM341

What LVGL version are you using?

8.3.7

What do you want to achieve?

I use a full Chinese font at size 18. It contains over 6000 Chinese characters and produces a bin file of more than 3 MB. Loading it with lv_font_load() takes 6 seconds, which is too long. Can the bin file be used directly, like a generated .c font, so that only the glyphs that are actually needed get read?
I copied the whole bin file to SDRAM and replaced the body of lv_fs_read() with a memcpy() that copies the SDRAM data into buf, but loading still takes 6 seconds.

kisvegabor · June 26, 2023, 7:06pm

Is it faster if you set a file system cache? E.g. LV_FS_STDIO_CACHE_SIZE 512.

jlkfdg · June 27, 2023, 1:51am

The slowness now comes from the parsing inside lv_font_load(), not from reading the file. I am no longer using the file system; the bin file is already in my SDRAM and I read it from there directly.

I changed
lv_fs_res_t lv_fs_read(lv_fs_file_t * file_p, void * buf, uint32_t btr, uint32_t * br)
to the following:

static uint8_t fs_read(uint32_t file_p, void * buf, uint32_t btr, uint32_t * br)
{
    memcpy((uint8_t *)buf, (uint8_t *)(file_p + file_addr), btr);
    file_addr += btr;
    if(br != NULL)
    {
        *br = btr;
    }
    return 0;
}
Both pointers refer to addresses in SDRAM.
Reading the bin file from SDRAM is fast; the slow part is the parsing.
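
For readers who want to try the same approach, here is a minimal sketch of a memory-backed reader that keeps its position in a handle instead of a global and clamps reads at the end of the image. The names mem_file_t, mem_file_read and mem_file_seek are hypothetical and are not part of LVGL; wiring them into LVGL's file system layer is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handle for a file image that already sits in memory
 * (for example a font bin copied to SDRAM). Not an LVGL type. */
typedef struct {
    const uint8_t * base;  /* first byte of the image */
    uint32_t size;         /* image length in bytes */
    uint32_t pos;          /* current read position, always <= size */
} mem_file_t;

/* Copy up to btr bytes from the current position into buf.
 * The copy is clamped so it never reads past the end of the image.
 * If br is not NULL it receives the number of bytes actually copied. */
static int mem_file_read(mem_file_t * f, void * buf, uint32_t btr, uint32_t * br)
{
    uint32_t left = f->size - f->pos;
    uint32_t n = (btr < left) ? btr : left;

    memcpy(buf, f->base + f->pos, n);
    f->pos += n;
    if(br != NULL) {
        *br = n;
    }
    return 0;
}

/* Move to an absolute position; positions beyond the image are rejected. */
static int mem_file_seek(mem_file_t * f, uint32_t pos)
{
    if(pos > f->size) {
        return -1;
    }
    f->pos = pos;
    return 0;
}
```

Keeping the position inside the handle means a seek and a read always agree on where the next byte comes from, which matters because the font loader seeks once per glyph.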

jlkfdg · June 29, 2023, 9:11am

Hello,
can you help take a look? I have no idea where to start.

kisvegabor · June 29, 2023, 7:01pm

Hi,

I see. Can you use a profiler to identify what the slowest parts are?
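
If no profiler is available on the target, a rough alternative is to accumulate tick counts around the suspect sections. The sketch below assumes a free-running 32-bit tick counter (on LVGL that could be the value returned by lv_tick_get()); section_timer_t and its helpers are hypothetical names, not LVGL API.

```c
#include <stdint.h>

/* Hypothetical accumulator for the time spent in one code section. */
typedef struct {
    uint32_t start;  /* tick value at the most recent section_begin() */
    uint32_t total;  /* ticks accumulated over all completed sections */
    uint32_t calls;  /* number of completed begin/end pairs */
} section_timer_t;

/* Remember when the section was entered. */
static void section_begin(section_timer_t * t, uint32_t now)
{
    t->start = now;
}

/* Add the elapsed ticks. Unsigned subtraction stays correct
 * across a single wraparound of the 32-bit counter. */
static void section_end(section_timer_t * t, uint32_t now)
{
    t->total += now - t->start;
    t->calls++;
}
```

With a millisecond tick a single short call will usually measure as 0, so it is more informative to wrap a whole loop or a whole helper such as load_glyph() and compare the totals.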

jlkfdg · June 30, 2023, 1:40am

Hello, I measured it: this function takes the most time, and the value of loca_count is particularly large.

static int32_t load_glyph(lv_fs_file_t * fp, lv_font_fmt_txt_dsc_t * font_dsc,
uint32_t start, uint32_t * glyph_offset, uint32_t loca_count, font_header_bin_t * header)
{
int32_t glyph_length = read_label(fp, start, "glyf");
if(glyph_length < 0) {
return -1;
}

lv_font_fmt_txt_glyph_dsc_t * glyph_dsc = (lv_font_fmt_txt_glyph_dsc_t *)
                                          lv_mem_alloc(loca_count * sizeof(lv_font_fmt_txt_glyph_dsc_t));

memset(glyph_dsc, 0, loca_count * sizeof(lv_font_fmt_txt_glyph_dsc_t));

font_dsc->glyph_dsc = glyph_dsc;

int cur_bmp_size = 0;

for(unsigned int i = 0; i < loca_count; ++i) {
    lv_font_fmt_txt_glyph_dsc_t * gdsc = &glyph_dsc[i];

    lv_fs_res_t res = lv_fs_seek(fp, start + glyph_offset[i], LV_FS_SEEK_SET);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    bit_iterator_t bit_it = init_bit_iterator(fp);

    if(header->advance_width_bits == 0) {
        gdsc->adv_w = header->default_advance_width;
    }
    else {
        gdsc->adv_w = read_bits(&bit_it, header->advance_width_bits, &res);
        if(res != LV_FS_RES_OK) {
            return -1;
        }
    }

    if(header->advance_width_format == 0) {
        gdsc->adv_w *= 16;
    }

    gdsc->ofs_x = read_bits_signed(&bit_it, header->xy_bits, &res);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    gdsc->ofs_y = read_bits_signed(&bit_it, header->xy_bits, &res);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    gdsc->box_w = read_bits(&bit_it, header->wh_bits, &res);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    gdsc->box_h = read_bits(&bit_it, header->wh_bits, &res);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    int nbits = header->advance_width_bits + 2 * header->xy_bits + 2 * header->wh_bits;
    int next_offset = (i < loca_count - 1) ? glyph_offset[i + 1] : (uint32_t)glyph_length;
    int bmp_size = next_offset - glyph_offset[i] - nbits / 8;

    if(i == 0) {
        gdsc->adv_w = 0;
        gdsc->box_w = 0;
        gdsc->box_h = 0;
        gdsc->ofs_x = 0;
        gdsc->ofs_y = 0;
    }

    gdsc->bitmap_index = cur_bmp_size;
    if(gdsc->box_w * gdsc->box_h != 0) {
        cur_bmp_size += bmp_size;
    }
}

uint8_t * glyph_bmp = (uint8_t *)lv_mem_alloc(sizeof(uint8_t) * cur_bmp_size);

font_dsc->glyph_bitmap = glyph_bmp;

cur_bmp_size = 0;

for(unsigned int i = 1; i < loca_count; ++i) {
    lv_fs_res_t res = lv_fs_seek(fp, start + glyph_offset[i], LV_FS_SEEK_SET);
    if(res != LV_FS_RES_OK) {
        return -1;
    }
    bit_iterator_t bit_it = init_bit_iterator(fp);

    int nbits = header->advance_width_bits + 2 * header->xy_bits + 2 * header->wh_bits;

    read_bits(&bit_it, nbits, &res);
    if(res != LV_FS_RES_OK) {
        return -1;
    }

    if(glyph_dsc[i].box_w * glyph_dsc[i].box_h == 0) {
        continue;
    }

    int next_offset = (i < loca_count - 1) ? glyph_offset[i + 1] : (uint32_t)glyph_length;
    int bmp_size = next_offset - glyph_offset[i] - nbits / 8;

    if(nbits % 8 == 0) {  /*Fast path*/
        if(lv_fs_read(fp, &glyph_bmp[cur_bmp_size], bmp_size, NULL) != LV_FS_RES_OK) {
            return -1;
        }
    }
    else {
        for(int k = 0; k < bmp_size - 1; ++k) {
            glyph_bmp[cur_bmp_size + k] = read_bits(&bit_it, 8, &res);
            if(res != LV_FS_RES_OK) {
                return -1;
            }
        }
        glyph_bmp[cur_bmp_size + bmp_size - 1] = read_bits(&bit_it, 8 - nbits % 8, &res);
        if(res != LV_FS_RES_OK) {
            return -1;
        }

        /*The last fragment should be on the MSB but read_bits() will place it to the LSB*/
        glyph_bmp[cur_bmp_size + bmp_size - 1] = glyph_bmp[cur_bmp_size + bmp_size - 1] << (nbits % 8);

    }

    cur_bmp_size += bmp_size;
}
return glyph_length;
}

kisvegabor · July 3, 2023, 8:08am

Based on that, I believe the real bottleneck is read_bits(), as it runs one loop iteration for each bit.

It can probably be optimized by building a mask and applying the AND/OR once per read instead of once per bit.
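
A minimal sketch of the mask idea, written against a plain memory buffer rather than LVGL's file API. mem_bit_reader_t, read_bits_per_bit and read_bits_masked are hypothetical names used only for illustration; the bit order (most significant bit first) is an assumption chosen to match a per-bit reader that shifts bits in from the top of each byte. The first function is a per-bit reference, the second takes up to 8 bits from each byte with one shift and one mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory bit reader; not LVGL's bit_iterator_t. */
typedef struct {
    const uint8_t * data;  /* start of the buffer */
    size_t size;           /* buffer length in bytes */
    size_t bit_pos;        /* index of the next unread bit, MSB first */
} mem_bit_reader_t;

/* Reference version: one loop iteration per bit. */
static uint32_t read_bits_per_bit(mem_bit_reader_t * r, unsigned n_bits, int * ok)
{
    uint32_t value = 0;
    *ok = 0;
    if(n_bits > 32 || n_bits > r->size * 8 - r->bit_pos) return 0;

    while(n_bits--) {
        uint8_t byte = r->data[r->bit_pos >> 3];
        uint32_t bit = (byte >> (7 - (r->bit_pos & 7))) & 1u;
        value = (value << 1) | bit;
        r->bit_pos++;
    }
    *ok = 1;
    return value;
}

/* Masked version: one loop iteration per byte touched. */
static uint32_t read_bits_masked(mem_bit_reader_t * r, unsigned n_bits, int * ok)
{
    uint32_t value = 0;
    *ok = 0;
    if(n_bits > 32 || n_bits > r->size * 8 - r->bit_pos) return 0;

    while(n_bits > 0) {
        unsigned used  = (unsigned)(r->bit_pos & 7);  /* bits already consumed in this byte */
        unsigned avail = 8 - used;                    /* bits still unread in this byte */
        unsigned take  = (n_bits < avail) ? n_bits : avail;
        uint8_t byte   = r->data[r->bit_pos >> 3];

        /* Shift the wanted bits down to the LSB, then mask off the rest. */
        uint32_t chunk = ((uint32_t)byte >> (avail - take)) & ((1u << take) - 1u);

        value = (value << take) | chunk;
        r->bit_pos += take;
        n_bits -= take;
    }
    *ok = 1;
    return value;
}
```

For the glyph header fields in load_glyph(), which are only a few bits wide each, this cuts the loop from one pass per bit to one or two passes per field. Adapting it to a reader that fetches bytes through lv_fs_read() would keep the byte fetch and replace only the per-bit shifting; that adaptation is untested here.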

I’m quite busy now, so unfortunately can’t look into it in more detail. Can you make some experiments with it?