Remote Viewer and Control library for LVGL Applications

Yes, we know that if it’s more than 2 bytes per pixel with RLE there is no benefit, and in fact some harm, as the extra processing makes it less efficient. You are stating the obvious, and no one has claimed the contrary. However, I specifically said it’s always under 2 in my use cases, even for images and video, and for UI-type screens it’s usually around 0.4 bytes per pixel, a significant saving. My software client can choose RLE or just send ’em as you brung ’em, and I time the transfers and data rates, so I can see which is advantageous. Generally I have found that with a UI-type screen the RLE encoding is more efficient on time, despite the extra processing - the 50-75% reduction in bytes that need to be sent makes up for it.

I’m not sure why you keep hammering this double-buffer/DMA point in this context - almost everyone I know using LVGL, and a significant number of users on the forum, barely have enough memory on their MCUs for a partial frame buffer, let alone two full-screen buffers with DMA transfer. The author never wrote this for that scenario, nor made that claim, and if you’ve got enough RAM for two full-screen buffers plus DMA transfer, then you probably don’t need this kind of solution. A Formula One driver doesn’t go to his local car parts dealer for go-faster stripes…

OK, it is quite obvious you don’t understand the dynamics of running applications on memory-constrained devices and how to write efficient code. I am able to create two 800 × 480 × 12 bpp full frame buffers for use on an RGB display, place them in DMA memory, and have no issues with running out of memory. I am able to do this while running MicroPython and have ample memory left over. Two partial buffers for a 480 × 320 × 16 bpp display can fit into the internal RAM of ANY ESP32 without any issue, and once again that is while running MicroPython, which consumes a decent chunk of memory in order to run. So the memory issues you are having are because of how the code is written, not because the MCU doesn’t have enough memory.

Congratulations on doing all that on your device - it is physically impossible on my device, and many, many other devices. Two 800 × 480 × 12 bpp full frame buffers require 1,152,000 bytes of memory. That’s more memory than my Teensy 4.1 has, and more memory than many MCUs have.

You do understand that, right? You do understand that not everyone is using the exact same device as you, with the exact same memory specs, etc?

It is nothing to do with my inefficient code or lack of understanding that I don’t have enough memory - the device simply wasn’t manufactured with enough for two full frame buffers. No need to be insulting, especially as you are showing the world that you lack understanding, thinking that because you can do all that on YOUR device with YOUR physical memory, everyone can, from a Teensy 3.2 and an Arduino Due on up, and that if those devices physically don’t have that much on-board memory, the developer is the idiot?!

Tweaks to my client to help with editing/testing a real app… zoom and pan around the live image streamed via remoteDisplay, add a grid and crosshairs to check alignment, etc.! The video captures the streamed screen on the desktop rather than me having to film a tiny physical screen :slight_smile: Thanks again for this!


I’m really grateful for all the community contribution. However, I need to emphasize the importance of focusing solely on the technical issues rather than personal remarks. Comments directed at individuals rather than the problem hinder our goal of creating a respectful, collaborative environment.

Let’s ensure this community remains a place where all members feel valued and supported. I trust we can keep our interactions constructive and centered on solutions. Thank you for your cooperation.


Hi guys,

This is interesting to me, since I have taken a similar approach in our development and know from experience how useful this is. I just wanted to say thanks to @CubeCoders for making this easily available to the community.

The difference in our approach is that I made a PWA client using WebAssembly and LVGL, which is served by the device itself, so it is always available. (A screenshot of it can be found here)

For the run-length encoding, I chose the variant in which the value itself is used as an escape value and is then followed by a 1-byte repeat count, for example:

“ABCCDDDEEEEFFFFFG” → “ABCC2DD3EE4FF5G”

The reason for choosing this is that non-repeated values pass through as-is, so they need no extra byte. Switching from characters to 3-byte pixel values: in the worst case, where every pair of adjacent pixels is equal, the result is 7 bytes instead of 6, so about 16% extra bandwidth; in the best case, a run of 255 identical pixels, it is 7 bytes instead of 765, a 99% saving.

As said, this depends on how complicated the scene is, but in our project I’ve seen ~85% compression in general, and even 20% compression when displaying a 320x240 gif.


Do you have shareable source for that type of encoding (and decoding) on an LVGL buffer? If not, I can figure it out. I want to give it a whirl, I can see it being beneficial in some cases.

“switching characters for 3-byte pixel values” - I assume this means you are sending RGB, not 16-bit values? So the decode method is: if I see two sequential identical 3-byte values, the next byte is the run length?

Yes, I’m using 32-bit color, but it’s easy to adapt it to 16-bit color, I’ll copy my implementation below:

// Requires <assert.h>, <stddef.h>, <stdint.h>.
// encoded is assumed to be large enough for the worst-case scenario, so: width * height * 3 * 7 / 6
void encode(uint8_t *encoded, size_t *encoded_size, const uint8_t *buffer, const size_t size)
{
    const uint8_t *buffer_end = buffer + size;
    const uint8_t *encoded_begin = encoded;

    while (buffer <= buffer_end - 3)
    {
        uint8_t repeats = 1;

        while (buffer <= buffer_end - 6)
        {
            if (repeats == 255 ||
                buffer[0] != buffer[3] ||
                buffer[1] != buffer[4] ||
                buffer[2] != buffer[5])
                break;

            repeats++;
            buffer += 3;
        }

        if (repeats == 1)
        {
            encoded[0] = buffer[0];
            encoded[1] = buffer[1];
            encoded[2] = buffer[2];
        }
        else
        {
            encoded[0] = buffer[0];
            encoded[1] = buffer[1];
            encoded[2] = buffer[2];
            encoded[3] = buffer[0];
            encoded[4] = buffer[1];
            encoded[5] = buffer[2];
            encoded[6] = repeats;
        }

        encoded += (repeats == 1 ? 3 : 7);
        buffer += 3;
    }

    assert(buffer == buffer_end);

    if (encoded_size)
        *encoded_size = encoded - encoded_begin;
}

// decoded_size is bytes we expect to decode out of encoded, so: width * height * 3
void decode(uint8_t *decoded, const uint32_t decoded_size, const uint8_t *encoded, const uint32_t encoded_size)
{
    const uint8_t *encoded_end = encoded + encoded_size;
    const uint8_t *decoded_end = decoded + decoded_size;

    while (encoded <= encoded_end - 3)
    {
        uint32_t repeats = 1;

        if (encoded_end - encoded > 6 &&
            encoded[0] == encoded[3] &&
            encoded[1] == encoded[4] &&
            encoded[2] == encoded[5])
            repeats = encoded[6];

        if (decoded + (3 * repeats) <= decoded_end)
            for (uint32_t i = 0; i < repeats; i++)
            {
                decoded[0] = encoded[0];
                decoded[1] = encoded[1];
                decoded[2] = encoded[2];

                decoded += 3;
            }

        encoded += (repeats == 1 ? 3 : 7);
    }

    assert(encoded == encoded_end);
    assert(decoded == decoded_end);
}

Thanks, I already got most of the way there. I’d tweaked the OP’s code to use pointers instead of indices, so it needed a bit of adjustment because the size of the data entries is no longer fixed at 4 bytes but is now either 2 or 6 bytes (16-bit pointers worked well when both the color and the run-length value were 16-bit, so more changes were needed)… but even so, with my initial tweak to use escaped colors and a 16-bit run length, the bandwidth sent has dropped in most use cases. I’ll check the average run length I’m seeing in various scenarios, to see whether I need 16-bit or 8-bit values for that…

Thanks!

Oh, I forgot to mention that before encoding I convert to 24-bit color by discarding the alpha channel, sorry about that. In my configuration LVGL is set to work with 32-bit color, while in the client my canvas is set to the LV_COLOR_FORMAT_RGB888 color format.

I also included a 6-bit color implementation. As expected, it doesn’t look great, but it made things usable over slow wifi. It also plays nicely with the RLE implementation, because it is the same as trying to encode a run length of 4 pixels instead of 1.

// 32-bit to 24-bit, discarding alpha channel, technically not a compression I guess.
static void compress_24bpp(const uint8_t *buffer, const size_t size,
                           uint8_t *compressed, size_t *compressed_size)
{
    size_t i_out = 0;

    for (size_t i_in = 0; i_in < size; i_in++)
    {
        if ((i_in % 4) == 3)
            continue;

        compressed[i_out++] = buffer[i_in];
    }

    if (compressed_size)
        *compressed_size = i_out;
}

// 32-bit to 6-bit, discarding alpha channel, undersampling 24-bit to 6-bit & packing.
// 0xRR, 0xGG, 0xBB, 0xAA, ... -> 0bRRGGBBRR, 0bGGBBRRGG, 0bBBRRGGBB, ...
void compress_6bpp(const uint8_t *buffer, const size_t size,
                          uint8_t *compressed, size_t *compressed_size)
{
    size_t shift = 0;
    uint8_t byte = 0;
    size_t i_out = 0;

    for (size_t i_in = 0; i_in < size; i_in++)
    {
        if ((i_in % 4) == 3)
            continue;

        byte |= ((buffer[i_in] / 85) << (6 - shift));

        shift += 2;

        if (shift == 8 || i_in == size - 1)
        {
            compressed[i_out++] = byte;

            byte = 0;
            shift = 0;
        }
    }

    if (compressed_size)
        *compressed_size = i_out;
}

// 6-bit to 24-bit, 
// 0bRRGGBBRR, 0bGGBBRRGG, 0bBBRRGGBB, ... -> 0xRR, 0xGG, 0xBB, 0xRR, ...
void decompress_6bpp(uint8_t *decompressed, const size_t decompressed_size, const uint8_t *compressed, const uint32_t compressed_size)
{
    const uint8_t mask[] = {
        0b11000000,
        0b00110000,
        0b00001100,
        0b00000011,
    };

    size_t i_out = 0;

    for (size_t i_in = 0; i_in < compressed_size; i_in++)
        for (size_t shift = 0; shift < 4 && i_out < decompressed_size; shift++)
            decompressed[i_out++] = ((compressed[i_in] & mask[shift]) >> (6 - (2 * shift))) * 85;
}

No problem, I’d be glad to help!


Thanks, this was very useful! Implemented it for 16 bit color / 8 bit runlength and tested it with this screen streaming from my Teensy 4.1 to desktop:

It’s playing video thumbnails at around 20-21 fps while streaming (an effective 37 fps without streaming over the network). The thumbnails take up approximately 45% of the screen area, and the escaped RLE compresses that by around 30%, which isn’t bad at all given this is the worst-case scenario I have. ‘Regular’ UI screens compress by approx. 80% to 90%. Thanks again!


You’re welcome, the results are looking pretty cool, good luck! :+1:
