Real-time dithering from 24 to 16 bit for beautiful displays

I’ve developed a method to perform 24->16 bit colour conversion using Floyd-Steinberg dithering - the result is great looking user interfaces with significantly less visible banding on 16-bit displays without having to re-work all of your assets.

My approach is zero-allocation (it needs no extra memory at all) and operates in place within the 24-bit buffer. It’s decently fast but does add some overhead; how much depends on the size of your update regions.
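To illustrate why the conversion can run in place: pixel i is read from byte offset 3*i and written to byte offset 2*i, so the write position never overtakes data that has not been read yet. Here is a minimal sketch of just that idea, without any dithering. The helper name pack_rgb565_in_place is hypothetical, and the R, G, B byte order of the input is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: packs an RGB888 buffer into RGB565 in the same memory.
   Assumes the input bytes are ordered R, G, B. */
static void pack_rgb565_in_place(uint8_t *buf, uint32_t pixel_count)
{
    for (uint32_t i = 0; i < pixel_count; i++) {
        /* Read all three bytes first: the write below may overlap them. */
        uint8_t r = buf[i * 3];
        uint8_t g = buf[i * 3 + 1];
        uint8_t b = buf[i * 3 + 2];

        uint16_t c = (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));

        /* 2*i <= 3*i, so this only touches bytes that were already consumed. */
        memcpy(&buf[i * 2], &c, sizeof c);
    }
}
```

After the call, the first pixel_count * 2 bytes hold the packed pixels and the remaining third of the buffer is unused.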

First you simply need to set:

/*Color depth: 1 (I1), 8 (L8), 16 (RGB565), 24 (RGB888), 32 (XRGB8888)*/
#define LV_COLOR_DEPTH 24

In your lv_conf.h file.

In my case, I’m using the popular TFT_eSPI library as a backend for LVGL. All I have to do is add a single function call to my existing disp_flush(...) method:

void disp_flush(lv_display_t* disp, const lv_area_t* area, uint8_t* pixelmap) {
    uint32_t w = (area->x2 - area->x1 + 1);
    uint32_t h = (area->y2 - area->y1 + 1);
    apply_dithering(pixelmap, w, h);
    tft.startWrite();
    tft.setAddrWindow(area->x1, area->y1, w, h);
    tft.pushPixels(pixelmap, w * h);  // Use the converted buffer
    tft.endWrite();
    
    lv_display_flush_ready(disp);  // LVGL v9 name; lv_disp_flush_ready is the older v8 spelling
}

And then our implementation of apply_dithering uses the buffer in-place with some pointer casting magic.

void apply_dithering(uint8_t* pixelmap, uint32_t w, uint32_t h) {
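    // 16-bit view of the same buffer: output pixel i lands at byte 2*i,
    // always behind the 3*i position the input is read from
    uint16_t* pixelmap16 = (uint16_t*)pixelmap;
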
    for (uint32_t y = 0; y < h; y++) {
        for (uint32_t x = 0; x < w; x++) {
            // Get the index of the current RGB888 pixel
            uint32_t index = (y * w + x) * 3;
            
            uint8_t r = pixelmap[index];
            uint8_t g = pixelmap[index + 1];
            uint8_t b = pixelmap[index + 2];

            // Convert to BGR565
            uint16_t outColor = tft.color565(b, g, r);

            // Store the BGR565 value in the output buffer (2 bytes per pixel)
            pixelmap16[y * w + x] = outColor;

            // Calculate the quantization error
            int err_r = r - ((outColor & 0x1F) * 255 / 31);  // Red in BGR is in the least significant 5 bits
            int err_g = g - (((outColor >> 5) & 0x3F) * 255 / 63); // Green remains in the middle 6 bits
            int err_b = b - (((outColor >> 11) & 0x1F) * 255 / 31); // Blue is now in the most significant 5 bits

            // Distribute the error to neighboring pixels (Floyd-Steinberg)
            if (x + 1 < w) {
                // Right neighbor (x+1, y)
                uint32_t nextIndex = ((y * w + x + 1) * 3);
                pixelmap[nextIndex] = clip(pixelmap[nextIndex] + (err_r * 7 / 16));
                pixelmap[nextIndex + 1] = clip(pixelmap[nextIndex + 1] + (err_g * 7 / 16));
                pixelmap[nextIndex + 2] = clip(pixelmap[nextIndex + 2] + (err_b * 7 / 16));
            }
            if (y + 1 < h) {
                if (x > 0) {
                    // Bottom-left neighbor (x-1, y+1)
                    uint32_t nextIndex = (((y + 1) * w + (x - 1)) * 3);
                    pixelmap[nextIndex] = clip(pixelmap[nextIndex] + (err_r * 3 / 16));
                    pixelmap[nextIndex + 1] = clip(pixelmap[nextIndex + 1] + (err_g * 3 / 16));
                    pixelmap[nextIndex + 2] = clip(pixelmap[nextIndex + 2] + (err_b * 3 / 16));
                }
                // Bottom neighbor (x, y+1)
                uint32_t nextIndex = (((y + 1) * w + x) * 3);
                pixelmap[nextIndex] = clip(pixelmap[nextIndex] + (err_r * 5 / 16));
                pixelmap[nextIndex + 1] = clip(pixelmap[nextIndex + 1] + (err_g * 5 / 16));
                pixelmap[nextIndex + 2] = clip(pixelmap[nextIndex + 2] + (err_b * 5 / 16));

                if (x + 1 < w) {
                    // Bottom-right neighbor (x+1, y+1)
                    uint32_t nextIndex = (((y + 1) * w + (x + 1)) * 3);
                    pixelmap[nextIndex] = clip(pixelmap[nextIndex] + (err_r * 1 / 16));
                    pixelmap[nextIndex + 1] = clip(pixelmap[nextIndex + 1] + (err_g * 1 / 16));
                    pixelmap[nextIndex + 2] = clip(pixelmap[nextIndex + 2] + (err_b * 1 / 16));
                }
            }
        }
    }
}
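
The listing calls clip(), which is not shown in the post. A minimal sketch, assuming it is meant to clamp the adjusted channel value to the 0 to 255 range before it is stored back into a byte:

```c
#include <stdint.h>

/* Assumed helper (not part of the original post): clamp an int to 0..255. */
static inline uint8_t clip(int value)
{
    if (value < 0)   return 0;
    if (value > 255) return 255;
    return (uint8_t)value;
}
```

Without the clamp, a negative error added to a dark pixel (or a positive one added to a bright pixel) would wrap around when truncated to uint8_t and show up as a speckle of the opposite brightness.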

The only time you usually see the banding is when using a gradient. Is that what you are using for the background?

If it is, then instead of doing the correction in the flush callback, which is going to severely impact rendering performance, it is better to use the canvas widget and render the gradient background to that. You then collect the buffer from the canvas widget and destroy the widget, and run the buffer data through a function like the one below, which does an in-place dither of the RGB buffer data. Finally you create an lv_image widget and set the buffer data to that.

What I do in situations like this is create the lv_image, set the border and outline widths to zero, set the padding and margin to zero, and set the color opacity to zero as well. I add the buffer to the image widget and then add whatever kind of control I want, with the image widget as the parent of that control. What is nice about doing this is that the dither only ends up running a single time. The dither is also performed on RGB888 data and converts it to dithered RGB565, so you end up with a smoother transition of the colors.

Here is the code that does the dither. What it does is dither the color and then store it as a 3-byte RGB565 value (RGB888 with the low bits of each channel cleared), so when it gets processed down to 2 bytes by LVGL it will remain unchanged.

uint8_t RED_THRESH[] = {
  1, 7, 3, 5, 0, 8, 2, 6,
  7, 1, 5, 3, 8, 0, 6, 2,
  3, 5, 0, 8, 2, 6, 1, 7,
  5, 3, 8, 0, 6, 2, 7, 1,
  0, 8, 2, 6, 1, 7, 3, 5,
  8, 0, 6, 2, 7, 1, 5, 3,
  2, 6, 1, 7, 3, 5, 0, 8,
  6, 2, 7, 1, 5, 3, 8, 0
};

uint8_t GREEN_THRESH[] = {
  1, 3, 2, 2, 3, 1, 2, 2,
  2, 2, 0, 4, 2, 2, 4, 0,
  3, 1, 2, 2, 1, 3, 2, 2,
  2, 2, 4, 0, 2, 2, 0, 4,
  1, 3, 2, 2, 3, 1, 2, 2,
  2, 2, 0, 4, 2, 2, 4, 0,
  3, 1, 2, 2, 1, 3, 2, 2,
  2, 2, 4, 0, 2, 2, 0, 4
};

uint8_t BLUE_THRESH[] = {
  5, 3, 8, 0, 6, 2, 7, 1,
  3, 5, 0, 8, 2, 6, 1, 7,
  8, 0, 6, 2, 7, 1, 5, 3,
  0, 8, 2, 6, 1, 7, 3, 5,
  6, 2, 7, 1, 5, 3, 8, 0,
  2, 6, 1, 7, 3, 5, 0, 8,
  7, 1, 5, 3, 8, 0, 6, 2,
  1, 7, 3, 5, 0, 8, 2, 6
};


uint8_t closest_rb(uint8_t c)
{
    return c >> 3 << 3;
}


uint8_t closest_g(uint8_t c)
{
    return c >> 2 << 2;
}


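void rgb565_dither_pixel(uint16_t x, uint16_t y, lv_color_t *col)
{
    // Position within the 8x8 threshold tile
    uint8_t threshold_id = (uint8_t)(((y & 7) << 3) + (x & 7));
    // Add the threshold to the full 8-bit channel, saturate at 255 (masking with
    // 0xFF would wrap bright values around to 0), then drop the low bits
    uint16_t r = col->red + RED_THRESH[threshold_id];
    uint16_t g = col->green + GREEN_THRESH[threshold_id];
    uint16_t b = col->blue + BLUE_THRESH[threshold_id];
    col->red = closest_rb(r > 255 ? 255 : (uint8_t)r);
    col->green = closest_g(g > 255 ? 255 : (uint8_t)g);
    col->blue = closest_rb(b > 255 ? 255 : (uint8_t)b);
}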


void lv_rgb565_dither(uint8_t *buf, uint16_t width, uint16_t height, lv_color_format_t format) {
    lv_color_t *color;
    uint8_t *p;
    for (uint16_t y=0; y < height; y++) {
        for (uint16_t x=0; x < width; x++) {
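            if (format == LV_COLOR_FORMAT_RGB888) {
                // 3 bytes per pixel, rows are `width` pixels wide
                p = &buf[((uint32_t)y * width + x) * 3];
                color = (lv_color_t *)p;
            } else if ((format == LV_COLOR_FORMAT_ARGB8888) || (format == LV_COLOR_FORMAT_XRGB8888)) {
                // 4 bytes per pixel; assumes LVGL v9's in-memory order B, G, R, A,
                // so the colour bytes sit at the start of each pixel
                p = &buf[((uint32_t)y * width + x) * 4];
                color = (lv_color_t *)p;
            } else {
                continue;
            }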
            rgb565_dither_pixel(x, y, color);
        }
    }
}
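
To see what a threshold table buys over plain truncation, here is a small standalone sketch (not LVGL code; quantize_5bit and dithered_average are hypothetical names). It quantises one 8-bit channel to 5 bits using thresholds 0 to 7, a simplification of the tables above, and averages the result the way the eye averages neighbouring pixels.

```c
#include <stdint.h>

/* Hypothetical helper: add a threshold, saturate at 255, keep the top 5 bits. */
static uint8_t quantize_5bit(uint8_t value, uint8_t threshold)
{
    uint16_t sum = (uint16_t)value + threshold;
    if (sum > 255) sum = 255;
    return (uint8_t)(sum & 0xF8);
}

/* Average output over thresholds 0..7, as seen across a small pixel neighbourhood. */
static unsigned dithered_average(uint8_t value)
{
    unsigned total = 0;
    for (uint8_t t = 0; t < 8; t++) {
        total += quantize_5bit(value, t);
    }
    return total / 8;
}
```

For an input of 100, plain truncation always gives 96, while half of the thresholds push the value up to 104, so the neighbourhood averages back to 100.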

The reason I went with this approach is it works for all content, regardless of whether it’s a gradient or something else. On my ESP32S3 it isn’t even particularly slow since only refreshed areas of the screen get it calculated anyway. Although your algorithm is going to be faster because it doesn’t have to take its neighbours into account - I might check it out and see how the actual performance compares and how it looks visually.
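
The neighbour dependence is easiest to see on a single row of a single channel. The sketch below is a simplified 1-D error diffusion, not the full Floyd-Steinberg kernel: it pushes the whole rounding error onto the next pixel instead of splitting it 7/16, 3/16, 5/16 and 1/16. The function name diffuse_row_5bit is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified 1-D error diffusion (not the full Floyd-Steinberg kernel):
   quantise each value to 5 bits and carry the rounding error to the next pixel. */
static void diffuse_row_5bit(const uint8_t *in, uint8_t *out, size_t n)
{
    int carry = 0;
    for (size_t i = 0; i < n; i++) {
        int v = in[i] + carry;        /* this pixel depends on the previous one */
        if (v < 0)   v = 0;
        if (v > 255) v = 255;
        out[i] = (uint8_t)(v & 0xF8); /* keep the top 5 bits */
        carry = v - out[i];           /* error handed to the right-hand neighbour */
    }
}
```

A flat row of 100s comes out alternating between 96 and 104, which is the same average the threshold-table approach reaches, but each output here depends on the pixel before it.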

That approach is only going to work on RGB888, so if you are setting the color depth in LVGL to RGB565, the buffer that is passed to the flush function is going to be 2-byte RGB565. You would need to tell LVGL to use RGB888 and then create a buffer to move the pixel data into, which you would then send to the display.