HW acceleration: buffer stride and transform list

LVGL is starting to appear on more powerful platforms, which often contain various accelerators for drawing or image manipulation in general, from simple 2D DMA-like BLIT engines up to vector graphics (VG) or even 3D GPUs. Many of them require the buffer start address to be aligned to a certain boundary, and the buffer width and height to be aligned as well. Currently the stride concept is not supported in LVGL. Is this feature planned for the near future? It seems to be a complex change in buffer handling that will impact all SW rendering algorithms. The change itself is straightforward, but the impacted area is very wide.

Another helpful feature would be to make the complete buffer description accessible in the low-level drawing layers. The HW could then do the rotation, recoloring, blending, color keying, etc. Is this feature planned? Currently the image header and the transform description are available only at the top of the stack; they are not propagated down through the SW render engine.
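To make the request concrete, here is a minimal sketch of what a "complete buffer description" handed to a low-level draw hook could contain. All names (hw_buf_desc_t, hw_transform_t, hw_draw_job_t, hw_buf_size) are hypothetical and do not exist in LVGL; the angle and zoom units merely mirror the conventions LVGL's image object uses (0.1 degree, 256 = no zoom).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: none of these names exist in LVGL. */
typedef enum { HW_CF_I8, HW_CF_RGB565, HW_CF_ARGB8888 } hw_cf_t;

/* Complete description of one buffer. */
typedef struct {
    const void *data;  /* start address (HW may require alignment) */
    uint16_t w;        /* real width in pixels */
    uint16_t h;        /* height in pixels */
    uint16_t stride;   /* pixels per row in memory, >= w */
    hw_cf_t cf;        /* color format */
} hw_buf_desc_t;

/* Transform parameters that today stay at the top of the stack. */
typedef struct {
    int16_t angle;        /* rotation, 0.1 degree units */
    uint16_t zoom;        /* 256 = no zoom */
    uint32_t recolor;     /* recolor target color */
    uint8_t recolor_opa;  /* 0 = no recoloring */
    uint8_t opa;          /* overall opacity for blending */
} hw_transform_t;

/* What a low-level draw/blit hook could receive in one piece. */
typedef struct {
    hw_buf_desc_t src;
    hw_buf_desc_t dst;
    hw_transform_t tr;
} hw_draw_job_t;

static unsigned hw_cf_bits(hw_cf_t cf)
{
    switch (cf) {
    case HW_CF_I8:       return 8;
    case HW_CF_RGB565:   return 16;
    case HW_CF_ARGB8888: return 32;
    }
    return 0;
}

/* Bytes occupied by the buffer, including the per-row padding. */
static size_t hw_buf_size(const hw_buf_desc_t *d)
{
    assert(d->stride >= d->w);
    return (size_t)d->stride * d->h * hw_cf_bits(d->cf) / 8;
}
```

With such a job struct, a HW backend can decide per call whether it can do the rotation/recolor itself or must fall back to SW.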

Is it not possible to copy the buffer provided by LVGL to a properly-aligned address first? I think some platforms use special, dedicated DMA buffers for transactions like these.

It is already possible to place a buffer at an aligned address (at least for images, via LV_ATTRIBUTE_MEM_ALIGN; I’m not sure about buffers allocated at runtime), so this is not a problem. Copying the data would cost time and RAM as well. The idea is to have a buffer 128 px wide, together with the information that only 100 px of each row is usable data: 100 px is the real width, 128 px is the stride. Memory is wasted here, but some HW can’t avoid this. So a generic solution would offer "buffer width alignment" as an option, e.g.:

// disp_buf_first += disp_w;
// map_buf_first += map_w;
disp_buf_first += disp_stride;
map_buf_first += map_stride;
. . . 
const lv_img_dsc_t img_cogwheel_rgb = {
  .header.always_zero = 0,
  .header.w = 100,      // <--- real image width
  .header.stride = 112, // <--- proposed field (not in current LVGL): width aligned to 16 px
  .header.h = 100,
//  .data_size = 10000 * LV_COLOR_SIZE / 8,
  .data_size = 112 * 100 * LV_COLOR_SIZE / 8, // <--- size uses stride * height
  .header.cf = LV_IMG_CF_TRUE_COLOR,
  .data = img_cogwheel_rgb_map,
};

For platforms without any special HW, the alignment can be 1, so the stride equals the real width. For platforms with special HW, the alignment can be set according to the HW requirements. This way, any buffer can be accessed correctly by the special HW as well as by the SW rendering algorithms.
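The pointer-advance lines above (disp_buf_first += disp_stride) generalize into a stride-aware row copy. This is a minimal sketch assuming 16-bit color; px16_t, stride_for and blit_strided are hypothetical names, not LVGL functions. Only w pixels are copied per row, and the padding (stride - w) is left untouched.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t px16_t;  /* assumption: 16-bit color (e.g. RGB565) */

/* Round a width up to a multiple of `align` pixels (align >= 1). */
static uint32_t stride_for(uint32_t w, uint32_t align)
{
    assert(align >= 1);
    return (w + align - 1) / align * align;
}

/* Copy a w x h block between two strided buffers, one row at a time. */
static void blit_strided(px16_t *dst, uint32_t dst_stride,
                         const px16_t *src, uint32_t src_stride,
                         uint32_t w, uint32_t h)
{
    assert(dst_stride >= w && src_stride >= w);
    for (uint32_t y = 0; y < h; y++) {
        memcpy(dst, src, w * sizeof(px16_t));
        dst += dst_stride;  /* advance by stride, not by width */
        src += src_stride;
    }
}
```

With align = 1 the stride equals the width and the loop degenerates to the current tightly packed behavior, which is why the option costs nothing on platforms without special HW.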

I think it is worth mentioning that even software rendering algorithms do not always like unaligned memory buffers. E.g. on some systems a simple memcpy() may cause a hard fault when used to copy an unaligned buffer.
Maybe LVGL already takes this into account. Please confirm.

By default we use a custom memcpy implementation that should correctly handle unaligned buffers.

I have seen a case when it did not. The problem seems to be known.

Were you using 7.0+ when this happened? 6.1 did not include this version of memcpy.

I am talking about memcpy() in general, e.g. as used in the my_disp_flush() function.
I ran into hard faults while testing this function when the buffer was unaligned (the area width was odd).

Anyway, using DMA2D for buffer transfers with indexed colors requires an aligned buffer.
Is there any way to make sure the buffer is aligned, that is, that the area width passed to my_disp_flush() is a multiple of four?

On an STM32 system I want to use DMA2D, since it can work ‘in the background’ and takes care not to transfer a region while it is being scanned, which eliminates flickering.

I’ve never personally needed to use it, but I think rounder_cb could be used for this purpose. You can increase the area size as needed (don’t decrease it, though).

This is the example from the documentation for ensuring that the area is always 8 pixels tall:

void my_rounder_cb(lv_disp_drv_t * disp_drv, lv_area_t * area)
{
  /* Update the areas as needed. Can be only larger.
   * For example to always have lines 8 px height:*/
   area->y1 = area->y1 & ~0x07;        /* round down to a multiple of 8 */
   area->y2 = (area->y2 & ~0x07) + 7;  /* round up; coordinates are inclusive */
}
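Applied to the width-multiple-of-four question, the same idea could look like the sketch below: x1 is rounded down to a multiple of 4 and x2 up to the last column of a 4-pixel group, so the inclusive width x2 - x1 + 1 is always a multiple of 4. It assumes the LVGL v7 rounder_cb signature, non-negative coordinates, and a display width that is itself a multiple of 4 (so x2 never passes the last column). The typedefs are stand-ins so the sketch compiles on its own; drop them when including lvgl.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for LVGL types (v7 naming assumed); remove with lvgl.h. */
typedef int16_t lv_coord_t;
typedef struct { lv_coord_t x1, y1, x2, y2; } lv_area_t;
typedef struct my_disp_drv_stub lv_disp_drv_t;

void my_rounder_cb(lv_disp_drv_t *disp_drv, lv_area_t *area)
{
    (void)disp_drv;
    area->x1 = (lv_coord_t)(area->x1 & ~3);  /* round down: 5 -> 4 */
    area->x2 = (lv_coord_t)(area->x2 | 3);   /* round up:   9 -> 11 */
}
```

Because LVGL renders into the rounded area, the color_p buffer passed to the flush callback then also has a row length that is a multiple of 4.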

Thank you for reminding me of this callback. From a quick look at the code, I guess it may help with the final flush operation, but it will not solve cases like:

lv_gpu_stm32_dma2d_copy(disp_buf_first, disp_w, map_buf_first, map_w, draw_area_w, draw_area_h);
lv_gpu_stm32_dma2d_fill_mask(disp_buf_first, disp_w, color, mask, opa, draw_area_w, draw_area_h);

where map_w and disp_w need to be aligned. If the refreshed area is rounded, e.g. from 110x110 to 128x128, the background fill will be satisfied with an alignment of 128, but the subsequent redraw of the object above it (e.g. an image object) will still request a 110x110 copy.

@embeddedt Probably I didn’t make my meaning clear.
In order to:

  • use index colors with dma2d on stm32
  • or safely use memcpy() inside flush_cb function while keeping the code simple

it is desirable that the area width be a multiple of four. The passed lv_color_t buffer (color_p) has a size determined by the area width and height, so simply rounding the width up inside the flush function is not a viable solution.

void my_disp_flush(lv_disp_drv_t* drv, const lv_area_t* area, lv_color_t* color_p) {
    uint32_t width = lv_area_get_width(area);
    MBED_ASSERT(width % 4 == 0);
    // ...
}
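For reference, the body of such a flush function usually copies row by row, because color_p is tightly packed with the area width while the framebuffer has its own row length. Below is a minimal sketch assuming 16-bit color and a hypothetical framebuffer whose stride (112 px) is larger than its visible width; the typedefs are stand-ins for the LVGL v7 types. A real flush_cb must also call lv_disp_flush_ready(drv) when the transfer is done.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for LVGL types (v7 naming assumed); remove with lvgl.h. */
typedef uint16_t lv_color_t;  /* assumption: LV_COLOR_DEPTH 16 */
typedef int16_t lv_coord_t;
typedef struct { lv_coord_t x1, y1, x2, y2; } lv_area_t;

/* Hypothetical framebuffer: 100 visible px per row, 112 px stride. */
#define FB_W      100
#define FB_H      4
#define FB_STRIDE 112
static lv_color_t framebuffer[FB_STRIDE * FB_H];

static void flush_area_to_fb(const lv_area_t *area, const lv_color_t *color_p)
{
    int32_t w = area->x2 - area->x1 + 1;  /* inclusive coordinates */
    for (int32_t y = area->y1; y <= area->y2; y++) {
        memcpy(&framebuffer[y * FB_STRIDE + area->x1], color_p,
               (size_t)w * sizeof(lv_color_t));
        color_p += w;  /* source is packed: advance by the area width */
    }
}
```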

I am not the most familiar with this part of LVGL as I’ve never had this problem in any of my projects. Did you try rounder_cb? Does it improve the situation/not change anything?

@embeddedt Please disregard what I wrote. It looks like I did not know what rounder_cb was meant for, and it is exactly what I needed.


Hello @tdjastrzebski, thank you for confirming this. Do you use images in your app? Image bitmaps that are stored in c-array?


Sorry for joining so late.

One question about stride and rounding. Imagine this case:

  1. A 32x32 image (stride = 32 too) is at the (5;5) coordinate, so its bounding box is (5;5)-(36;36).
  2. Only 1 pixel is invalidated, at (20;20).
  3. It is rounded to the area (16;16)-(31;31).
  4. In image coordinates this is (11;11)-(26;26), because the image is at (5;5).

So in the end, regardless of stride and rounding, an unaligned part of the image is still refreshed.
Am I seeing this correctly?
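The four steps above can be reproduced with a few lines of arithmetic (rect_t and the two helpers are hypothetical names, coordinates inclusive, tile size a power of two): the rounded screen area starts on a 16-px boundary, but the same area expressed in image coordinates starts at column 11.

```c
#include <stdint.h>

typedef struct { int32_t x1, y1, x2, y2; } rect_t;  /* inclusive corners */

/* Expand an area to whole n-pixel tiles (n must be a power of two). */
static rect_t round_to_tiles(rect_t a, int32_t n)
{
    rect_t r = { a.x1 & ~(n - 1), a.y1 & ~(n - 1),
                 a.x2 | (n - 1),  a.y2 | (n - 1) };
    return r;
}

/* Convert screen coordinates into the image's own coordinates. */
static rect_t to_image_coords(rect_t a, int32_t img_x, int32_t img_y)
{
    rect_t r = { a.x1 - img_x, a.y1 - img_y, a.x2 - img_x, a.y2 - img_y };
    return r;
}
```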

Yes, I think you’re right. But usually you can avoid this by passing the offsets to the HW from the start. You don’t set the buffer start address to (11,11) and copy the area (0,0)-(15,15); instead you set the buffer start address to (0,0), which is sure to be aligned, and copy the area (11,11)-(26,26). The same applies to the destination: you don’t provide the address of (0,0) of the invalidated “child”, but the start of the complete frame buffer at (0,0) (again, known to be aligned), and copy to the destination area, e.g. (13,13)-(28,28). The working area usually doesn’t have any constraints or requirements; only the buffer start and stride are sometimes constrained.
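A minimal sketch of that "aligned base + unconstrained working area" idea: the descriptor carries the aligned start of the whole buffer plus the rectangle offsets, and the (possibly unaligned) address of the first working pixel is only ever computed by the engine. hw_surface_t and its helpers are hypothetical; RGB565 (2 bytes per pixel), a 32x32 source image and a 16-byte alignment requirement are picked for illustration.

```c
#include <stdint.h>

/* Hypothetical engine descriptor: only base and stride are constrained. */
typedef struct {
    uintptr_t base;   /* start of the WHOLE buffer, i.e. pixel (0,0) */
    uint32_t stride;  /* bytes per row */
    uint32_t x, y;    /* top-left of the working area, in pixels */
    uint32_t w, h;    /* size of the working area, in pixels */
} hw_surface_t;

/* Returns 1 if the constrained fields meet an `align`-byte requirement. */
static int hw_surface_ok(const hw_surface_t *s, uint32_t align)
{
    return (s->base % align) == 0 && (s->stride % align) == 0;
}

/* Address the engine reaches for pixel (px, py) of the working area;
 * computed inside the HW, so it may well be unaligned. */
static uintptr_t hw_pixel_addr(const hw_surface_t *s, uint32_t bytes_pp,
                               uint32_t px, uint32_t py)
{
    return s->base + (uintptr_t)(s->y + py) * s->stride
                   + (uintptr_t)(s->x + px) * bytes_pp;
}
```

For the example above, the source descriptor would be base = image start, stride = 32 * 2 bytes, x = y = 11, w = h = 16: base and stride satisfy the alignment check even though the first pixel actually read is not 4-byte aligned.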

I’ll try to have a detailed look at rounder_cb again.

Got it, thank you for the explanation!