How to find a guide about gpu_fill_cb

AhmadBan · July 27, 2019, 8:09pm

First of all, thanks for your great work. I am working on an stm32f1 with a 2.8 inch Hx8347G 8-bit TFT LCD with resistive touch. I was able to port the library to my board in half a day, even though I did not know anything about lvgl. However, it is very slow on this board, so I tried to make it faster by optimising my driver. It got much better, but not enough. Then I found that my LCD has a feature that can fill a rectangle just by sending the start point, width, height and color. I see that you already wrote a callback to implement this feature, but I could not find any documentation about it, only some examples without enough explanation. I just need to know how this callback works and what each argument is for.
My plan is to use this callback to refresh my LCD and, at the same time, refresh my buffer with DMA.
Thanks in advance.

embeddedt · July 27, 2019, 8:13pm

Actually, that callback is for filling a region of memory (not the display) with a color. I assume you are referring to gpu_fill_cb? EDIT: I should read the title before the question.

AhmadBan · July 27, 2019, 8:15pm

So far I have found this, but would you please elaborate on this function a little more? I need help with the arguments of the callback. What are they?

 static void gpu_mem_fill(lv_disp_drv_t * disp_drv, lv_color_t * dest_buf, lv_coord_t dest_width,
            const lv_area_t * fill_area, lv_color_t color)

My guesses are:
dest_buf is the start address of the buffer that must be refreshed
dest_width is the width of the display that must be filled
fill_area ???
color is the color to fill with
But does this callback fill a rectangle?
How can I find the start x and y coordinates of that rectangle from these arguments?

This is my GPU fill function:
void fillRect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t color);

embeddedt · July 27, 2019, 8:22pm

Again, you aren’t going to be able to use that function with your LCD, because (from what you’ve described) the acceleration feature fills the LCD with a color, not memory.

Anyways, here is an example of that function being implemented using the ST ChromART accelerator:

https://github.com/littlevgl/stm32f746_disco_no_os_sw4stm32/blob/af11a18a2a1ae9c0385f35623801ac78aaf9bffb/hal_stm_lvgl/tft/tft.c#L283-L295

   lv_color_t * dest_buf_ofs = dest_buf;

   dest_buf_ofs += dest_width * fill_area->y1;
   dest_buf_ofs += fill_area->x1;
   lv_coord_t area_w = lv_area_get_width(fill_area);

   uint32_t i;
   for(i = fill_area->y1; i <= fill_area->y2; i++) {
       /*Wait for the previous operation*/
       HAL_DMA2D_PollForTransfer(&Dma2dHandle, 100);
       HAL_DMA2D_BlendingStart(&Dma2dHandle, (uint32_t) lv_color_to32(color), (uint32_t) dest_buf_ofs, (uint32_t)dest_buf_ofs, area_w, 1);
       dest_buf_ofs += dest_width;
   }

I agree that the documentation isn’t too clear on how the function works.

@kisvegabor Could you write a more detailed description here, as well as fix the example?
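
To make the arguments concrete, here is a plain-C sketch of what a memory fill callback with this signature appears to do, inferred from the ChromART example above. The names `coord_t`, `color_t`, `area_t` and `mem_fill_sw` are stand-ins invented for this sketch, not LittlevGL identifiers, and the sketch assumes 16-bit colors and a `fill_area` whose corners are inclusive and relative to `dest_buf`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins so the sketch compiles on its own.
 * In a real project the equivalent types come from lvgl.h. */
typedef int16_t  coord_t;
typedef uint16_t color_t;                           /* assumes RGB565 */
typedef struct { coord_t x1, y1, x2, y2; } area_t;  /* inclusive corners */

/* Software equivalent of the memory fill: paint the rectangle fill_area
 * inside a buffer whose rows are dest_width pixels long. */
void mem_fill_sw(color_t *dest_buf, coord_t dest_width,
                 const area_t *fill_area, color_t color)
{
    assert(dest_buf != NULL && fill_area != NULL);

    coord_t x, y;
    /* Jump to the first row of the rectangle */
    color_t *row = dest_buf + (size_t)dest_width * (size_t)fill_area->y1;

    for (y = fill_area->y1; y <= fill_area->y2; y++) {
        for (x = fill_area->x1; x <= fill_area->x2; x++) {
            row[x] = color;
        }
        row += dest_width;  /* advance one full buffer row, not one rectangle row */
    }
}
```

Read this way, `dest_width` is the row stride of the buffer, and the start x and y are `fill_area->x1` and `fill_area->y1`. Those coordinates address the buffer in memory, which is why they cannot simply be passed on to the LCD's own fill command.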

AhmadBan · July 27, 2019, 8:27pm

I already have this example in my IDE, but it is confusing.
As for your first point about how to use this feature: what I meant is that if I get the actual start X and Y, width and height of that rectangle, I can draw it on the LCD and also refresh my buffer, using two DMAs.

embeddedt · July 27, 2019, 8:31pm

That’s not exactly how LittlevGL works. We draw portions of the display at a time so there is no safe way to get the actual coordinates that that rectangle will be at.

AhmadBan · July 27, 2019, 8:37pm

You said that this callback just does some memory work for the library and nothing more. So in this case I need to hack the drawing part.

embeddedt · July 27, 2019, 8:39pm

I wouldn’t advise hacking the internal drawing functions. The behavior of that can change between versions, and in any case we can’t provide support for that.

Instead, I would see if you can find a different way of speeding up your driver. What does the slowness look like to the eyes? Is there tearing on the screen, or is it just sluggish?

AhmadBan · July 27, 2019, 8:42pm

It is sluggish. I ran the button example with styles on it. When I click, I can see every line of the button's click style being drawn. The refresh rate is very low, though when I am not using lvgl the display is very fast.

embeddedt · July 27, 2019, 8:43pm

Can you get a video? It’s easier to see the problem that way.

AhmadBan · July 27, 2019, 8:43pm

Yeah, of course.

AhmadBan · July 27, 2019, 9:13pm

t_video5902007370829006624 (1).zip (2.8 MB)
This is a video of how my LCD is working.

embeddedt · July 27, 2019, 9:17pm

Thank you so much for the video; now the problem is clear to me.

It looks like you’re having difficulty getting pixels onto the screen fast enough. How are you copying the rendered buffer to the display? Line-by-line or pixel-by-pixel?

AhmadBan · July 27, 2019, 9:26pm

I am copying pixel by pixel. I have a function in my driver, WritePixel, and I use it in my_disp_flush.

embeddedt · July 27, 2019, 9:28pm

If you can find a way of copying line-by-line (either with DMA or by optimizing your API to send more pixels in a tight loop) that should help speed things up.

If you show me the driver code, I can take a look from a high level and see if there is anything that can easily be optimized.

AhmadBan · July 27, 2019, 9:35pm

I already thought about DMA, but this LCD uses an 8-bit data interface and needs a 4-byte command to send data to the LCD. Since the data and command lines are on different pins, it is not possible to use DMA, or at least I don't know a way to do it. Anyway, here is the code:

void set_pixel(lv_coord_t x, lv_coord_t y, lv_color_t * color_p)
{
    /* Forward one pixel to the LCD driver */
    WritePixel(x, y, color_p->full);
}

void my_disp_flush(lv_disp_drv_t * disp_drv, const lv_area_t * area, lv_color_t * color_p)
{
    int32_t x, y;
    for(y = area->y1; y <= area->y2; y++) {
        for(x = area->x1; x <= area->x2; x++) {
            set_pixel(x, y, color_p);  /* Put a pixel to the display.*/
            color_p++;
        }
    }

    lv_disp_flush_ready(disp_drv);     /* Tell the library that flushing is done */
}



void WritePixel(int16_t x, int16_t y, uint16_t color)
{
    
    if (x < 0 || y < 0 || x >= width() || y >= height())
        return;
#if defined(SUPPORT_9488_555)
    if (is555)
        color = color565_to_555(color);
#endif
    setAddrWindow(x, y, x, y);
    //    CS_ACTIVE; WriteCmd(_MW); write16(color); CS_IDLE; //-0.01s +98B
    if (is9797)
    {
        CS_ACTIVE;
        WriteCmd(_MW);
        write24(color);
        CS_IDLE;
    }
    else
        WriteCmdData(_MW, color);
}

AhmadBan · July 27, 2019, 9:59pm

I made a video to show you the power of the built-in GPU and how fast it is compared to drawing without it: video_2019-07-28_02-23-29.zip (1.6 MB)

Maybe a software layer written on top of the buffer, which just finds the biggest single-color rectangles in the buffer and refreshes the LCD with fillRect instead of pixel by pixel, would solve everything.
The challenge is an algorithm that sees the buffer as rectangles of pixels :))

embeddedt · July 27, 2019, 10:29pm

Assuming your display controller behaves like the others I’ve used, if you modified WritePixel to set the address window to the constraints of area, you could loop calling WriteCmd and write24 over and over, which would be significantly faster. I don’t know if you need to repeatedly assert/deassert CS or not.

If you don’t understand what I’m suggesting then I will provide more details.
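
A rough sketch of that suggestion, under the same assumption that the controller advances its write position automatically inside the address window. The `lcd_*` functions are hypothetical placeholders that only count calls so the sketch runs on a PC; on the board they would wrap `setAddrWindow()`, `WriteCmd(_MW)` and `write16()`/`write24()` from the driver posted earlier. `area_t` and `flush_area` are likewise invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x1, y1, x2, y2; } area_t;  /* stand-in for lv_area_t */

/* Counters used only to observe the behaviour of the sketch */
static uint32_t window_calls;
static uint32_t pixels_sent;
static uint16_t last_pixel;

/* Hypothetical low-level hooks (placeholders, not a real driver API) */
void lcd_set_window(int16_t x1, int16_t y1, int16_t x2, int16_t y2)
{
    (void)x1; (void)y1; (void)x2; (void)y2;
    window_calls++;
}
void lcd_begin_memory_write(void) { /* e.g. CS_ACTIVE; WriteCmd(_MW); */ }
void lcd_write_color(uint16_t c)  { last_pixel = c; pixels_sent++; }
void lcd_end_memory_write(void)   { /* e.g. CS_IDLE; */ }

/* Push a whole rendered area with a single address-window setup,
 * then stream the pixels back to back. */
void flush_area(const area_t *area, const uint16_t *color_p)
{
    assert(area != NULL && color_p != NULL);

    uint32_t w = (uint32_t)(area->x2 - area->x1 + 1);
    uint32_t h = (uint32_t)(area->y2 - area->y1 + 1);
    uint32_t n = w * h;

    lcd_set_window(area->x1, area->y1, area->x2, area->y2);
    lcd_begin_memory_write();
    while (n--) {
        lcd_write_color(*color_p++);
    }
    lcd_end_memory_write();
}
```

Compared with calling WritePixel per pixel, the window and the memory-write command are sent once per area instead of once per pixel. In the real flush callback, `lv_disp_flush_ready()` still has to be called after the loop.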

embeddedt · July 27, 2019, 10:30pm

That’s not how LittlevGL works. The whole idea is that a prerendered buffer is provided to you. That is necessary to implement things like alpha blending and shadows.

The main optimization strategy here is to push pixels to the screen as fast as possible.

v_w · July 28, 2019, 9:14am

First of all, I am AhmadBan. I reached my post limit, so I had to make a new account.
Please explain more, especially this part:

"if you modified WritePixel to set the address window to the constraints of area"

From my understanding, how about modifying flush_cb so that, instead of being executed once per pixel, the callback is called for a rectangular area of a single color (including a rectangle of size 1, i.e. a single pixel),
and the user is responsible for performing the fill on their hardware?

Or, if I manage to do it myself in flush_cb, it will get much faster.
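
One way the "do it myself in flush_cb" idea could look is sketched below: walk each row of the rendered buffer and issue one hardware fill per run of identical pixels. This is only an illustration, not something LittlevGL provides. `lcd_fill_rect` is a hypothetical stand-in for the display's `fillRect`, here reduced to counters so the sketch runs on a PC, and `area_t` and `flush_runs` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { int16_t x1, y1, x2, y2; } area_t;  /* stand-in for lv_area_t */

static uint32_t fill_calls;     /* how many hardware fills were issued */
static uint32_t pixels_filled;  /* total pixels covered by those fills */

/* Hypothetical stand-in for the display's fillRect(x, y, w, h, color) */
void lcd_fill_rect(int16_t x, int16_t y, int16_t w, int16_t h, uint16_t color)
{
    (void)x; (void)y; (void)color;
    fill_calls++;
    pixels_filled += (uint32_t)w * (uint32_t)h;
}

/* Emit one 1-pixel-high fill per horizontal run of equal pixels. */
void flush_runs(const area_t *area, const uint16_t *color_p)
{
    assert(area != NULL && color_p != NULL);

    int16_t w = (int16_t)(area->x2 - area->x1 + 1);
    int16_t x, y;

    for (y = area->y1; y <= area->y2; y++) {
        x = 0;
        while (x < w) {
            int16_t  start = x;
            uint16_t c = color_p[x];
            while (x < w && color_p[x] == c) {
                x++;                      /* extend the run */
            }
            lcd_fill_rect((int16_t)(area->x1 + start), y,
                          (int16_t)(x - start), 1, c);
        }
        color_p += w;                     /* next row of the buffer */
    }
}
```

Whether this beats streaming the pixels depends on the content: large flat areas collapse into a few fills, but anti-aliased edges, gradients and images degrade into one fill per pixel, each with its own command overhead.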