Hi,
I made many measurements when I implemented the DMA2D support, and I found that it’s usually not really faster than LVGL’s software rendering.
DMA2D supports only very simple operations:
- fill an area
- copy an area
- blend two areas with opacity
- color format conversion during these operations (which LVGL doesn’t use)
For area fill and area copy, DMA2D apparently cannot work faster than a well-written memset or memcpy.
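To illustrate why the software fill is hard to beat: a plain memset works byte-wise, so it can only fill RGB565 with colors whose two bytes happen to be equal; a well-optimized fill therefore writes whole words. A minimal sketch of the idea (illustrative only, not LVGL’s actual lv_memset implementation):

```c
#include <stdint.h>
#include <string.h>

/* Fill an RGB565 pixel buffer with a color by writing two pixels at a
 * time as one 32-bit word. Illustrative sketch; LVGL's internal
 * memory helpers are more elaborate (alignment, unrolling, etc.). */
static void fill_rgb565(uint16_t *buf, uint32_t px_cnt, uint16_t color)
{
    uint32_t c32 = ((uint32_t)color << 16) | color;
    while(px_cnt >= 2) {
        memcpy(buf, &c32, sizeof(c32)); /* memcpy avoids alignment traps */
        buf += 2;
        px_cnt -= 2;
    }
    if(px_cnt) *buf = color; /* trailing odd pixel */
}
```

A compiler at -O2 turns this into straight word stores, which is essentially what a fast memset does internally; DMA2D has no bandwidth advantage over it on the same bus.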
For blending, DMA2D could be faster, but LVGL’s software rendering is very well optimized here and achieves the same performance.
It’s important to note that RGB565 with an alpha channel is a custom LVGL format and therefore not supported by DMA2D directly. To use ARGB images with DMA2D, ARGB8888 images need to be used together with color format conversion. However, with LV_COLOR_DEPTH 32 and ARGB8888 images, DMA2D can be used directly (and LVGL indeed handles this case separately), so it can be slightly faster.
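For reference, the blend in question is the standard per-pixel alpha blend; a software sketch of it for ARGB8888 (illustrative, not LVGL’s optimized blending path):

```c
#include <stdint.h>

/* Blend one ARGB8888 foreground pixel over a background pixel using the
 * foreground's alpha. This is the operation that DMA2D's
 * memory-to-memory blend mode performs in hardware. Sketch only;
 * optimized code processes channels in pairs and avoids the divisions. */
static uint32_t blend_argb8888(uint32_t fg, uint32_t bg)
{
    uint32_t a = fg >> 24;
    uint32_t r = (((fg >> 16) & 0xFF) * a + ((bg >> 16) & 0xFF) * (255 - a)) / 255;
    uint32_t g = (((fg >> 8) & 0xFF) * a + ((bg >> 8) & 0xFF) * (255 - a)) / 255;
    uint32_t b = ((fg & 0xFF) * a + (bg & 0xFF) * (255 - a)) / 255;
    return 0xFF000000 | (r << 16) | (g << 8) | b;
}
```

Per pixel this is only a handful of multiply-adds, which is why a well-optimized software loop can keep up with DMA2D here.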
I also watched the TouchGFX videos where they compare cases with and without DMA2D, and there is a huge difference in performance. I ran some tests with similar UIs, and LVGL’s software rendering really has the same performance as DMA2D has in those videos. So it seems their software rendering is very slow.
Another factor is how CPU usage is measured. The STM examples probably count the CPU as idle while DMA2D is working in the background and the CPU is just waiting for it. That’s not the case for LVGL, as it simply counts the time spent in lv_task_handler.
LVGL lets you use the spare time provided by DMA2D by calling disp_drv->gpu_wait_cb while DMA2D is busy.
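A possible shape for such a callback is sketched below. The dma2d_is_busy() helper is a hypothetical stand-in for checking the hardware status (on STM32 that would be the DMA2D START/transfer-complete flags); here it is simulated with a countdown so the sketch is self-contained:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for polling the DMA2D busy state. On real
 * hardware this would read the DMA2D status register or a flag set by
 * the transfer-complete interrupt; here a countdown simulates a
 * transfer that finishes after a few polls. */
static int busy_polls = 3;
static bool dma2d_is_busy(void) { return busy_polls-- > 0; }

/* Example gpu_wait_cb body: LVGL calls this when it needs the DMA2D
 * result. Instead of stalling, the application can do background work
 * or enter a low-power wait until the transfer completes. The argument
 * is the lv_disp_drv_t pointer, typed loosely here to stay
 * self-contained. */
static void my_gpu_wait_cb(void *disp_drv)
{
    (void)disp_drv;
    while(dma2d_is_busy()) {
        /* Do useful background work here, or sleep until the
         * DMA2D transfer-complete interrupt fires. */
    }
}
```

The point is that the wait loop is under the application’s control, so the “spare” DMA2D time doesn’t have to be burned in a tight poll.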
There are other factors that hugely affect performance. Rendering directly into external RAM is very slow; maybe that’s the default in the STM examples. LVGL, however, uses smaller draw buffer(s), typically located in internal RAM, and copies only the final result into the frame buffer in external RAM. That’s way faster.
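The copy step is simple: after rendering into the small internal-RAM draw buffer, the flush callback copies the finished area row by row into the external frame buffer. A sketch of that copy (illustrative names; a real LVGL flush_cb also signals lv_disp_flush_ready):

```c
#include <stdint.h>
#include <string.h>

/* Copy a finished w x h area from a small draw buffer (internal RAM)
 * into the full frame buffer (external RAM) at position (x, y).
 * fb_width is the frame buffer's stride in pixels. This is the kind of
 * copy an LVGL flush_cb performs; sketch only. */
static void copy_area(uint16_t *fb, uint32_t fb_width,
                      const uint16_t *draw_buf,
                      uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
    for(uint32_t row = 0; row < h; row++) {
        memcpy(&fb[(y + row) * fb_width + x],  /* destination row in frame buffer */
               &draw_buf[row * w],             /* source row in draw buffer */
               w * sizeof(uint16_t));
    }
}
```

Because all blending and compositing already happened in fast internal RAM, the slow external bus only sees one sequential write per pixel.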
I hope this summary helps.
PS: Please correct me if I stated something incorrectly about DMA2D.