Hi
Just getting started with this stack. I was trying it on an STM32F746G-DISCO, using the GitHub project with the examples and LVGL updated to the latest version (master branch).
Everything seems to be working, but I have an issue while running the benchmark, and I don’t know whether it comes from LVGL or from the LCD driver.
In some of the tests (with borders and rectangles) there were some discontinuities.
Check the image
I also made a video, but I have no idea how to share it; it says I cannot post links.
youtu.be/-1Re1BKg5FE
Is this normal?
The “Widgets demo” seems to run without issue.
One other thing: it could be useful to add a comment next to “LV_USE_GPU_STM32_DMA2D” saying that the user needs to enable the “DMA2D” clock.
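As a rough idea of what such a comment could look like, here is a sketch of the relevant `lv_conf.h` line. The wording of the comment is only a suggestion and is not taken from LVGL itself; `__HAL_RCC_DMA2D_CLK_ENABLE()` is the STM32 HAL macro for the clock, and where exactly it gets called is up to the application.

```c
/* lv_conf.h (fragment, suggested comment only) */

/* 1: use the STM32 DMA2D (Chrom-ART) peripheral for fills and blending.
 * NOTE: the application must enable the DMA2D peripheral clock itself
 * before LVGL first touches the GPU, e.g. with the STM32 HAL macro
 * __HAL_RCC_DMA2D_CLK_ENABLE(). */
#define LV_USE_GPU_STM32_DMA2D 1
```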
Thanks
Those are normal; it’s testing how long it takes to draw large shadows.
By discontinuity do you mean the “broken” grayish rectangle in the middle?
If so, do you see it only for a moment or during the whole animation?
Hi
I see it during the “rectangle” and “border” tests.
In the remaining tests I don’t notice it.
I have attached another image; it is quite tricky to capture the problem in a photo. In some situations there is more than one discontinuity.
Yes; this is an instance of tearing. It’s caused by the display trying to keep up with all the moving objects. If you have cache enabled it’s not as obvious as without.
Hi @embeddedt
When you mention cache, do you mean “Data Cache” and “instruction cache” in the CPU (Cortex-M7)?
In this case I have the “instruction cache” enabled. Normally I don’t enable the “data cache”: too many headaches solving all the possible issues with DMA use.
You will also need data cache on to solve this problem.
Flushing the cache before starting a DMA transaction should be enough for LittlevGL. The example project for this board uses D-Cache by default.
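For anyone wondering what "flushing before DMA" involves in practice: cache maintenance works on whole cache lines (32 bytes on the Cortex-M7), so it is common to round the buffer range out to line boundaries first. Below is a minimal, host-runnable sketch of that rounding. The names `cache_range_t` and `cache_align_range` are made up for illustration; on the target, the resulting range would then be passed to the CMSIS function `SCB_CleanDCache_by_Addr()` before the DMA transfer is started.

```c
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 D-cache line size in bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start; /* rounded down to a line boundary */
    size_t    size;  /* multiple of DCACHE_LINE_SIZE    */
} cache_range_t;

/* Expand [addr, addr + len) outward so it covers whole cache lines. */
cache_range_t cache_align_range(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + len + DCACHE_LINE_SIZE - 1u) & mask;
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* On the target (not compiled here), roughly:
 *   cache_range_t r = cache_align_range((uintptr_t)buf, len);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.start, (int32_t)r.size);
 *   ...start the DMA transfer that reads from buf...
 */
```

For DMA that writes into memory (the ADC or USART RX cases), the buffer would need an invalidate after the transfer instead, and there it matters even more that the buffer does not share a cache line with other data.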
I just tried with the data cache enabled. I had disabled it because I normally never use it: I use a lot of DMA transfers (USART RX and TX, SPI for external memory access, SPI for WS2812 control, ADC) and want to avoid some issues.
The performance boost is quite significant on this board. In this case LV_COLOR was set to 32, using a double buffer (34 rows), with LV_USE_GPU_STM32_DMA2D enabled.
But the tearing was still noticeable in the rectangle tests, though less than before. It is not an issue, though; probably the only real workaround would be to have a real “double buffer” in the external RAM for the full display contents, and to switch the buffer in use on the VSYNC signal.
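The full-frame double buffer idea could be sketched roughly like this. It is only the bookkeeping part, with made-up names (`fb_swap_t`, `fb_request_swap`, `fb_on_vsync`); actually pointing the display controller at the new buffer is hardware-specific and left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for two full-screen framebuffers. */
typedef struct {
    uint32_t     *buf[2];       /* the two framebuffers (e.g. in external RAM) */
    uint8_t       front;        /* index of the buffer being scanned out       */
    volatile bool swap_pending; /* set by the renderer, cleared at VSYNC       */
} fb_swap_t;

void fb_init(fb_swap_t *s, uint32_t *a, uint32_t *b)
{
    s->buf[0] = a;
    s->buf[1] = b;
    s->front = 0;
    s->swap_pending = false;
}

/* Buffer currently shown on the display. */
uint32_t *fb_front(const fb_swap_t *s) { return s->buf[s->front]; }

/* Buffer the renderer is allowed to draw into. */
uint32_t *fb_back(const fb_swap_t *s) { return s->buf[s->front ^ 1u]; }

/* Called by the renderer once a complete frame is in the back buffer. */
void fb_request_swap(fb_swap_t *s) { s->swap_pending = true; }

/* Called from the VSYNC interrupt. Swaps only if a finished frame is
 * waiting, and returns the buffer the display should scan out next. */
uint32_t *fb_on_vsync(fb_swap_t *s)
{
    if (s->swap_pending) {
        s->front ^= 1u;
        s->swap_pending = false;
    }
    return s->buf[s->front];
}
```

Because the swap only ever happens at VSYNC, the display never scans out a half-drawn frame. The cost is two full-screen buffers, which is why external RAM is needed.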
| Test | Only I Cache (FPS) | D Cache and I Cache (FPS) |
|---|---|---|
| Weighted FPS | 128 | 183 |
| Opa. speed | 93 | 94 |
| Rectangle | 193 | 238 |
| Rectangle + opa | 114 | 107 |
| Rectangle rounded | 110 | 152 |
| Rectangle rounded + opa | 75 | 86 |
| Circle | 50 | 77 |
| Circle + opa | 25 | 45 |
| Border | 164 | 200 |
| Border + opa | 151 | 193 |
| Border rounded | 119 | 162 |
| Border rounded + opa | 107 | 150 |
| Circle border | 38 | 58 |
| Circle border + opa | 34 | 55 |
| Border top | 144 | 194 |
| Border top + opa | 139 | 194 |
| Border left | 58 | 82 |
| Border left + opa | 58 | 82 |
| Border top + left | 52 | 76 |
| Border top + left + opa | 41 | 66 |
| Border left + right | 52 | 76 |
| Border left + right + opa | 41 | 66 |
| Border top + bottom | 126 | 171 |
| Border top + bottom + opa | 113 | 164 |
| Shadow small | 26 | 46 |
| Shadow small + opa | 26 | 46 |
| Shadow small offset | 16 | 26 |
| Shadow small offset + opa | 15 | 26 |
| Shadow large | 11 | 21 |
| Shadow large + opa | 10 | 21 |
| Shadow large offset | 9 | 17 |
| Shadow large offset + opa | 9 | 17 |
| Image RGB | 727 | 761 |
| Image RGB + opa | 615 | 820 |
| Image ARGB | 543 | 800 |
| Image ARGB + opa | 592 | 800 |
| Image chorma keyed | 137 | 309 |
| Image chorma keyed + opa | 96 | 234 |
| Image indexed | 65 | 162 |
| Image indexed + opa | 57 | 137 |
| Image alpha only | 88 | 163 |
| Image alpha only + opa | 75 | 144 |
| Image RGB recolor | 82 | 189 |
| Image RGB recolor + opa | 82 | 188 |
| Image ARGB recolor | 87 | 235 |
| Image ARGB recolor + opa | 75 | 189 |
| Image chorma keyed recolor | 96 | 236 |
| Image chorma keyed recolor + opa | 82 | 189 |
| Image indexed recolor | 57 | 137 |
| Image indexed recolor + opa | 51 | 121 |
| Image RGB rotate | 41 | 138 |
| Image RGB rotate + opa | 34 | 98 |
| Image RGB rotate anti aliased | 14 | 39 |
| Image RGB rotate anti aliased + opa | 13 | 35 |
| Image ARGB rotate | 40 | 123 |
| Image ARGB rotate + opa | 36 | 109 |
| Image ARGB rotate anti aliased | 13 | 36 |
| Image ARGB rotate anti aliased + opa | 13 | 36 |
| Image RGB zoom | 46 | 159 |
| Image RGB zoom + opa | 39 | 122 |
| Image RGB zoom anti aliased | 16 | 43 |
| Image RGB zoom anti aliased + opa | 14 | 39 |
| Image ARGB zoom | 46 | 160 |
| Image ARGB zoom + opa | 42 | 139 |
| Image ARGB zoom anti aliased | 15 | 41 |
| Image ARGB zoom anti aliased + opa | 14 | 41 |
| Text small | 28 | 53 |
| Text small + opa | 28 | 53 |
| Text medium | 28 | 53 |
| Text medium + opa | 28 | 51 |
| Text large | 29 | 54 |
| Text large + opa | 28 | 53 |
| Text small compressed | 20 | 39 |
| Text small compressed + opa | 20 | 38 |
| Text medium compressed | 14 | 29 |
| Text medium compressed + opa | 15 | 29 |
| Text large compressed | 7 | 16 |
| Text large compressed + opa | 7 | 15 |
| Line | 59 | 90 |
| Line + opa | 57 | 87 |
| Arc think | 52 | 72 |
| Arc think + opa | 48 | 70 |
| Arc thick | 52 | 71 |
| Arc thick + opa | 46 | 68 |
| Substr. rectangle | 58 | 82 |
| Substr. rectangle + opa | 59 | 82 |
| Substr. border | 100 | 139 |
| Substr. border + opa | 100 | 140 |
| Substr. shadow | 13 | 22 |
| Substr. shadow + opa | 13 | 22 |
| Substr. image | 551 | 780 |
| Substr. image + opa | 607 | 800 |
| Substr. line | 54 | 84 |
| Substr. line + opa | 54 | 85 |
| Substr. arc | 45 | 64 |
| Substr. arc + opa | 45 | 64 |
| Substr. text | 24 | 46 |
| Substr. text + opa | 24 | 46 |
Thanks for publishing your benchmark results! It looks like on average the cache gives a 1.5x-3x improvement in speed.
The average gain was around 1.9x for all tests.
Curiously, the only test that lost performance was Rectangle + opa.
Now let’s see what the STM32F469 can do; that is the main board I use for display projects. I will post the benchmark results for comparison.
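For reference, the per-test gain is just the FPS with both caches divided by the FPS with only the I-cache. Here is a small sketch using four rows copied from the results; it is only a subset, so its mean is not the figure for the whole table. The names `bench_row_t`, `speedup` and `mean_speedup` are made up for this example.

```c
#include <stddef.h>

/* One benchmark row: FPS with only I-cache, and with I-cache + D-cache. */
typedef struct {
    const char *name;
    int icache_fps;
    int both_fps;
} bench_row_t;

/* A few rows copied from the posted results (subset only). */
static const bench_row_t rows[] = {
    { "Rectangle",        193, 238 },
    { "Rectangle + opa",  114, 107 },
    { "Circle + opa",      25,  45 },
    { "Image RGB rotate",  41, 138 },
};

/* Gain from enabling the D-cache for one test. */
double speedup(const bench_row_t *r)
{
    return (double)r->both_fps / (double)r->icache_fps;
}

/* Arithmetic mean of the gains over n rows. */
double mean_speedup(const bench_row_t *r, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += speedup(&r[i]);
    }
    return (n > 0) ? sum / (double)n : 0.0;
}
```

A value below 1.0 (as for Rectangle + opa) means the test got slower with the D-cache on.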
I’m guessing that that’s because of measurement error. In any case, most people probably don’t care about the exact theoretical maximum as long as it’s above 45.
@arturv2000 Thanks for sharing your results! That’s pretty fast.
In some cases, tearing can be avoided even without double buffering. If the rendering is very fast (10-15 ms in your case) and the VSYNC period is longer (e.g. 20 ms), you can synchronize LVGL to start rendering right after VSYNC. This synchronization is not directly supported now, but you can easily add it here. Just add a wait loop on a VSYNC flag, e.g. while(!vsync_happened).
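A minimal sketch of that wait could look like the following, assuming the display driver has a VSYNC interrupt that sets a flag. The names `display_vsync_isr` and `wait_for_vsync` are hypothetical, and the spin limit is only there so the loop cannot hang if the interrupt never fires; none of this is existing LVGL API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set from the (hypothetical) VSYNC interrupt, consumed by the waiter. */
static volatile bool vsync_happened = false;

/* Call this from the display controller's VSYNC interrupt handler. */
void display_vsync_isr(void)
{
    vsync_happened = true;
}

/* Spin until the next VSYNC has been seen, then clear the flag.
 * Returns false if max_spins iterations pass without a VSYNC. */
bool wait_for_vsync(uint32_t max_spins)
{
    while (!vsync_happened) {
        if (max_spins-- == 0u) {
            return false; /* gave up: no VSYNC seen */
        }
    }
    vsync_happened = false; /* consume the event */
    return true;
}
```

Calling `wait_for_vsync()` right before rendering starts means drawing begins just after the display has finished a frame, which leaves the whole VSYNC period for the render to complete.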