STM32F746Disco Issue

arturv2000 · June 6, 2020, 5:58pm

Hi

Just getting started with this stack. I was trying it on an STM32F746G-DISCO, using the GitHub project with the examples and LVGL updated to the latest version (master branch).

Everything seems to be working. I just have an issue while running the benchmark, and I don’t know if it is an issue with LVGL or with the LCD driver.

In some of the tests (with borders and rectangles) there were some discontinuities.

Check the image

I also made a video, but have no idea how to share it; it says I cannot post links
youtu.be/-1Re1BKg5FE

Is this normal?

The “Widgets demo” seems to run without issue.

One other thing: it could be useful to add a comment next to “LV_USE_GPU_STM32_DMA2D” saying that the user needs to enable the “DMA2D” clock.

Thanks

embeddedt · June 6, 2020, 6:08pm

Those are normal; it’s testing how long it takes to draw large shadows.

arturv2000 · June 6, 2020, 6:40pm

Hi

Thanks for the answer.

kisvegabor · June 8, 2020, 10:28am

By discontinuity do you mean the “broken” grayish rectangle in the middle?
If so, do you see it only for a moment or during the whole animation?

kisvegabor · June 8, 2020, 10:28am

Thanks, will do!

arturv2000 · June 8, 2020, 2:45pm

Hi

I see it during the tests with “rectangles” and the “border” tests.

In the remaining tests I don’t notice it.

I have attached another image; it is quite tricky to capture the problem with a photo. In some situations there is more than one discontinuity.

embeddedt · June 8, 2020, 2:46pm

Yes; this is an instance of tearing. It’s caused by the display trying to keep up with all the moving objects. If you have cache enabled it’s not as obvious as without.

arturv2000 · June 8, 2020, 3:02pm

Hi @embeddedt

When you mention cache, do you mean “Data Cache” and “instruction cache” in the CPU (Cortex-M7)?

In this case I have “instruction cache” enabled; normally I don’t enable the “Data cache”, as there are too many headaches in solving all the possible issues with DMA use.

embeddedt · June 8, 2020, 3:14pm

You will also need data cache on to solve this problem.

Flushing the cache before starting a DMA transaction should be enough for LittlevGL. The example project for this board uses D-Cache by default.
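
A minimal sketch of the arithmetic behind "flushing the cache before starting a DMA transaction", assuming the 32-byte D-cache line size of the Cortex-M7. Cache maintenance works on whole lines, so the buffer range is first widened to line boundaries. The names `cache_range_t` and `cache_align_range` are hypothetical helpers for illustration, not part of LVGL or the example project.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u /* Cortex-M7 D-cache line size in bytes */

/* Hypothetical helper type: a buffer range widened to whole cache lines. */
typedef struct {
    uintptr_t addr; /* start address, rounded down to a line boundary */
    size_t    size; /* length in bytes, rounded up to whole lines */
} cache_range_t;

static inline cache_range_t cache_align_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t start = addr & ~mask;                 /* round start down */
    uintptr_t end   = (addr + size + mask) & ~mask; /* round end up */
    cache_range_t r = { start, (size_t)(end - start) };
    return r;
}

/* On the target, the display flush callback would clean this range before
 * starting the DMA/DMA2D transfer, for example with the CMSIS call:
 *   cache_range_t r = cache_align_range((uintptr_t)buf, n_bytes);
 *   SCB_CleanDCache_by_Addr((uint32_t *)r.addr, (int32_t)r.size);
 * (buf and n_bytes stand for the rendered buffer and its size in bytes.)
 */
```

For example, a 100-byte buffer starting 16 bytes into a cache line is widened to four full lines (128 bytes) starting at the line boundary.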

arturv2000 · June 8, 2020, 3:47pm

Just tried with the data cache enabled. I had disabled it since I normally never use it: I use a lot of DMA transfers (USART RX & TX, SPI for external memory access, SPI for WS2812 control, ADC) and want to avoid some issues.

The performance boost is quite significant on this board. In this case LV_COLOR_DEPTH was set to 32, using a double buffer (34 rows) with LV_USE_GPU_STM32_DMA2D enabled.

The tearing was still noticeable in the rectangle tests, though less than before. But it is not an issue; probably the only real workaround would be to have a real “double buffer” in the external RAM for the full display contents, and switch the buffer in use on the VSYNC signal.
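
The full-frame double buffer idea can be sketched as a small state machine: rendering always goes to the back buffer, and the buffers are only exchanged inside the VSYNC handler. This is an illustration under assumptions, not the example project's driver. `fb_pair_t`, `fb_back`, `fb_request_swap` and `fb_on_vsync` are hypothetical names, and reprogramming the LCD controller's layer address is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pair of full-screen framebuffers (e.g. placed in external RAM). */
typedef struct {
    uint32_t     *buf[2];       /* the two full-frame buffers */
    volatile int  front;        /* index of the buffer being scanned out */
    volatile bool swap_pending; /* set when a finished frame awaits display */
} fb_pair_t;

/* Buffer that is safe to render into: the one not being displayed. */
static inline uint32_t *fb_back(const fb_pair_t *fb)
{
    return fb->buf[fb->front ^ 1];
}

/* Call when a frame has been completely rendered into the back buffer. */
static inline void fb_request_swap(fb_pair_t *fb)
{
    fb->swap_pending = true;
}

/* Call from the VSYNC interrupt. Returns true when the buffers were
 * exchanged, in which case the caller would point the LCD controller
 * at fb->buf[fb->front]. */
static inline bool fb_on_vsync(fb_pair_t *fb)
{
    if (!fb->swap_pending) {
        return false;
    }
    fb->front ^= 1;
    fb->swap_pending = false;
    return true;
}
```

Because the exchange only happens during vertical blanking, the display never scans out a half-drawn frame. The renderer must not start the next frame until the pending swap has been consumed.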

Test	Only I Cache	D Cache and I Cache
Weighted FPS	128	183
Opa. speed	93	94
Rectangle	193	238
Rectangle + opa	114	107
Rectangle rounded	110	152
Rectangle rounded + opa	75	86
Circle	50	77
Circle + opa	25	45
Border	164	200
Border + opa	151	193
Border rounded	119	162
Border rounded + opa	107	150
Circle border	38	58
Circle border + opa	34	55
Border top	144	194
Border top + opa	139	194
Border left	58	82
Border left + opa	58	82
Border top + left	52	76
Border top + left + opa	41	66
Border left + right	52	76
Border left + right + opa	41	66
Border top + bottom	126	171
Border top + bottom + opa	113	164
Shadow small	26	46
Shadow small + opa	26	46
Shadow small offset	16	26
Shadow small offset + opa	15	26
Shadow large	11	21
Shadow large + opa	10	21
Shadow large offset	9	17
Shadow large offset + opa	9	17
Image RGB	727	761
Image RGB + opa	615	820
Image ARGB	543	800
Image ARGB + opa	592	800
Image chroma keyed	137	309
Image chroma keyed + opa	96	234
Image indexed	65	162
Image indexed + opa	57	137
Image alpha only	88	163
Image alpha only + opa	75	144
Image RGB recolor	82	189
Image RGB recolor + opa	82	188
Image ARGB recolor	87	235
Image ARGB recolor + opa	75	189
Image chroma keyed recolor	96	236
Image chroma keyed recolor + opa	82	189
Image indexed recolor	57	137
Image indexed recolor + opa	51	121
Image RGB rotate	41	138
Image RGB rotate + opa	34	98
Image RGB rotate anti aliased	14	39
Image RGB rotate anti aliased + opa	13	35
Image ARGB rotate	40	123
Image ARGB rotate + opa	36	109
Image ARGB rotate anti aliased	13	36
Image ARGB rotate anti aliased + opa	13	36
Image RGB zoom	46	159
Image RGB zoom + opa	39	122
Image RGB zoom anti aliased	16	43
Image RGB zoom anti aliased + opa	14	39
Image ARGB zoom	46	160
Image ARGB zoom + opa	42	139
Image ARGB zoom anti aliased	15	41
Image ARGB zoom anti aliased + opa	14	41
Text small	28	53
Text small + opa	28	53
Text medium	28	53
Text medium + opa	28	51
Text large	29	54
Text large + opa	28	53
Text small compressed	20	39
Text small compressed + opa	20	38
Text medium compressed	14	29
Text medium compressed + opa	15	29
Text large compressed	7	16
Text large compressed + opa	7	15
Line	59	90
Line + opa	57	87
Arc thin	52	72
Arc thin + opa	48	70
Arc thick	52	71
Arc thick + opa	46	68
Substr. rectangle	58	82
Substr. rectangle + opa	59	82
Substr. border	100	139
Substr. border + opa	100	140
Substr. shadow	13	22
Substr. shadow + opa	13	22
Substr. image	551	780
Substr. image + opa	607	800
Substr. line	54	84
Substr. line + opa	54	85
Substr. arc	45	64
Substr. arc + opa	45	64
Substr. text	24	46
Substr. text + opa	24	46

embeddedt · June 8, 2020, 4:15pm

Thanks for publishing your benchmark results! It looks like on average the cache gives a 1.5x-3x improvement in speed.

arturv2000 · June 8, 2020, 4:36pm

The average gain was around 1.9x for all tests.

Curious that the only test that lost performance was Rectangle + opa.
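
For readers who want to reproduce the per-test gain, here is a small sketch of the arithmetic using four rows copied from the table above. The gain is the FPS with both caches divided by the FPS with the instruction cache only. Averaging these four rows is an illustration only; the whole-table figure needs every row. The names `bench_row_t`, `speedup` and `mean_speedup` are hypothetical.

```c
#include <stddef.h>

/* One benchmark row: FPS with I-cache only and with I-cache + D-cache. */
typedef struct {
    const char *name;
    unsigned    icache_only;
    unsigned    both_caches;
} bench_row_t;

/* Four sample rows taken from the results table. */
static const bench_row_t rows[] = {
    { "Rectangle",        193, 238 },
    { "Rectangle + opa",  114, 107 },
    { "Shadow large",      11,  21 },
    { "Image RGB rotate",  41, 138 },
};

/* Gain factor: values above 1.0 mean the D-cache helped. */
static inline double speedup(unsigned before, unsigned after)
{
    return (double)after / (double)before;
}

/* Arithmetic mean of the per-row gain factors. */
static inline double mean_speedup(const bench_row_t *r, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += speedup(r[i].icache_only, r[i].both_caches);
    }
    return sum / (double)n;
}
```

Calling `speedup(114, 107)` gives a factor below 1.0, which is the "Rectangle + opa" regression noted above.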

Now let’s see what the STM32F469 can do; that is the main board I use for display projects. I will post the benchmark results for comparison.

embeddedt · June 8, 2020, 4:38pm

I’m guessing that that’s because of measurement error. In any case, most people probably don’t care about the exact theoretical maximum as long as it’s above 45.

kisvegabor · June 10, 2020, 6:47am

@arturv2000 Thanks for sharing your results! That’s pretty fast.

In some cases, tearing can be avoided even without double buffering. If the rendering is very fast (10-15 ms in your case) and the VSYNC period is longer (e.g. 20 ms), you can synchronize LVGL to start rendering right after VSYNC. This synchronization is not directly supported now but you can easily add it here. Just add a busy-wait such as while(!vsync_happened); before rendering starts.
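
A minimal sketch of such a wait, assuming a flag that is set from the display controller's VSYNC interrupt. The flag name `vsync_happened` comes from the post; `on_vsync_irq`, `wait_for_vsync` and the spin limit are hypothetical. Which interrupt sets the flag and where in LVGL the wait is placed are not specified here.

```c
#include <stdbool.h>

/* Set by the VSYNC interrupt, cleared by the waiter. */
static volatile bool vsync_happened = false;

/* Hypothetical hook: call this from the display's VSYNC interrupt handler. */
static inline void on_vsync_irq(void)
{
    vsync_happened = true;
}

/* Busy-wait until the next VSYNC, giving up after max_spins polls so a
 * missing interrupt cannot hang the UI. Returns true if VSYNC was seen. */
static inline bool wait_for_vsync(unsigned max_spins)
{
    while (!vsync_happened) {
        if (max_spins-- == 0u) {
            return false; /* timed out: no VSYNC observed */
        }
    }
    vsync_happened = false; /* consume the event for the next frame */
    return true;
}
```

Rendering would start right after `wait_for_vsync()` returns true. That works only if a frame renders in less time than one VSYNC period, as in the 10-15 ms versus 20 ms case described above.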