Remote Viewer and Control library for LVGL Applications

Remote control for LVGL apps over WiFi

I’ve built a library that allows you to view your LVGL-based application (in my case, running on an ESP32-S3) on your desktop, making it easier to test your application on real hardware while staying within your development environment. It also allows you to control touch inputs using your mouse. In effect, it’s a kind of ‘remote desktop’ for any LVGL application, with minimal integration required.

There’s no actual requirement that a physical display be attached, so you can test LVGL projects that interact with hardware over GPIO, for example, without a screen at all. You can literally just connect power to an ESP32 dev board and test out a real UI on it, remotely!

It uses RLE for compression and can achieve decent performance over WiFi. It’s also great for taking screenshots (or recording video with a capture app such as OBS) of your app on real-life hardware.

The library is available on GitHub: CubeCoders/LVGLRemoteServer (a library for ESP32/Arduino/similar to allow you to remote-control LVGL-based applications over WiFi), with a link to the client application to view/control your application.

7 Likes

Wow, that’s fantastic! :star_struck:

I’d love to have something like this built-in to LVGL. Have you considered using a standard protocol, such as VNC?

FYI, I’ve just shared it on LinkedIn: Gábor Kiss-Vámosi on LinkedIn: Another great work from the LVGL community! ❤️

1 Like

I’m not sure the ESP32 is fast enough - the protocol my library uses is optimised for low CPU and memory usage (And the implementation is zero-allocation). Implementing it as a VNC server would absolutely be cool, but have much higher CPU and memory usage. The libraries for implementing VNC servers are prohibitive too, locking you into the GPL licence.

Oh I see, thank you for all this info. :+1:

I am thinking that if you used double frame buffers placed in DMA memory, transferring a frame buffer in a single shot without chunking it up at all (or chunking it into the largest pieces that DMA WiFi transfer allows) would yield the greatest performance. Don’t use any compression, as that in and of itself is going to really slow things down. UDP is the way to go for transmitting the data, which you are already doing.

In the flush callback I would send the bounding rect for the buffer and then send the buffer data. The receiving end can calculate the amount of data it is supposed to receive using the bounding rect. The receiving end would need to know the size of the display being simulated and the color format of the display, so that a display-sized buffer can be created on the receiving end.
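That scheme can be sketched roughly like this (a minimal illustration in plain C++, assuming RGB565 and inclusive coordinates; the struct and function names here are made up for illustration, not from the library):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Inclusive bounding rect of the area being flushed.
struct UpdateRect { uint16_t x1, y1, x2, y2; };

// Bytes the receiver should expect for this rect (RGB565 = 2 bytes/pixel).
size_t expectedBytes(const UpdateRect &r, size_t bytesPerPixel = 2) {
    size_t w = static_cast<size_t>(r.x2 - r.x1) + 1;
    size_t h = static_cast<size_t>(r.y2 - r.y1) + 1;
    return w * h * bytesPerPixel;
}

// Serialise the rect as a small fixed header to send ahead of the pixels.
std::vector<uint8_t> packRectHeader(const UpdateRect &r) {
    std::vector<uint8_t> out(sizeof(UpdateRect));
    std::memcpy(out.data(), &r, sizeof(UpdateRect));
    return out;
}
```

The receiver reads the header first, then knows exactly how many payload bytes follow for that flush.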

Having it be portable is easy to do. Use LVGL on the receiving end with the built-in SDL driver. Receive the data, draw it to a canvas widget, and let LVGL on the receiving end handle the rendering. You can hook into the middle of the indev reads to send input to the microcontroller pretty easily.

Writing a Python script to handle cross-platform compilation of the receiver is pretty simple to do. I am sure there are also C libraries for cross-platform UDP communication that would simplify things.

The current compression doesn’t slow things down enough to matter; RLE encoding is done as a single forward pass, so it doesn’t take much time on a single updated region. I avoided using full frame buffers to conserve memory - as it is, it only needs buffers big enough for a single packet at a time (which for a typical UI can encode quite a lot in one go). The time spent doing the actual transfer is the biggest slowdown.
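A single-pass RLE over RGB565 pixels looks roughly like this (a sketch only; the run layout, names, and 16-bit run cap are assumptions based on this thread, not the library’s actual implementation):

```cpp
#include <cstdint>
#include <cstddef>
#include <utility>
#include <vector>

// One forward pass over the pixel buffer, emitting (count, color) runs.
// Runs are capped at 65535 so the count always fits in a uint16_t.
std::vector<std::pair<uint16_t, uint16_t>> rleEncode(const uint16_t *px,
                                                     size_t n) {
    std::vector<std::pair<uint16_t, uint16_t>> runs;
    size_t i = 0;
    while (i < n) {
        uint16_t color = px[i];
        uint16_t count = 0;
        while (i < n && px[i] == color && count < 0xFFFF) { ++i; ++count; }
        runs.push_back({count, color});
    }
    return runs;
}
```

Flat UI regions collapse into a handful of runs; the worst case (every pixel a different colour) emits one run per pixel, i.e. 4 bytes per 2-byte pixel, which is the expansion concern raised further down the thread.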

Excellent! I’ve been playing with this, as I already have a client that can remote control my app on an MCU via UDP, and additionally receive a stream of data from the MCU and display a scrolling chart, plus receive files/screenshots from the MCU, so it wasn’t much of a step to try to incorporate this. Still not quite there with my implementation (getting updates but not drawing them in the right places :slight_smile: ) but…a couple of thoughts:

Perhaps send a control message before a flushed buffer update is sent to a client, containing the number of pixels / tiles to expect in the following chunked updates? I’m using Flutter on the client side and I’m not hugely familiar with its image options, but the method I’m using right now refreshes the image each time I receive a chunk. That is a moderately expensive operation, and one probably better done once all the chunks for a particular buffer flush have been transmitted?

Secondly, the sendRLEPacket method definition contains uint16_t progressStart as a parameter, but it is called internally with uint32_t values (e.g. runDataPosition) and, for larger screens, it probably should be uint32_t?

Excellent stuff!

Got it working: embedded remoteDisplay into my MCU app and fudged the output of sendRLE into my existing remote control client. Here’s a quick and dirty video:

Client is running on a Mac, but it’s written in Flutter and can run on Windows, Android, iOS, Linux… Connected via ethernet; screen size is 800*480, running LVGL 9.2.0 + recolor fix + ThorVG 0.15.1 + heap free fix on a Teensy 4.1 with 16MB of flash RAM used for heap. Partial frame buffer for LVGL and the screen. FPS on the device is about 18-19fps in this setup, mostly due to all the serial debug streaming in the background - without the serial output it’s 23fps, without the remoteDisplay 25fps, and without the lottie animations 68+ :slight_smile:

1 Like

Question regarding the run-length encoding: how much does it actually save in terms of space when sending data? I don’t know if it would be really beneficial, and it might actually cause more data to be sent if there is a lot going on in the UI. It’s basically an RGB value followed by the number of pixels to populate with that value, and those pixels need to be contiguous in the buffer. If there are a lot of colour changes and a colour doesn’t span a large distance, it could be adding a lot of extra data because of the run counts that need to be there. The count is also going to need to be at least a uint16_t, so each one takes up 2 bytes of space.

Is it enforced that you MUST use RLE or can the buffer be sent straight up?

The RLE count is uint16_t with overflow protection, and there is an alternative send method to send the buffer straight up, which I haven’t tried yet. Whether RLE is beneficial will depend on the nature of the UI, but the algorithm is well understood, beneficial in many cases and, crucially for those of us using MCUs rather than simulators on desktops with gobs of memory and horsepower, low impact.

progressStart should indeed be a 16-bit value - the RLE generation similarly caps the length of a single RLE run at 64K for this reason. The header is fixed at 10 bytes: 5x 16-bit values.

The number of RLE runs isn’t actually known ahead of time. It generates and sends them on-the-fly. The protocol layout and design is such that each packet is fully self-contained without reference to future or previous packets because over UDP you can’t guarantee that they arrive at all.

If you did theoretically give LVGL a really big buffer such that a single update area could exceed 64K pixels in length that would cause it a problem - but there are a few spare bits that can be scavenged to raise that if needed. For reference a 800x64 area (which is larger than you’d normally use) is 51K pixels.

There’s an uncompressed mode (send rather than sendRLE) but for a typical UI the performance difference is massive. The uncompressed mode uses 40x16 pixel chunks (because that’s the maximum it can fit in a single UDP packet) and it is significantly slower.

If you have a noisy image on the screen though rather than smooth, flat UI elements then it is faster than RLE.

I started working on a palettedRLE mode where it would count how many unique colours were within a given update area and if it was lower than a certain threshold (say 16) then it’d instead send a palette of N colours over, followed by indexed colours and RLE runs constrained to 255. A shorter maximum run length, but the data usage per run is cut in half so you get potentially twice as much data in a single packet (almost) in situations where there aren’t too many colours on screen at once.
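The palette-detection step for that idea could be sketched like this (the threshold, names, and fallback behaviour are illustrative assumptions, not the library’s code):

```cpp
#include <cstdint>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Scan the update area; if the unique-colour count stays under maxColors,
// return the palette so indexed (1-byte) colours + short runs can be sent.
// Otherwise bail out early and fall back to plain RLE.
bool buildPalette(const uint16_t *px, size_t n, size_t maxColors,
                  std::vector<uint16_t> &palette) {
    std::unordered_set<uint16_t> seen;
    for (size_t i = 0; i < n; ++i) {
        seen.insert(px[i]);
        if (seen.size() > maxColors) return false;  // too many colours
    }
    palette.assign(seen.begin(), seen.end());
    return true;
}
```

With a 16-entry palette, each run becomes a 1-byte index plus a 1-byte count (max 255) instead of 2 + 2 bytes, which is where the near-doubling of data per packet comes from.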

1 Like

I ran into an issue with progressStart exceeding a 16-bit value for my buffer, hence I raised the issue; it’s not theoretical :slight_smile: I am guessing that, as you said the header is fixed at 10 bytes, that is a restriction from elsewhere rather than something you arbitrarily chose?

As for the info to send before each buffer flush and set of RLEs, I mentioned sending the number of pixels, not the number of RLEs, as the pixel count is calculated ahead of time, unlike the number of RLEs.

I did a quick and dirty modification of transmitInfoPacket() into transmitInfoPacket(uint16_t controlValue, uint32_t extraData), replaced the last 4 bytes of infoBuffer with the extraData value, changed existing calls to transmitInfoPacket(0xFFFF, 0) and added transmitInfoPacket(0x0002, totalPixels); in sendRLE for a quick workaround.

For the RLE compression, here are the logs for the updates sent for my menu screen, which has 8 animating lotties in the first 8 circles (attempting 60fps) plus a couple of small areas on the top line that update each second:

As you can see, it’s sending between 0.17 and 1.07 bytes per 2 byte pixel in that sample set, so it’s effective and low cost for this type of UI, even though it’s only updating small areas around the label and lottie animations. For the full screen (second set of logs below), with all that contiguous background color, it is significant savings.

[ +10 ms] flutter: INFO: Global: Bytes: 2072, pixels: 3136, bytesPerPixel: 0.6607142857142857
[ ] flutter: INFO: Global: Bytes: 2404, pixels: 3721, bytesPerPixel: 0.6460628863208815
[ ] flutter: INFO: Global: Bytes: 2588, pixels: 6561, bytesPerPixel: 0.39445206523395826
[ ] flutter: INFO: Global: Bytes: 308, pixels: 5041, bytesPerPixel: 0.061098988295973024
[ +58 ms] flutter: INFO: Global: Bytes: 2276, pixels: 6561, bytesPerPixel: 0.346898338667886
[ +2 ms] flutter: INFO: Global: Bytes: 2352, pixels: 3721, bytesPerPixel: 0.6320881483472185
[ +1 ms] flutter: INFO: Global: Bytes: 1468, pixels: 3136, bytesPerPixel: 0.46811224489795916
[ +2 ms] flutter: INFO: Global: Bytes: 2996, pixels: 3136, bytesPerPixel: 0.9553571428571429
[ +2 ms] flutter: INFO: Global: Bytes: 2248, pixels: 3136, bytesPerPixel: 0.7168367346938775
[ +2 ms] flutter: INFO: Global: Bytes: 2704, pixels: 3721, bytesPerPixel: 0.7266863746304757
[ +2 ms] flutter: INFO: Global: Bytes: 2516, pixels: 6561, bytesPerPixel: 0.3834781283340954
[ +8 ms] flutter: INFO: Global: Bytes: 732, pixels: 5041, bytesPerPixel: 0.14520928387224757
[ +1 ms] flutter: INFO: Global: Bytes: 1932, pixels: 2160, bytesPerPixel: 0.8944444444444445
[ +36 ms] flutter: INFO: Global: Bytes: 2288, pixels: 6561, bytesPerPixel: 0.34872732815119645
[ +1 ms] flutter: INFO: Global: Bytes: 2356, pixels: 3721, bytesPerPixel: 0.6331631281913465
[ +2 ms] flutter: INFO: Global: Bytes: 1444, pixels: 3136, bytesPerPixel: 0.4604591836734694
[ +6 ms] flutter: INFO: Global: Bytes: 3376, pixels: 3136, bytesPerPixel: 1.0765306122448979
[ ] flutter: INFO: Global: Bytes: 2520, pixels: 3136, bytesPerPixel: 0.8035714285714286
[ ] flutter: INFO: Global: Bytes: 3024, pixels: 3721, bytesPerPixel: 0.8126847621607095
[ +2 ms] flutter: INFO: Global: Bytes: 2516, pixels: 6561, bytesPerPixel: 0.3834781283340954
[ +2 ms] flutter: INFO: Global: Bytes: 916, pixels: 5041, bytesPerPixel: 0.18170997817893275
[ +40 ms] flutter: INFO: Global: Bytes: 2324, pixels: 6561, bytesPerPixel: 0.35421429660112785
[ +2 ms] flutter: INFO: Global: Bytes: 2460, pixels: 3721, bytesPerPixel: 0.6611126041386725
[ +2 ms] flutter: INFO: Global: Bytes: 1448, pixels: 3136, bytesPerPixel: 0.461734693877551
[ +1 ms] flutter: INFO: Global: Bytes: 3512, pixels: 3136, bytesPerPixel: 1.1198979591836735
[ +2 ms] flutter: INFO: Global: Bytes: 2604, pixels: 3136, bytesPerPixel: 0.8303571428571429
[ +2 ms] flutter: INFO: Global: Bytes: 3188, pixels: 3721, bytesPerPixel: 0.8567589357699543
[ +3 ms] flutter: INFO: Global: Bytes: 2480, pixels: 6561, bytesPerPixel: 0.377991159884164
[ +8 ms] flutter: INFO: Global: Bytes: 864, pixels: 5041, bytesPerPixel: 0.1713945645705217

– Full screen (800 * 480):

[ +30 ms] flutter: INFO: Global: Bytes: 57208, pixels: 148800, bytesPerPixel: 0.38446236559139785
[ +31 ms] flutter: INFO: Global: Bytes: 51012, pixels: 148800, bytesPerPixel: 0.3428225806451613
[ +17 ms] flutter: INFO: Global: Bytes: 34352, pixels: 86400, bytesPerPixel: 0.3975925925925926

The reason I am bringing up the RLE encoding is the inability to use DMA for the frame buffers, due to the need to iterate over the buffer. Where you would see a real boost in performance is using double buffering and DMA memory without RLE. You would hand off the buffer to the WiFi stack and the buffer would get sent without using any processor time. Because the transmit is non-blocking, the application can continue running: while one buffer is being sent, LVGL could be filling the second buffer. This is a sizeable boost in performance.
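The ping-pong arrangement being described can be sketched like this (illustrative only; in a real build the send target would be a DMA-capable buffer handed to a non-blocking WiFi send):

```cpp
#include <cstdint>

// Two partial frame buffers: LVGL renders into one while the other is
// being transmitted, then the roles swap each flush.
struct DoubleBuffer {
    uint16_t bufA[800 * 16];  // sizes are illustrative
    uint16_t bufB[800 * 16];
    bool frontIsA = true;

    uint16_t *renderTarget() { return frontIsA ? bufB : bufA; }
    uint16_t *sendTarget()   { return frontIsA ? bufA : bufB; }
    void swap()              { frontIsA = !frontIsA; }
};
```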

What I actually want to do is make the progress length 24 bit by encoding the top 8 bits into the first 4 bits of the X and Y positions since they definitely don’t need to go up to 64k xD That allows for an update region of 16M pixels or up to 4096x4096.
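That bit-scavenging scheme could look roughly like this (the exact field layout is an assumption; the idea is just that X and Y only need 12 bits each for displays up to 4096x4096, freeing 4 + 4 bits for the top byte of a 24-bit progress value):

```cpp
#include <cstdint>

// Pack a 24-bit progress value: bits 23..20 into the top of the X field,
// bits 19..16 into the top of the Y field, bits 15..0 as-is.
void pack24(uint32_t progress, uint16_t x, uint16_t y,
            uint16_t &xField, uint16_t &yField, uint16_t &progField) {
    xField    = static_cast<uint16_t>((x & 0x0FFF) |
                                      (((progress >> 20) & 0xF) << 12));
    yField    = static_cast<uint16_t>((y & 0x0FFF) |
                                      (((progress >> 16) & 0xF) << 12));
    progField = static_cast<uint16_t>(progress & 0xFFFF);
}

// Recover the full 24-bit progress from the three 16-bit fields.
uint32_t unpackProgress(uint16_t xField, uint16_t yField, uint16_t progField) {
    uint32_t top = (static_cast<uint32_t>(xField >> 12) << 4) |
                   static_cast<uint32_t>(yField >> 12);
    return (top << 16) | progField;
}
```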

1 Like

WOW! It seems a very interesting and useful project.

Is the code of the client application available also?

It will be once a few other changes have been finalised.

2 Likes

Have the remote indev working now; it needs some refinement to smoothly translate a drag on the desktop into what is sent to the remote. Right now I just put a small delay in transmitting the drag co-ordinates, and it’s a little smoother. Added additional client commands to disable on-device screen updates when streaming, for a smoother experience if, say, screen recording. Also experimented with tiles vs. RLE and, in almost every case, the RLE compresses to < 2 bytes per pixel on average, even for a near-full screen of video and some photos with graduated skies.

And also, why remote control and stream 1 device when you can do 2?! :slight_smile: Having fun here.

https://youtube.com/shorts/QV8iFE8fgsc

If it’s at 2 bytes per pixel or above it has no benefit. Using RLE costs you the ability to use DMA memory and double buffering, and that in and of itself would cause a performance decrease that easily translates into needing to send twice as much data.

The way to test this is to measure how long the flush function takes from the time it is called until the time it returns. That chunk of time is what is consumed to compress and transmit the data. With DMA memory and double buffering that time would be only a few nanoseconds, whereas it is going to take milliseconds using RLE. Not only do you have the speed advantage there, but because the call made to transmit the data is non-blocking, the second buffer can be filled while the data is being sent. This is not something that can be done using RLE.
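The measurement being suggested can be sketched like this (std::chrono is used here for a desktop build; on an MCU you would substitute micros() or a cycle counter - an assumption, not part of the library):

```cpp
#include <chrono>

// Time a flush callback from entry to return, in microseconds. Whatever
// work the callback does (compress + transmit, or a DMA hand-off) is the
// time being compared between the two approaches.
template <typename F>
long long timeFlushMicros(F &&flushWork) {
    auto start = std::chrono::steady_clock::now();
    flushWork();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}
```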

The issue is that DMA memory transfers are not supported in the Arduino IDE, which is what is used for this software. Getting DMA to work is pretty complicated. The ESP-IDF used in the Arduino IDE is not the full SDK; it is only a small subset of it, and a lot has been removed from the IDF. It would have been nice if the whole thing had been ported to work in the Arduino IDE. The other thing is that the Arduino version of the IDF is going to be slower because a lot of the IDF is wrapped in order to make it work in the Arduino IDE.