Adding esp_http_server.h to the generator

MicroPython upstream now supports ESP-IDF v4.2 with CMake.
It still requires some integration effort for lv_micropython, but that’s definitely an important step forward.

That’s great. I have tested the httpd from that version with MP, and with a somewhat hackish workaround its integrated WebSocket support can also be used. I plan to use this for the “remote desktop” feature.

I just started the first bulk data transfer test via a WebSocket from MP through httpd, and during the Mars landing I forgot about it. I stopped it after it had transferred a total of 2.03 GB from the ESP to the browser … so WebSocket transfers seem to be pretty reliable :slight_smile: Perfect for a remote display. Over the next few days I’ll implement a demo for a remote display via WebSocket and also extend my previous work on sending mouse events back from the browser to the MP target. It’s pretty exciting that these things are starting to work this well. I’ll update my patches soon so the progress so far doesn’t get lost.

And I’ll need some native data compression … there seem to be several options for that, including compression support added to uzlib …


And here’s another patch … slightly simplified, as function calls from MP to httpd now work directly since I am avoiding the scheduler in LVGL. This also adds WebSocket support if it’s present and enabled in ESP-IDF.

The WebSocket support works well enough for the remote display, and I use it in another project to send (uncompressed) video as well as mouse events. The resulting video at ~2 Hz is actually usable for simple remote operations.

http_server.patch.txt (10.9 KB)
http_server_bare.py.txt (8.4 KB)

Here’s a little YouTube video demonstrating the remote view.

Any ideas for compression? There is a pull request for gzip compression support in MP on GitHub. I gave it a try, but it seems to be unable to decompress what it just compressed itself.
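For what it’s worth, one way to narrow down where such a pull request goes wrong is to check the round trip on the desktop first. This is just a sketch using CPython’s zlib (with negative wbits for a raw DEFLATE stream, the format uzlib works with), not the MP module itself:

```python
import zlib

def roundtrip(data):
    # Compress to a raw DEFLATE stream (wbits=-15: no zlib/gzip header),
    # then decompress it again and compare with the input.
    co = zlib.compressobj(level=9, wbits=-15)
    compressed = co.compress(data) + co.flush()
    return zlib.decompress(compressed, wbits=-15) == data

print(roundtrip(b"framebuffer " * 1000))  # → True
```

If the same payload fails only on the MP side, the bug is in the wrapper or its buffer handling rather than in the data.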


Very nice! Thanks for the video. Great achievement!

It really seems unresponsive when remote-viewing, doesn’t it?
Perhaps you could send the data without blocking the GUI, so the device would feel more responsive and smooth (with a lower FPS on the remote).

Are you sending the entire display buffer, or only the changed data?
With a little effort it might be possible to keep a shadow copy of the display buffer, compare it to the display buffer, and only send the differences. If it’s done in C and the shadow is kept in RAM (not PSRAM), it should be pretty fast and save a lot of network traffic.
I’m not sure you would have enough RAM on the ESP32 to do that, though. So a more creative idea could be to keep a much smaller shadow buffer that holds only checksums of segments of the display buffer, and to send a segment when its checksum does not match.
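The segment-checksum idea could be sketched like this, in desktop Python for illustration only (segment size and all names are made up; on the target this would likely be C, with CRC32 from binascii or the ROM):

```python
import zlib

SEG_SIZE = 4096  # bytes per segment; tune to the display buffer layout

def changed_segments(frame, shadow):
    """Yield (offset, data) for segments whose CRC32 differs from the shadow."""
    for i in range(0, len(frame), SEG_SIZE):
        seg = frame[i:i + SEG_SIZE]
        crc = zlib.crc32(seg)
        idx = i // SEG_SIZE
        if idx == len(shadow):
            shadow.append(None)        # grow the checksum shadow lazily
        if shadow[idx] != crc:
            shadow[idx] = crc
            yield i, seg               # only changed segments get sent

shadow = []
frame1 = bytes(16384)                  # first frame: everything is "new"
sent1 = list(changed_segments(frame1, shadow))
frame2 = bytearray(frame1)
frame2[5000] = 0xFF                    # touch one pixel in segment 1
sent2 = list(changed_segments(bytes(frame2), shadow))
print(len(sent1), len(sent2))          # → 4 1
```

The shadow costs only a few bytes per segment instead of a full framebuffer copy, at the price of re-sending a whole segment for a single changed pixel.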

The lodepng library, which already has MicroPython bindings, can do both encoding and decoding of PNG images. I currently disabled the encoding, but you can easily remove this define and both encode and decode PNG from MicroPython.
I suspect that it might be too slow since the data is allocated on PSRAM (if you have PSRAM), but you can try changing the allocators to use RAM. Even so, it might be too slow; I’m not sure. I’m also not sure whether you can do that with only ESP32 RAM, without PSRAM.

Sure. When video is running, MP sends 156 kB per frame … blocking … That’s why I think a buffer may help: it would just be filled with the entire image from the Python side, while the (blocking) transmission itself takes place on the httpd side on the second core.
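That hand-off could look roughly like the following, using a plain Python thread as a stand-in for the httpd task on the second core (all names and the one-slot queue are my own assumptions, not the actual patch):

```python
import threading, queue, time

frames = queue.Queue(maxsize=1)   # one slot: only the latest pending frame

def sender():
    # Stand-in for the httpd-side task: drains frames and performs the
    # blocking send, so the GUI/Python side never waits on the network.
    while True:
        frame = frames.get()
        if frame is None:
            frames.task_done()
            break
        time.sleep(0.01)          # placeholder for the blocking websocket send
        frames.task_done()

t = threading.Thread(target=sender, daemon=True)
t.start()

for _ in range(3):
    try:
        frames.put_nowait(bytes(1024))  # drop the frame if the sender is busy
    except queue.Full:
        pass                             # skipping a frame just lowers the FPS

frames.put(None)                         # shut the worker down
t.join()
print("sender finished")
```

Dropping frames when the sender is busy is the key point: the GUI side never blocks, it only loses intermediate frames.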

This always sends full screenshots. The idea was that LVGL runs unhindered between two of those screenshots. Tracking and sending the change events would require me to send all of them without missing a single one. Still, it’s worth a try, and I am actually prepared for this, since technically I am already sending exactly those change events to the browser. Currently I just send the four change events that happen during a full screen update.

I don’t think I can avoid putting the full screen buffer in PSRAM. I think I’ll try that next to see the performance penalty of maintaining a PSRAM shadow buffer of the screen. If I do that, I’d only use this buffer to exchange video data between MP and httpd. Then I could also do the compression on the httpd side … I think it may make sense to do all the heavy HTTP-related work on the second core.

I don’t think you can do any of this without PSRAM. Even bare LVGL+MP is barely usable without PSRAM.

If you must use PSRAM, make sure you configure it to 80 MHz QIO (the default is 40 MHz DIO) if your board supports that.

Anyway, I believe there is a way to avoid PSRAM.
LVGL works with display buffers that can be much smaller than the screen size (see the factor argument of the ili9xxx driver).
You could send only these smaller buffers and keep everything in RAM rather than PSRAM.
Another optimization is to track checksums of these smaller buffers and send them only when they differ from the previous ones.

There’s a new YouTube video of this setup, now including uzlib.gzip compression support. This additional layer of complexity of course comes with its own pitfalls: e.g. the compression pull request for uzlib.gzip had a memory leak, and there’s still another problem that triggers an assertion in gc_free every now and then.

But it’s making progress, and there’s still room for improvement, as I can e.g. still move the blocking send and the compression itself from MP to the httpd side running on the second core.

So back to my stupid questions. In the uzlib.gzip MP wrapper I see stuff like this:

byte *dest_buf = m_new(byte, dest_buf_size);  /* first allocation on the gc heap */
...
mp_obj_t res = mp_obj_new_bytearray_by_ref(dest_buf_size, dest_buf);  /* second allocation, wrapping dest_buf */

This IMO requires two allocations on MP’s garbage-collected heap. But what happens if the second allocation behind mp_obj_new_bytearray_by_ref() triggers a garbage collection? What prevents dest_buf from being deleted by the GC? IMHO there is no reference to dest_buf for the GC to find.

I think it’s something like that I am seeing in uzlib.gzip after some runtime. Actually, that function not only performs these two allocations but a few more nested ones as well.

I suppose you keep res somewhere visible to gc.

When gc “collects”, it first marks all reachable heap blocks.
Since res is known to the gc, it scans res for pointers into the gc heap and finds dest_buf there (because dest_buf is saved into the “items” member of mp_obj_array_t, which is returned as res).

So I believe that as long as you store res somewhere on the gc heap (or on the stack, etc.), the gc will not collect dest_buf before res is collected.
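As a toy illustration of that mark phase (this is not MicroPython’s actual gc code, just a model of reachability with made-up names):

```python
# Each heap block maps to the list of heap pointers found inside it.
# "res" contains a pointer to "dest_buf" (the items member); "orphan"
# is a block nothing points to.
heap = {"dest_buf": [], "res": ["dest_buf"], "orphan": []}

def mark(roots):
    """Return the set of blocks reachable from the given roots."""
    reachable = set()
    stack = list(roots)
    while stack:
        block = stack.pop()
        if block in reachable:
            continue
        reachable.add(block)
        stack.extend(heap[block])   # scan the block itself for heap pointers
    return reachable

live = mark(["res"])                # res is a root (on the stack or in globals)
print(sorted(live))                 # → ['dest_buf', 'res']
print("orphan" in live)             # → False: it would be swept
```

Anything not in the reachable set is swept, which is exactly why dest_buf survives once its pointer is stored inside the reachable res object.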

That happens way later. No, I mean while still inside that C function, when I call some malloc and that malloc runs into a low-memory situation. Wouldn’t that trigger the gc? And couldn’t that potentially delete the first object, as its pointer is not yet stored anywhere?

I think GC checks the stack as well.

Where do I do that? I have tried adding

CONFIG_SPIRAM_SPEED_20M=
CONFIG_SPIRAM_SPEED_26M=
CONFIG_SPIRAM_SPEED_40M=
CONFIG_SPIRAM_SPEED_80M=y

to ports/esp32/boards/sdkconfig.spiram. But that does not affect the settings that end up in sdkconfig.h; it’s still 40M there.

@embeddedt is correct: the gc scans the stack as well and prevents collection of blocks that are pointed to from the stack.

There are some configuration parameters that affect the bootloader and others that affect the main firmware.

For the bootloader you can try setting FLASH_MODE and FLASH_FREQ in esp32/Makefile.

For the main firmware you can try setting, in the boards/sdkconfig that applies to your board:

CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHMODE="qio"

CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
CONFIG_ESPTOOLPY_FLASHFREQ="80m"

CONFIG_FLASHMODE_QIO=y
CONFIG_SPIRAM_SPEED_80M=y
CONFIG_ESP32_REV_MIN_1=y

If you look at the messages printed to serial after boot, you will see the boot SPI configuration, but that might change later. There should be some way of checking this through the ESP-IDF API.

Ahhh … yes, things are faster now. Thanks!

There’s now a patch for the WebSocket context problem available on GitHub. My ugly workaround is thus not needed anymore.

The patch is now officially included in the ESP-IDF.

I have given up on maintaining the reduced LVGL/Blockly setup on GitHub. Instead, there’s now the full ftDuino32 project at https://github.com/harbaum/ftDuino32

This includes everything: all patches required for MP as well as ESP-IDF, plus Blockly, CodeMirror, the esp_http_server, and the WebSocket-based live view.

I have recently moved the video transmission from the Python side back to the httpd running on the second core. That greatly reduces the load on the Python side, and even with the live view running, the Python side is only barely slowed down.

Relying on patches is a bit fragile, don’t you think?
Are users supposed to know exactly which esp-idf/lvgl/micropython version to patch? Wouldn’t it break if they took a slightly different version?

If you want to make users’ lives much easier, you could consider adding a Makefile that collects the correct versions of all components (esp-idf/lvgl/micropython), builds them, deploys them, copies Python and HTML files to flash, etc.

Instead of patches, you could consider adding esp-idf/lvgl/micropython as git submodules. You can add your own forks with the changes you need, such that it’s easy for users to clone them instead of applying patches, and possibly merge newer versions when needed.

This is only during development. Some of this will hopefully make it into the upstream repositories. And one day I’ll surely also release some ready-to-run binary images.

I didn’t plan to release it this early, but some other developer might find this interesting and useful even with those patches.
