Adding esp_http_server.h to the generator

The malloc is definitely related to this. This runs forever:

esp_err_t http_server_internal_handler(httpd_req_t *req) {
  printf("req\n");  
  httpd_resp_sendstr(req, "<h1>Micropython test</h1>");
  return 0;
}

This crashes after some time or triggers those weird python error messages:

esp_err_t http_server_internal_handler(httpd_req_t *req) {
  printf("req\n");
  httpd_resp_sendstr(req, "<h1>Micropython test</h1>");

  void *p = NEW_PTR_OBJ(httpd_req_t, req);
  printf("p = %p\n", p);
  return 0;
}

Nothing else involved, no scheduler, no semaphores, no python callback … just the pure object creation …

Sure, but under the hood the Python object you are passing is also created with m_new_obj, so I don’t see how that would help.
On the C side a Python object is represented by mp_obj_t.

Maybe the object is garbage collected?
In such case its memory is allocated to another object. So when you write to it you overwrite some other Python object.
To prevent this, a reference to your object must be preserved somewhere on the Python side.

You can check whether the gc is involved by disabling garbage collection with gc.disable() and seeing whether the problem still happens.

Maybe, but that would not hurt: I never touch the object again. I just create it and then forget about it.

I am testing a very ugly solution which so far looks pretty good … I am not generating a new object for every callback. Instead I create it once and re-use it. So I keep a reference and replace the embedded pointer on every callback invocation.
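Roughly, the re-use idea looks like the sketch below. This is not the code from the patch: ptr_obj_t, cached_req_obj and wrap_request are made-up names, and plain malloc stands in for whatever allocation the binding really does on the MicroPython heap.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the binding's pointer wrapper object. */
typedef struct {
    void *ptr;   /* the wrapped C pointer, e.g. the current httpd_req_t */
} ptr_obj_t;

/* Created on first use, then reused for every later callback. */
static ptr_obj_t *cached_req_obj = NULL;

/* Return the shared wrapper after pointing it at the current request. */
ptr_obj_t *wrap_request(void *req) {
    if (cached_req_obj == NULL) {
        /* Assumption: the real code would allocate from the MicroPython heap. */
        cached_req_obj = malloc(sizeof(*cached_req_obj));
        if (cached_req_obj == NULL) {
            return NULL;   /* out of memory */
        }
    }
    cached_req_obj->ptr = req;   /* only the embedded pointer changes */
    return cached_req_obj;
}
```

Every call returns the same wrapper, so only one allocation ever happens. The price is that the wrapper is only valid for the request currently being handled.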

But now I likely have a gc problem since I’ll never know when gc will destroy this object … I can of course keep a reference on python side but that will look confusing as I keep a reference to this object for no apparent reason.

You can keep it as another member of handler_data_t next to user_data.
The user won’t care about your object in the same way he doesn’t care about user_data, but gc will not collect it as long as the user holds handler_data_t.

Still… I wonder what we are missing here that is causing this issue.

I am doing exactly that. But how should gc know that this void pointer actually points to one of the objects it’s about to delete? MP IMO does not know anything about pointers I store inside handler_data_t.

It knows.
The gc scans all the memory it allocated (handler_data_t included), looking for pointers to other memory it allocated, and marks whatever those pointers refer to (it’s a “mark and sweep” gc).
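To make the scanning idea concrete, here is a toy model of the mark phase. It is not MicroPython’s gc: the fixed-size heap, block_t and all toy_* names are invented for illustration, and only words that equal the exact start address of a block count as pointers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BLOCKS 4
#define WORDS_PER_BLOCK 4

/* Toy heap: every block is a few pointer-sized words plus a mark bit. */
typedef struct {
    uintptr_t words[WORDS_PER_BLOCK];
    bool marked;
} block_t;

static block_t heap[NUM_BLOCKS];

/* Address of a block, as it would be stored inside another block. */
uintptr_t toy_addr(int idx) {
    return (uintptr_t)&heap[idx];
}

/* Store an arbitrary word (pointer or plain data) in a block. */
void toy_store(int idx, int word, uintptr_t value) {
    heap[idx].words[word] = value;
}

/* Return the index of the block starting at this address, or -1. */
static int block_index(uintptr_t value) {
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (value == toy_addr(i)) {
            return i;
        }
    }
    return -1;
}

/* Mark a block, then treat each of its words as a possible pointer. */
static void mark_from(int idx) {
    if (heap[idx].marked) {
        return;
    }
    heap[idx].marked = true;
    for (int w = 0; w < WORDS_PER_BLOCK; w++) {
        int target = block_index(heap[idx].words[w]);
        if (target >= 0) {
            mark_from(target);
        }
    }
}

/* Mark everything reachable from one root block. */
void toy_gc_mark(int root) {
    for (int i = 0; i < NUM_BLOCKS; i++) {
        heap[i].marked = false;
    }
    mark_from(root);
}

/* Unmarked blocks are the ones a sweep would free. */
bool toy_gc_is_marked(int idx) {
    return heap[idx].marked;
}
```

If block 0 plays the role of handler_data_t and one of its words holds toy_addr(2), marking from block 0 keeps block 2 alive without the gc knowing anything about the type of that word.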

So if I store random data which coincidentally equals the pointer to some object then this object will not be deleted?

Correct.
But the chances for that are low and the consequences are mild (some memory would not be freed).
If anything, the disadvantage is performance: every time the gc runs, it actually reads all allocated RAM.

But this actually makes my “dirty” solution at least “ok”. You are right that we should understand what the problem with this object creation is: I am still creating the object once, so it may still be doing harm and overwriting the wrong memory area. The problem may just have become less obvious.

Anyway, things are starting to become usable, and the httpd performs pretty well even when lvgl is under load and each httpd request requires a callback into python.

I still think I would like to add the ability to serve files without any callback into python. But in order to do that I would have to access vfs from httpd …

Here’s another patch. This time the object is created in the python task. This runs very stable and quite fast.

http_server.patch.txt (11.2 KB)

Just tried and succeeded to compile the patched lv_micropython. Server is running. I can start playing with it.
Thanks!

With LVGL in the bg it still crashes quite fast. Even with a minimal single label screen without touch driver.

What happens is quite interesting: The args pointer given to the scheduler doesn’t arrive in the handler. Instead “6” arrives, which imho is MP’s representation for “None”. This happens with gc disabled.

mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x3f81bc00)
...
mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x3f81bc00)
...
mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x3f81bc00)
...
mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x3f81bc00)
...
mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x3f81bc00)
...
mp_sched_schedule(0x3f458b50,0x3f81bc00)
http_server_handler_cb(0x6)

… and reboot as dereferencing 6 isn’t a good idea. Now I need to figure out where this can get lost.
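For reference, this is why I read “6” as “None”. The sketch assumes the default object representation with immediate objects enabled, where the low bits of an object word act as a tag. The TOY_/toy_ names are mine, not MicroPython’s, and the layout in the comment should be checked against the build in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag layout of an object word (low bits):
 *   ...xxx1  small integer
 *   ...x010  interned string
 *   ...x110  immediate object (None, False, True)
 *   ...xx00  pointer to a heap object
 */
#define TOY_IMMEDIATE_OBJ(val) ((uintptr_t)(((val) << 3) | 6))

/* Assumption: None is immediate object number 0. */
uintptr_t toy_none(void) {
    return TOY_IMMEDIATE_OBJ(0);
}

/* Only words with the two low bits clear can be heap pointers. */
bool toy_is_pointer(uintptr_t obj) {
    return (obj & 3) == 0;
}
```

Under that assumption toy_none() is 6, which is not a valid pointer, while 0x3f81bc00 from the log above has both low bits clear.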

Edit: This is not a permanent thing. If I allow the handler to return if arg is wrong then the subsequent calls are often fine again. So there’s nothing permanently messed up.
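The early return is nothing more than a guard like the one sketched here. server_ctx_t, register_ctx and handler_cb are placeholder names, not the ones from my patch, and this only hides the symptom; it does not explain where the argument gets lost.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-server context; the real handler data looks different. */
typedef struct {
    int requests_handled;
} server_ctx_t;

/* The one pointer we expect the scheduler to hand back to us. */
static server_ctx_t *expected_ctx = NULL;

void register_ctx(server_ctx_t *ctx) {
    expected_ctx = ctx;
}

/* Returns 0 if the request was handled, -1 if the argument was rejected. */
int handler_cb(void *arg) {
    if (arg == NULL || arg != (void *)expected_ctx) {
        return -1;   /* e.g. arg == 6: bail out instead of dereferencing it */
    }
    expected_ctx->requests_handled++;
    return 0;
}
```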

Using uasyncio for the lvgl handling doesn’t change anything (assuming I did it correctly). Attached are my two simple http_servers, each serving a single simple page and running a small scrolling label in lvgl. One classic style using the scheduler and one using uasyncio.

I’ve checked that MP_STATE_VM(sched_queue) is consistent in the good and the failing scheduler invocations. It is …

http_server_lvgl.py.txt (1.3 KB) http_server_lvgl_async.py.txt (1.4 KB)

Do you have an option to connect a debugger through JTAG?

No, I don’t. The ESP32 modules don’t expose the JTAG pins, do they?

Anyway, I was wrong about the uasyncio test. The lvesp32.deinit() also needs to be put after the display has been initialized. Otherwise it starts calling the scheduler again.

And guess what? Now that the scheduler is not used by lvgl anymore, the httpd runs somewhat stable. I still think we should understand why using the scheduler the normal way leads to this problem.

Here’s a server that works with lvgl:
http_server_lvgl.py.txt (8.9 KB)

Actually they do!

ESP32 PORT  FT232H PORT  COLOR
==========  ==========   ======
GPIO13      AD0 (TCK)    Purple
GPIO12      AD1 (TDI)    Blue
GPIO15      AD2 (TDO)    Green
GPIO14      AD3 (TMS)    Yellow
GND         GND          Black

I agree, and I think that a debugger could be helpful for that.

I just ordered an FT232H adapter. These GPIOs are being used on my custom board, but I should easily be able to build a breadboard setup that exposes the same issue.

My current suspicion is that the MP scheduler is not multicore safe. What I do see in these problematic situations is that function pointers and arg pointers do get messed up … as if the scheduler queue is written while the scheduler runs entries from it. There are “atomic” macros which are supposed to handle that.

Under the hood it uses a mutex exposed by ESP32-FreeRTOS, which is multicore safe, but maybe there’s some MP code which should be protected and is not.
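To illustrate what “protected” would have to mean, here is a toy (func, arg) queue in which every access to the shared state happens under a lock. It is not the MicroPython scheduler: all names are invented, and a C11 atomic_flag spinlock stands in for whatever critical section the port really uses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_DEPTH 4

typedef void (*sched_func_t)(void *arg);

/* func and arg must always be read and written as a pair. */
typedef struct {
    sched_func_t func;
    void *arg;
} sched_item_t;

static sched_item_t queue[QUEUE_DEPTH];
static size_t queue_head = 0;
static size_t queue_len = 0;
static atomic_flag queue_lock = ATOMIC_FLAG_INIT;

static void lock_queue(void) {
    while (atomic_flag_test_and_set_explicit(&queue_lock, memory_order_acquire)) {
        /* spin until the other core releases the lock */
    }
}

static void unlock_queue(void) {
    atomic_flag_clear_explicit(&queue_lock, memory_order_release);
}

/* Producer side: append one entry, or fail if the queue is full. */
bool toy_schedule(sched_func_t func, void *arg) {
    bool ok = false;
    lock_queue();
    if (queue_len < QUEUE_DEPTH) {
        size_t tail = (queue_head + queue_len) % QUEUE_DEPTH;
        queue[tail].func = func;
        queue[tail].arg = arg;
        queue_len++;
        ok = true;
    }
    unlock_queue();
    return ok;
}

/* Consumer side: copy one entry out under the lock, then call it. */
bool toy_run_one(void) {
    sched_item_t item;
    lock_queue();
    if (queue_len == 0) {
        unlock_queue();
        return false;
    }
    item = queue[queue_head];
    queue_head = (queue_head + 1) % QUEUE_DEPTH;
    queue_len--;
    unlock_queue();
    item.func(item.arg);   /* run the callback outside the lock */
    return true;
}

/* Example callback: counts how often it was called. */
void toy_count_call(void *arg) {
    *(int *)arg += 1;
}
```

If either side touched queue_head, queue_len or an entry outside the lock, the consumer could pick up a func from one entry and an arg from another, which is the kind of mix-up described above.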

Funny side note: I think I am tracing a bug in the esp-idf which is not passing the user_ctx to the handler if websocket is being used …