Adding esp_http_server.h to the generator

I think buffering may still be very useful. Currently the single-threaded Python seems to throttle the multi-threaded httpd. Using buffers may help relax this, as Python would not be blocked during the long transmission time.

httpd WebSocket support was added to ESP-IDF in March 2020 … grumble …

Any plans to update espidf? :slight_smile:

I believe this is dependent on upstream, as our fork of MicroPython mostly adds LVGL-related changes.

Apparently someone opened a PR to update to ESP-IDF 4.1 a few months ago, but there hasn’t been any movement on it: https://github.com/micropython/micropython/pull/6413

If you want ESP-IDF updated quickly, you may be on your own, as upstream has historically released updates infrequently. It sounds like they are planning some updates to the ESP32 port in the next release, but that isn’t scheduled until April.

Maybe it’s possible to just include the updated httpd for a start … although it’s a super ugly solution. But before doing anything like that I need to prepare something clean enough to release.

So the basic functionality (GET and POST) of the http_server is working now and I don’t need ESP-IDF patched anymore. Things are starting to look nicer, so I was curious how the httpd performs under high LVGL graphics load and how the “core_id” parameter in the httpd config influences this … but …

The httpd works fine under negligible load on the LVGL side. But once I switch to the “Chart” demo page, the httpd stops working. What stops is related to the scheduler: httpd-internal error messages still work flawlessly, and debug output tells me that mp_sched_schedule is still being called even under high load. But in that case the scheduled function is never run.

It’s my understanding that LVGL processing itself uses the same mechanism, and it is obviously still working, as the chart graphic is still animated nicely. But my scheduled function is never called, and reducing the load doesn’t make the previously scheduled function run either. It’s simply lost. Could lvesp32’s own attempts to use the scheduler somehow overwrite mine? But why then only under high load …

Further investigation shows that the call to mp_sched_schedule() returns false, which in turn means that the scheduler queue is full.

Sounds like some job is filling the scheduler queue. Could this be the screen update? If so, would it make sense to limit it so that at most one of these jobs is pending? It seems these get frequently lost anyway.

I successfully tried this in lvesp32 as a quick hack, and it indeed makes my callback work without any visible negative effect on the Chart demo. Maybe something similar would make sense:

// Set while an lv_task_handler call is scheduled but has not run yet
static bool schedule_in_progress = false;

STATIC mp_obj_t mp_lv_task_handler(mp_obj_t arg)
{
    lv_task_handler();
    schedule_in_progress = false;
    return mp_const_none;
}

STATIC MP_DEFINE_CONST_FUN_OBJ_1(mp_lv_task_handler_obj, mp_lv_task_handler);

static void vTimerCallback(TimerHandle_t pxTimer)
{
    lv_tick_inc(portTICK_RATE_MS);

    // Don't queue another handler call while one is still pending
    if (schedule_in_progress) return;

    schedule_in_progress = true;
    mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none);
}

Edit … OK, this is not perfect and everything blocks after some time. But it’s at least a step in the right direction :slight_smile:

IMO it would be cleanest for the next lv_task_handler call to be scheduled inside mp_lv_task_handler, though this would result in all the idle CPU time being taken by lv_task_handler. Right now it gets scheduled at a fixed rate which results in missed calls like you’ve noted.
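
Something like the following untested sketch, reusing the names from the snippet above; the extra declaration is only there so the handler can reference its own function object:

MP_DECLARE_CONST_FUN_OBJ_1(mp_lv_task_handler_obj);

STATIC mp_obj_t mp_lv_task_handler(mp_obj_t arg)
{
    lv_task_handler();
    // Chain the next run from inside the handler, so exactly one call is ever
    // pending. With no delay in between this eats all idle CPU time, as noted
    // above; lv_tick_inc() must still be driven from the FreeRTOS timer, and
    // the very first call has to be scheduled once during initialization.
    mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none);
    return mp_const_none;
}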

Maybe. But I still wonder why my approach hangs after a while. I changed from the bool to a proper binary semaphore and also caught the case where mp_sched_schedule fails (which actually never happens). Still, everything locks up after a while. It feels like there are cases where lv_task_handler() never gets called or never returns … if that’s the case, then your approach would also hang.

Edit: A little more debugging shows that there seem to be rare cases where something has successfully been scheduled (mp_sched_schedule returned true), but the scheduled function is still never called …

Edit^2: In that locked case mp_sched_num_pending() constantly returns 1. So the function definitely is pending. But the scheduler never tries to run it.

The following solution works. But it’s rather ugly and may still cause trouble if a third job is being scheduled. Then there’s once more not enough room in the queue.

static void vTimerCallback(TimerHandle_t pxTimer)
{
    lv_tick_inc(portTICK_RATE_MS);
    // never try to use the last free seat ...
    if(mp_sched_num_pending() >= MICROPY_SCHEDULER_DEPTH-1)
      return;
    
    mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none);
}

Is there, by chance, a way to check whether &mp_lv_task_handler_obj is pending? Then you could just not schedule a new one till the previous one is no longer pending.
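
Something along these lines might work by peeking at the scheduler internals (sched_queue, sched_idx and sched_len in py/mpstate.h of recent MicroPython versions). I haven’t tried it, and it relies on internals rather than a public API, so it is version-dependent:

#include "py/mpstate.h"
#include "py/runtime.h"

// Version-dependent sketch: walk MicroPython's pending-callback queue and
// look for our handler object (not a public API).
static bool mp_lv_task_handler_pending(void)
{
    for (uint8_t i = 0; i < MP_STATE_VM(sched_len); i++) {
        uint8_t idx = (uint8_t)((MP_STATE_VM(sched_idx) + i) % MICROPY_SCHEDULER_DEPTH);
        if (MP_STATE_VM(sched_queue)[idx].func == (mp_obj_t)&mp_lv_task_handler_obj) {
            return true;
        }
    }
    return false;
}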

I don’t know. But I assume this would still expose my problem, as I can see that there is a job waiting in the scheduler when I get into the locked state. I am pretty sure it’s the LVGL job that’s stuck. The question is: why doesn’t the scheduler run it, and why does scheduling another job cause the stuck one to be run as well?

Maybe this happens when an exception is thrown.
This can happen if lv_task_handler calls some callback which raises an exception without catching it. In such a case I believe lv_task_handler won’t return.
A general question is whether LVGL always keeps its state consistent when a callback function doesn’t return. This could also happen with other bindings, such as C++.

Another option is that the MicroPython thread is blocked, but I doubt this is the case because then I would expect the scheduler to be blocked as well, and apparently it’s not.
Are you using the _thread module? Maybe one thread is blocked while others still run?

I’m not sure catching Python exceptions in C code is the best idea.
But catching exceptions in Python is straightforward, so here are some ideas:

  • Try this with lv_async. Since ILI9341 imports lvesp32 (unfortunately), you need to call lvesp32.deinit() after initializing the display and only then call lv_async(). In a current project I’m using uasyncio with this technique.
    The advantage is that you don’t need to rely on the MicroPython scheduler.
    I think that ili9xxx should not import lvesp32, but changing that now would break backward compatibility for anyone assuming ili9xxx imports lvesp32, so maybe it’s better to make this change only in the next major release.
  • Replace lvesp32 with a Python implementation that uses a timer, as done with stm32.
    It might be possible to call lv_task_handler directly, although I’m not sure. The docs, at least, warn that the timer callback might run in interrupt context, so we would still need to schedule in order to call lv_task_handler. On the other hand, this scheduling is (apparently) not needed on stm32, so maybe we can get away without it on esp32 as well.
  • Replace lvesp32 with a Python implementation that uses a FreeRTOS timer, as lvesp32 does today.
    That would require exposing xTimerCreate on the espidf module, with callback conventions etc.
  • Another thought: if MicroPython’s thread (FreeRTOS task) priority is higher than the httpd priority, MicroPython could block httpd indefinitely, since FreeRTOS uses strict priority scheduling.
    If you don’t want to change thread priorities, a simple thread wait (esp.task_delay_ms) could give the lower-priority thread an opportunity to run.
  • This problem reminds me of the issues we had with lvesp32 + Bluetooth. Increasing the timer period there seemed to help to some extent.

It does not. The expectation is that control will always flow back through the call chain till lv_task_handler returns, since this is how C works (unless there’s a crash, obviously).

That is a problem now that you mention it. It means that throwing exceptions out of an event handler without catching them will lead to a hang. Is there a way for the binding to detect this?

It’s possible, of course, at a price.
It would mean wrapping every callback with exception-handling code, which costs both program memory and cycles.
When doing that, it’s also not clear what the callback’s return value should be in case of an exception.
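
For illustration, a rough, untested sketch of what such a wrapper could look like for an LVGL v7-style event callback. The names, the way the Python callback is stored and the include path are made up for the example; this is not the actual generated binding code, and a void callback conveniently sidesteps the return-value question:

#include "py/runtime.h"
#include "lvgl/lvgl.h"

// Illustrative only: the user's Python callback, stored when the event
// callback is registered (the real binding keeps it elsewhere).
static mp_obj_t py_event_cb = MP_OBJ_NULL;

static void mp_lv_event_cb_wrapper(lv_obj_t *obj, lv_event_t event)
{
    (void)obj;
    if (py_event_cb == MP_OBJ_NULL) return;

    nlr_buf_t nlr;
    if (nlr_push(&nlr) == 0) {
        mp_call_function_1(py_event_cb, mp_obj_new_int(event));
        nlr_pop();
    } else {
        // An exception escaped the Python callback: report it and return
        // normally, so control still flows back through lv_task_handler
        // instead of unwinding past it.
        mp_obj_print_exception(&mp_plat_print, MP_OBJ_FROM_PTR(nlr.ret_val));
    }
}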

Other options are:

  • By convention, require anyone writing a callback to catch exceptions.
  • Change LVGL assumptions regarding callbacks.

This problem is not limited to MicroPython; it’s relevant to any binding that can throw exceptions, such as C++.

This is more easily solvable: we could adopt a convention of having callbacks return 0, NULL, or nothing (depending on their normal return type) as a default or error state.

Wouldn’t this be the same cost as handling it within the binding itself?

Nevertheless, I think this is the best option, as I don’t see an easy way to make LVGL handle this case. The assumption in a standard C program is that control passes in and out of the function at some point. I think preventing that from being an issue would significantly complicate LVGL’s event loop.

I have this problem with the advanced demo, and the only change over the official MicroPython LVGL version is the attempt in modlvesp32 to keep the scheduler from overflowing. I will redo the entire setup with a fresh download, but IMHO there are no exceptions or the like involved.

I still think it’s worth trying with uasyncio and lv_async, where lv_task_handler can be called directly without scheduling.

I just restarted with an entirely fresh setup, both to track down this lockup problem and to make sure that my current HTTP version runs with an unpatched ESP-IDF. And guess what? No lockups so far … dunno what I did previously.

The following is from the modlvesp32 I am now using. IMO it really makes sense to do it this way. With the previous version I’d expect other schedule attempts to fail as well. The same schedule mechanism is used for interrupt handling, right? You should then see lots of lost interrupts in high graphics-load situations. I really think this should be fixed.

static SemaphoreHandle_t schedule_in_progress;

STATIC mp_obj_t mp_lv_task_handler(mp_obj_t arg)
{
    lv_task_handler();
    xSemaphoreGive(schedule_in_progress);
    return mp_const_none;
}

STATIC MP_DEFINE_CONST_FUN_OBJ_1(mp_lv_task_handler_obj, mp_lv_task_handler);

static void vTimerCallback(TimerHandle_t pxTimer)
{
    lv_tick_inc(portTICK_RATE_MS);

    if(!xSemaphoreTake(schedule_in_progress, 0)) 
      return;

    if(!mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none))
      xSemaphoreGive(schedule_in_progress);
}

STATIC mp_obj_t mp_init_lvesp32()
{
    if (xTimer) return mp_const_none;

    lv_init();

    // create binary semaphore to make sure only one callback is being
    // scheduled at a time
    schedule_in_progress = xSemaphoreCreateBinary();
    xSemaphoreGive(schedule_in_progress);

    xTimer = xTimerCreate(
                "lvgl_timer",
                1,              // The timer period in ticks.
                pdTRUE,         // The timers will auto-reload themselves when they expire.
                NULL,           // User data passed to callback
                vTimerCallback  // Callback function
            );

    if (xTimer == NULL || xTimerStart( xTimer, 0 ) != pdPASS){
        ESP_LOGE(TAG, "Failed creating or starting LVGL timer!");
    } 

    return mp_const_none;
}

In general I agree that modlvesp32 should be fixed so that the scheduler queue is not overflowed, but I’m not sure a blocking semaphore here is a good idea.

We should expect that in certain situations the previous call to lv_task_handler might not complete before it’s time to schedule the next call.
This can happen with a high FPS and heavy rendering, but also when a user callback takes too long.
On such occasions it’s fine to skip the next call to lv_task_handler, and possibly lose a frame or two, but we should not skip the call to lv_tick_inc.

The problems I see with your suggestion are:

A different approach could be to use a counter and simply skip calls to lv_task_handler if the previous one hasn’t finished yet (or keep one or two calls “in flight”).
The problem with that approach is that it breaks down once lv_task_handler is allowed not to return, due to an exception thrown in a callback as discussed above.
In such a case it might be worth catching MicroPython exceptions in C and decreasing the counter before propagating them further (a kind of “finally” block in C).
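
A rough, untested sketch of that last idea, building on the handler from the snippets above (the counter name and the place where it would be incremented are illustrative):

// Sketch: decrement the in-flight counter even if a Python exception unwinds
// out of lv_task_handler(), then re-raise it, so scheduling never stays stuck.
static volatile int handler_calls_in_flight = 0;   // incremented wherever mp_sched_schedule succeeds

STATIC mp_obj_t mp_lv_task_handler(mp_obj_t arg)
{
    nlr_buf_t nlr;
    if (nlr_push(&nlr) == 0) {
        lv_task_handler();
        nlr_pop();
        handler_calls_in_flight--;
    } else {
        // The "finally" part: release our bookkeeping before propagating the
        // exception further up.
        handler_calls_in_flight--;
        nlr_jump(nlr.ret_val);
    }
    return mp_const_none;
}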