Adding esp_http_server.h to the generator

IMO it would be cleanest for the next lv_task_handler call to be scheduled inside mp_lv_task_handler, though this would result in all the idle CPU time being taken by lv_task_handler. Right now it gets scheduled at a fixed rate which results in missed calls like you’ve noted.

Maybe. But I still wonder why my approach hangs after a while. I changed from the bool to a proper binary semaphore and also caught the case where mp_sched_schedule fails (which actually doesn’t happen). Still, everything locks up after a while. It feels like there are cases where lv_task_handler() never gets called or never returns … if that’s the case then your approach would also hang.

Edit: A little more debugging shows that there seem to be rare cases where something has successfully been scheduled (mp_sched_schedule returned true) but still the scheduled function is never being called …

Edit^2: In that locked case mp_sched_num_pending() constantly returns 1. So the function definitely is pending. But the scheduler never tries to run it.

The following solution works. But it’s rather ugly and may still cause trouble if a third job is being scheduled. Then there’s once more not enough room in the queue.

static void vTimerCallback(TimerHandle_t pxTimer)
{
    lv_tick_inc(portTICK_RATE_MS);
    // never try to use the last free seat ...
    if(mp_sched_num_pending() >= MICROPY_SCHEDULER_DEPTH-1)
      return;
    
    mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none);
}

Is there, by chance, a way to check whether &mp_lv_task_handler_obj is pending? Then you could just not schedule a new one till the previous one is no longer pending.

I don’t know. But I assume that this would still expose my problem as I see that there is a job waiting in the scheduler when I get into the locked state. I am pretty sure it’s the lvgl job being stuck. The question is: why doesn’t the scheduler run it and why does scheduling another job cause the stuck one to be run as well?

Maybe this happens when an exception is thrown.
This can happen if lv_task_handler calls some callback which raises an exception without catching it. In that case I believe lv_task_handler won’t return.
A general question is whether LVGL always keeps its state consistent in cases of callback functions that don’t always return. This could also happen on other bindings such as C++.

Another option is that the Micropython thread is blocked, but I doubt this is the case because I would expect the scheduler to be blocked as well, and apparently it’s not.
Are you using the _thread module? Maybe one thread is blocked and others still run?

I’m not sure catching Python exceptions in C code is the best idea.
But catching exceptions in Python is straightforward, so here are some ideas:

  • Try this with lv_async. Since ILI9341 imports lvesp32 (unfortunately), you need to call lvesp32.deinit() after initializing the display and only then call lv_async(). In my current project I’m using uasyncio with this technique.
    The advantage is that you don’t need to rely on Micropython scheduler.
    I think that ili9xxx should not import lvesp32, but changing that now would break backward compatibility for anyone assuming ili9xxx imports lvesp32, so maybe it’s better to do this change only on the next major release.
  • Replace lvesp32 with a Python implementation that uses a timer, as done with stm32.
    It might be possible to call lv_task_handler directly, although I’m not sure. The docs, at least, warn that the callback might be called in interrupt context so we would still need to call schedule in order to call lv_task_handler. On the other hand, this scheduling is (apparently) not needed for stm32 so maybe we can get away with that on esp32 as well.
  • Replace lvesp32 with a Python implementation that uses a FreeRTOS timer, as is done in lvesp32.
    That would require exposing xTimerCreate on espidf with callback conventions etc.
  • Another thought - if the Micropython’s thread (FreeRTOS task) priority is higher than the httpd priority, Micropython could block httpd indefinitely since FreeRTOS uses strict priority.
    If you don’t want to change thread priorities, a simple thread wait (esp.task_delay_ms) could give the lower priority thread an opportunity to run.
  • This problem reminds me of the issues we had with lvesp32 + bluetooth. Increasing the timer period there seemed to help to some extent.

It does not. The expectation is that control will always flow back through the call chain till lv_task_handler returns, since this is how C works (unless there’s a crash, obviously).

That is a problem now that you mention it. It means that throwing exceptions out of an event handler without catching them will lead to a hang. Is there a way for the binding to detect this?

It’s possible of course, at a price.
It would mean wrapping every callback with exception handling code that costs both program memory and cycles.
When doing that, it’s not clear what would be the callback return value in case of exception.

Other options are:

  • By convention, require anyone writing a callback to catch exceptions.
  • Change LVGL assumptions regarding callbacks.

This problem is not limited to Micropython, it’s relevant to any binding that can throw exceptions, such as C++.

This is more easily solvable: we could adopt a convention of having callbacks return 0, NULL, or nothing (depending on their normal return type) as a default or error state.

Wouldn’t this be the same cost as handling it within the binding itself?

Nevertheless, I think this is the best option, as I don’t see an easy way to make LVGL handle this case. The assumption in a standard C program is that control passes in and out of the function at some point. I think preventing that from being an issue would significantly complicate LVGL’s event loop.
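To make the cost discussed above concrete, here is a minimal sketch of what "wrapping every callback" could look like. It is plain C using setjmp/longjmp as a stand-in for the interpreter's exception mechanism, so it is not the binding's actual code. All names (guarded_call, raise_error, current_handler) are hypothetical. The default return value is passed in by the caller, which is one possible answer to the "what should the callback return" question.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Innermost active catch point, or NULL if nothing is guarding (hypothetical). */
static jmp_buf *current_handler;

/* Stand-in for "raise an exception": jump to the innermost catch point. */
static void raise_error(void)
{
    if (current_handler == NULL)
        abort();                      /* nobody to catch it */
    longjmp(*current_handler, 1);     /* does not return */
}

typedef int (*int_callback_t)(int);

/* Calls cb(arg). If cb raises, the error is swallowed and
 * default_value is returned, so control always flows back to the caller. */
static int guarded_call(int_callback_t cb, int arg, int default_value)
{
    jmp_buf here;
    jmp_buf *outer = current_handler; /* remember the enclosing catch point */
    int result;

    current_handler = &here;
    if (setjmp(here) == 0) {
        result = cb(arg);             /* normal path */
    } else {
        result = default_value;       /* reached via longjmp */
    }
    current_handler = outer;          /* restore on both paths */
    return result;
}

/* Two example callbacks: one well behaved, one that raises. */
static int doubles(int x) { return 2 * x; }
static int fails(int x) { (void)x; raise_error(); return -1; }
```

Every guarded call pays for one setjmp plus the bookkeeping, whether or not anything is raised, which is the program memory and cycle cost mentioned above.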

I have this problem with the advanced demo and the only change over the official MP lvgl version is the attempt in modlvesp32 to keep the scheduler from overflowing. I will redo the entire setup with a fresh download. But imho there are no exceptions or the like involved.

I still think it’s worth trying with uasyncio and lv_async , where lv_task_handler can be called directly without scheduling.

I just restarted with an entirely fresh setup, both to find this lockup problem and to make sure that my current http version runs with an unpatched espidf. And guess what? No locks so far … dunno what I did previously.

The following is from the modlvesp32 I am now using. IMO it really makes sense doing it that way. With the previous version I’d expect other schedule attempts to also fail. The same schedule mechanism is used for interrupt handling, right? You should then see lots of lost interrupts in high graphics load situations. I really think this should be fixed.

static SemaphoreHandle_t schedule_in_progress;

STATIC mp_obj_t mp_lv_task_handler(mp_obj_t arg)
{
    lv_task_handler();
    xSemaphoreGive(schedule_in_progress);
    return mp_const_none;
}

STATIC MP_DEFINE_CONST_FUN_OBJ_1(mp_lv_task_handler_obj, mp_lv_task_handler);

static void vTimerCallback(TimerHandle_t pxTimer)
{
    lv_tick_inc(portTICK_RATE_MS);

    if(!xSemaphoreTake(schedule_in_progress, 0)) 
      return;

    if(!mp_sched_schedule((mp_obj_t)&mp_lv_task_handler_obj, mp_const_none))
      xSemaphoreGive(schedule_in_progress);
}

STATIC mp_obj_t mp_init_lvesp32()
{
    if (xTimer) return mp_const_none;

    lv_init();

    // create binary semaphore to make sure only one callback is being
    // scheduled at a time
    schedule_in_progress = xSemaphoreCreateBinary();
    xSemaphoreGive(schedule_in_progress);

    xTimer = xTimerCreate(
                "lvgl_timer",
                1,              // The timer period in ticks.
                pdTRUE,         // The timers will auto-reload themselves when they expire.
                NULL,           // User data passed to callback
                vTimerCallback  // Callback function
            );

    if (xTimer == NULL || xTimerStart(xTimer, 0) != pdPASS) {
        ESP_LOGE(TAG, "Failed creating or starting LVGL timer!");
    }

    return mp_const_none;
}

In general I agree that modlvesp32 should be fixed so that the scheduler queue does not overflow, but I’m not sure a blocking semaphore here is a good idea.

We should expect that in certain situations the previous call to lv_task_handler might not complete before it’s time to schedule the next call.
This can happen with high FPS and heavy rendering, but also in case the user callback is taking too long.
On such occasions it’s fine to skip the next call to lv_task_handler, and possibly lose a frame or two, but we should not skip the call to lv_tick_inc.

The problems I see with your suggestion are:

A different approach could be to use a counter and simply skip calls to lv_task_handler if the previous hasn’t finished yet (or keep one or two calls “in flight”).
The problem with that approach is that it breaks down once lv_task_handler is allowed not to return, due to an exception thrown in a callback as discussed above.
In that case it might be worth catching Micropython exceptions in C and decrementing the counter before propagating them further (kind of a “finally” block in C).
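A minimal sketch of the counter idea, written in plain C11 so that it compiles without FreeRTOS, LVGL, or MicroPython. The names fake_sched_schedule, run_pending, MAX_IN_FLIGHT, and the rest are hypothetical stand-ins, not part of modlvesp32. The only point is that the tick is always counted while scheduling is skipped as long as a call is still in flight. It does not handle the exception case.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_IN_FLIGHT 1          /* assumption: allow one pending call */

static atomic_int in_flight;     /* scheduled-but-unfinished handler calls */

/* Stand-ins so the sketch builds on its own (all hypothetical). */
static unsigned tick_ms;         /* plays the role of lv_tick_inc() */
static int queued;               /* jobs sitting in the fake scheduler queue */
static int handler_runs;         /* how often the handler actually ran */

static bool fake_sched_schedule(void) { queued++; return true; }

/* Runs in the interpreter thread, like mp_lv_task_handler. */
static void task_handler_job(void)
{
    handler_runs++;                      /* lv_task_handler() would go here */
    atomic_fetch_sub(&in_flight, 1);     /* finished: free the slot */
}

/* Runs in the timer thread, like vTimerCallback. */
static void timer_callback(void)
{
    tick_ms += 1;                        /* the tick is never skipped */

    /* Claim a slot; back off if the limit is already reached. */
    if (atomic_fetch_add(&in_flight, 1) >= MAX_IN_FLIGHT) {
        atomic_fetch_sub(&in_flight, 1);
        return;
    }
    if (!fake_sched_schedule())
        atomic_fetch_sub(&in_flight, 1); /* queue full: release the slot */
}

/* Drains the fake queue, like the interpreter running pending jobs. */
static void run_pending(void)
{
    while (queued > 0) {
        queued--;
        task_handler_job();
    }
}
```

Raising MAX_IN_FLIGHT to 2 would keep one or two calls "in flight", at the price of one more scheduler slot.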

Where am I doing that? I don’t think I do that. lv_tick_inc runs freely at full rate.

And I also don’t think I am doing that. I just try the lock (using the 0 as a parameter to “Take”). If that fails I return immediately.

Oh I see, I thought you were trying to block execution.
So why not use a simple counter? What’s the benefit of a FreeRTOS semaphore here?

This can IMO be called from different threads and from irq context. Using the binary semaphore prevents race conditions if e.g. one thread is in the middle of decreasing your suggested counter e.g. from 0x0100 to 0x00ff and for a fraction of a second the other task sees 0x0000 … or the like.

I think using these mechanisms whenever potentially crossing task or even core boundaries is a good idea. There are also counters available: https://www.freertos.org/CreateCounting.html

Uhm … you are right … in this case it’s the timer and the MP function which are running in the same context. A counter should be fine …

I don’t think they can be called from irq context.
vTimerCallback is called from the FreeRTOS command queue thread while mp_lv_task_handler is called from the Micropython main thread, so they are called from different threads.

So an atomic read/write of a counter is enough, but we can also use a counting semaphore if you think there is still a risk.

Did you consider the case of an exception that is thrown from lv_task_handler?

Actually I don’t understand how a Python exception would stop the C function execution. This would IMO only work if there were RTOS tasks or threads involved which can be killed before they return. But you just told me that this all runs inside one context.

I could be wrong, but I believe MicroPython can manipulate the stack, so standard C calling conventions get thrown out the window. :wink:
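As far as I understand it, MicroPython raises exceptions inside C code with a non-local jump (its NLR mechanism, nlr_push / nlr_jump), which behaves much like setjmp/longjmp. No thread needs to be killed for a C function to be abandoned halfway. The sketch below uses plain setjmp/longjmp as a stand-in. The function names are hypothetical, and cleanup_ran plays the role of the xSemaphoreGive call in mp_lv_task_handler.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf catch_point;   /* where the "interpreter" catches errors */
static bool cleanup_ran;      /* did the code after the callback execute? */

/* Plays the role of a Python callback that raises an exception. */
static void callback_that_raises(void)
{
    longjmp(catch_point, 1);  /* unwinds straight to setjmp(); never returns */
}

/* Plays the role of a callback that returns normally. */
static void callback_ok(void)
{
}

/* Plays the role of mp_lv_task_handler. */
static void task_handler(void (*cb)(void))
{
    cb();                     /* lv_task_handler() invoking a user callback */
    cleanup_ran = true;       /* xSemaphoreGive() would go here */
}

/* Returns true if task_handler returned normally,
 * false if it was jumped over by longjmp. */
static bool run_job(void (*cb)(void))
{
    cleanup_ran = false;
    if (setjmp(catch_point) == 0) {
        task_handler(cb);
        return true;
    }
    return false;             /* arrived here via longjmp */
}
```

With callback_that_raises, the line after cb() is never reached, so a semaphore taken before scheduling would never be given back.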