How to display a video stream from an ESP32-CAM

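We actually have a JPEG decoder library already (lv_lib_split_jpg). Once it is registered (see the library's README), all you should need to do is fetch a JPEG over HTTP into a buffer and point a "fake" image descriptor at it: an lv_img_dsc_t whose data is the raw JPEG bytes (LV_IMG_CF_RAW) rather than decoded pixels, so the JPEG decoder handles it. Here is an example; it's a bit incomplete, but I think it gives the general idea: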

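```c
#include "lvgl.h"

/* Placeholders: set these to your camera's frame size */
#define CAMERA_WIDTH  320
#define CAMERA_HEIGHT 240

lv_obj_t *img_obj; /* created elsewhere with lv_img_create() */

static void update_img(const void *downloaded_jpeg_buf, size_t buf_size) {
    static lv_img_dsc_t jpeg_dsc = {
        .header.always_zero = 0,
        .header.w = CAMERA_WIDTH,
        .header.h = CAMERA_HEIGHT,
        .header.cf = LV_IMG_CF_RAW, /* raw (still compressed) data for the JPEG decoder */
    };
    /* Set the buffer location and size each time in case it changes.
     * LVGL does not copy the data, so the buffer must stay valid while it is displayed. */
    jpeg_dsc.data = downloaded_jpeg_buf;
    jpeg_dsc.data_size = buf_size;
    lv_img_cache_invalidate_src(&jpeg_dsc); /* invalidate the cached decode so the new JPEG is decoded again */
    lv_img_set_src(img_obj, &jpeg_dsc);
}
```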

You would want to call update_img each time you get a new JPEG from the camera.
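update_img needs the whole JPEG in one contiguous buffer, but an HTTP client usually delivers the response body in chunks (for example through the HTTP_EVENT_ON_DATA event of ESP-IDF's esp_http_client). Below is a minimal, portable sketch of collecting those chunks, assuming you call frame_buf_append from your client's data callback and pass data/len to update_img once the response has finished. frame_buf_t and the frame_buf_* functions are hypothetical helpers, not part of LVGL or ESP-IDF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one HTTP response body collected into a contiguous buffer. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} frame_buf_t;

/* Start a new frame, keeping the allocation from the previous one. */
static void frame_buf_reset(frame_buf_t *fb) {
    assert(fb != NULL);
    fb->len = 0;
}

/* Append one received chunk, growing the buffer geometrically. Returns false on OOM. */
static bool frame_buf_append(frame_buf_t *fb, const void *chunk, size_t n) {
    assert(fb != NULL);
    if (n == 0) return true;
    if (fb->len + n > fb->cap) {
        size_t new_cap = fb->cap ? fb->cap : 4096;
        while (new_cap < fb->len + n) new_cap *= 2;
        uint8_t *p = realloc(fb->data, new_cap);
        if (p == NULL) return false;
        fb->data = p;
        fb->cap = new_cap;
    }
    memcpy(fb->data + fb->len, chunk, n);
    fb->len += n;
    return true;
}

/* Cheap sanity check: a JPEG starts with SOI (FF D8) and ends with EOI (FF D9).
 * Trailing bytes after EOI would make this return false. */
static bool frame_buf_is_complete_jpeg(const frame_buf_t *fb) {
    assert(fb != NULL);
    return fb->len >= 4 &&
           fb->data[0] == 0xFF && fb->data[1] == 0xD8 &&
           fb->data[fb->len - 2] == 0xFF && fb->data[fb->len - 1] == 0xD9;
}

/* Release the allocation and return the buffer to its empty state. */
static void frame_buf_free(frame_buf_t *fb) {
    assert(fb != NULL);
    free(fb->data);
    fb->data = NULL;
    fb->len = fb->cap = 0;
}
```

Because LVGL keeps a pointer to the buffer instead of copying it, and realloc may move the data, it is safer to alternate between two frame_buf_t instances: fill one while the other is being displayed, then pass the freshly completed one to update_img.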

The bottleneck here will probably be the HTTP round trip for each frame (how quickly a request can be sent and answered) and how fast each JPEG can be decoded. 5-10 fps is probably doable.
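As a rough way to reason about that estimate: if each frame costs one fetch plus one decode, done one after the other, the frame rate is 1000 / (fetch_ms + decode_ms). The timings below are hypothetical inputs you would measure on your own hardware, not benchmarks: 100 ms + 100 ms gives 5 fps, and 50 ms + 50 ms gives 10 fps. The second function assumes (an assumption about your setup) that the next request is fetched on another task while the current frame decodes, in which case the slower stage sets the pace.

```c
#include <assert.h>

/* Sequential pipeline: fetch a frame, decode it, then fetch the next one. */
static double fps_sequential(double fetch_ms, double decode_ms) {
    assert(fetch_ms >= 0.0 && decode_ms >= 0.0 && fetch_ms + decode_ms > 0.0);
    return 1000.0 / (fetch_ms + decode_ms);
}

/* Overlapped pipeline: fetching and decoding run concurrently,
 * so throughput is limited by whichever stage is slower. */
static double fps_overlapped(double fetch_ms, double decode_ms) {
    double slowest = fetch_ms > decode_ms ? fetch_ms : decode_ms;
    assert(slowest > 0.0);
    return 1000.0 / slowest;
}
```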