Btnm callback passes a NULL object?

Description

Sometimes a button matrix callback function is passed a NULL object. I’m not strictly certain that this bug is limited to button matrices, but this is the only callback I call frequently. This is for an overlay to allow data-entry via 10-key (I stole inspiration from here).

What MCU/Processor/Board and compiler are you using?

Code::Blocks Simulator

What do you experience?

I originally saw this issue in my embedded application, but I can reproduce it in the Simulator as well. On occasion I see a NULL object passed into my button matrix callback function. I was not originally checking for NULL, and this caused segfaults in my application.

What do you expect?

I don’t ever expect a callback to have a NULL object argument.

Code to reproduce

With the code below, I should never be able to hit a breakpoint on the line volatile int x = 0; but after pressing the buttons somewhat rapidly for a short amount of time (< 15s) the debugger halts on this line. It seems to be even faster in my embedded project, but I have a more difficult time debugging that.

static void cb_10key(lv_obj_t *obj, lv_event_t event) {
    switch(event) {
    case LV_EVENT_VALUE_CHANGED:
        if (obj != NULL) {
            const char *p = lv_btnm_get_active_btn_text(obj);
            if (strcmp(p, "Cancel") == 0) {
                // save nothing, go back
            } else if (strcmp(p, "Save") == 0) {
                if (ip_or_mask == CONF_IP) {
                    lv_ta_set_text(ta_ipaddr, lv_ta_get_text(ta_cfg));
                } else {
                    lv_ta_set_text(ta_maskaddr, lv_ta_get_text(ta_cfg));
                }
                // TODO: Save IP/Mask
            } else if (strcmp(p, "Bksp") == 0) {
                lv_ta_del_char(ta_cfg);
            } else {
                lv_ta_add_text(ta_cfg, p);
            }
        } else {
            volatile int x = 0;
        }
        break;
    }
}

For the sake of completion, here is the rest of my source in the overlay file:

#define CONF_IP         0
#define CONF_MASK       1

static void cb_10key(lv_obj_t *obj, lv_event_t event);

static lv_obj_t *ta_ipaddr;
static lv_obj_t *ta_maskaddr;
static lv_obj_t *ta_cfg;
static lv_obj_t *btnm;
static lv_obj_t *overlay;

static unsigned char ip_or_mask = CONF_IP; // ip address

const char *tenkey[] = {"1", "2", "3", "\n",
                        "4", "5", "6", "\n",
                        "7", "8", "9", "\n",
                        "Bksp", "0", ".", "\n",
                        "Cancel", "Save", ""};

void create_overlay(void)
{
    static lv_style_t overlay_style;
    /* Create a full-screen background */
    lv_style_copy(&overlay_style, &lv_style_plain_color);

    /* Set the background's style */
    overlay_style.body.main_color = overlay_style.body.grad_color = LV_COLOR_BLACK;
    overlay_style.body.opa = LV_OPA_50;

    /* Create a base object for the overlay background */
    overlay = lv_obj_create(lv_scr_act(), NULL);
    lv_obj_set_style(overlay, &overlay_style);
    lv_obj_set_pos(overlay, 0, 0);
    lv_obj_set_size(overlay, LV_HOR_RES, LV_VER_RES);

    lv_obj_t *outer = lv_cont_create(overlay, NULL);
    lv_obj_set_style(outer, &overlay_style);
    lv_obj_set_size(outer, 400, 200);
    lv_cont_set_layout(outer, LV_LAYOUT_ROW_M);
    lv_obj_align_origo(outer, NULL, LV_ALIGN_CENTER, 0, 0);

    // Text area container
    lv_obj_t *cont_ta = lv_cont_create(outer, NULL);
    lv_obj_set_size(cont_ta, 190, 200);
    lv_obj_set_style(cont_ta, &overlay_style);
    lv_cont_set_layout(cont_ta, LV_LAYOUT_COL_M);

    lv_obj_t *label = lv_label_create(cont_ta, NULL);
    lv_label_set_text(label, "Please enter IP Address");
    lv_obj_set_height(label, 50);

    ta_cfg = lv_ta_create(cont_ta, NULL);
    lv_ta_set_one_line(ta_cfg, true);
    lv_obj_set_size(ta_cfg, 150, 50);
    lv_ta_set_cursor_type(ta_cfg, LV_CURSOR_LINE);
    lv_ta_set_text(ta_cfg, ""); // clear "Text Area" from obj
    if (ip_or_mask == CONF_IP)
        lv_ta_set_placeholder_text(ta_cfg, "e.g. 192.168.0.55");
    else
        lv_ta_set_placeholder_text(ta_cfg, "e.g. 255.255.255.0");

    // 10-key container
    lv_obj_t *cont_keys = lv_cont_create(outer, NULL);
    lv_obj_set_size(cont_keys, 190, 200);
    lv_obj_set_style(cont_keys, &overlay_style);
    lv_cont_set_layout(cont_keys, LV_LAYOUT_COL_M);

    /* Create the button matrix (10-key pad) inside the key container */
    btnm = lv_btnm_create(cont_keys, NULL);
    lv_obj_set_size(btnm, 190, 190);
    lv_btnm_set_map(btnm, tenkey);
    lv_obj_set_event_cb(btnm, cb_10key);
}

The worst type of bug is one where you can’t reliably reproduce it. :slightly_smiling_face:

This is really weird, and not logically possible. To call an event handler you would need to know the object associated with it.

I can try to debug it on my PC on the weekend, but given the circumstances I might not be able to make it happen.

Could you try looking through the backtrace to find where the NULL pointer originates from?

I thought this was odd, and I spent a little time trying to figure out what was blanking out the object. I couldn’t make any sense of it either.

I’m glad you challenged me on this, because I went back to try to reproduce it and I’m not hitting my breakpoint this morning. I may have noticed the program halt and assumed it was at the breakpoint. This morning I can (somewhat) reliably get it to crash in the strcmp call on ‘p’, because ‘p’ is NULL. That produces a segfault which, I suppose, I misread as hitting the breakpoint yesterday.

With that all said I suppose my question should instead be: “What can cause lv_btnm_get_active_btn_text(obj) to return NULL?” :slight_smile:

I’m quite confused this morning, but I am going to imbibe some caffeine and see if I can make sense of this mess in my head.

One quick update. After storing the return value from lv_btnm_get_active_btn_text(obj) in p, I now check whether p is NULL, and if so return from the callback.

I see that this function has some error checking to identify which button is active. I suppose there is a race condition somewhere that resets this id to LV_BTNM_BTN_NONE. On the simulator I was clicking fairly quickly, as I figured more clicks would trigger this NULL issue faster, but perhaps I’m tricking the event handler into thinking that I’ve clicked nothing?

For now I think my workaround (and probably a smart one to leave in there anyway) is to ensure that my pointer is not NULL before I try to compare it to my strings. Just wanted to shed a little light on this as I was thoroughly confused by what I had done!
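A minimal sketch of that guard, pulled out into a helper so the NULL check happens before any strcmp. The names classify_key and key_action_t are hypothetical and not part of LittlevGL; only the button texts ("Cancel", "Save", "Bksp") come from the map in the original post.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical action codes for the 10-key overlay (not part of LittlevGL). */
typedef enum {
    KEY_ACTION_NONE,    /* no active button text: ignore the event */
    KEY_ACTION_CANCEL,
    KEY_ACTION_SAVE,
    KEY_ACTION_BKSP,
    KEY_ACTION_CHAR     /* any other key: append its text */
} key_action_t;

/* Map a button's text to an action, treating NULL as "nothing pressed". */
static key_action_t classify_key(const char *txt)
{
    if (txt == NULL) {
        return KEY_ACTION_NONE;     /* guard before any strcmp() */
    }
    if (strcmp(txt, "Cancel") == 0) return KEY_ACTION_CANCEL;
    if (strcmp(txt, "Save") == 0)   return KEY_ACTION_SAVE;
    if (strcmp(txt, "Bksp") == 0)   return KEY_ACTION_BKSP;
    return KEY_ACTION_CHAR;
}
```

In the callback, the result of lv_btnm_get_active_btn_text(obj) would be passed to classify_key, and the handler would return early on KEY_ACTION_NONE.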

Well I’ve run this ‘fix’ for a few days now and haven’t seen any spurious segfaults, so I think I’ve sufficiently covered my rear. I still don’t know why I’m seeing the occasional NULL return from getting the active button text, but I am content with believing it is a race condition for which I am not properly accounting.

I’m confused as to which pointer is NULL. Is it the object pointer that’s NULL or the return value from lv_btnm_get_active_btn_text?

I think this is probably a bug in LittlevGL because I don’t see why anything should be NULL inside the LV_EVENT_VALUE_CHANGED callback.

Sorry. To clarify, I think I misunderstood what I was looking at originally where I was asking about the NULL object pointer. The object pointer does not actually appear to be NULL, but instead the return pointer from lv_btnm_get_active_btn_text is on occasion NULL.

Okay, now I understand better. Thanks for the clarification. :slightly_smiling_face:

I’m not sure if you have the time/patience to do this, but would you be willing to try the following steps?

  1. Revert your workaround.
  2. Trigger the bug (i.e. lv_btnm_get_active_btn_text returning NULL).
  3. When the bug happens, check the backtrace and find out which of these lines called lv_event_send.

That way we can narrow down the issue to one of the signal handlers and proceed from there.

Of course! It’s the least I can do. I may not actually have the time, but I’ll do it anyway :slight_smile:

Is this what you are looking to see? It looks like lv_btnm_signal calls it (when it thinks the button is not LV_BTNM_BTN_NONE).

[Screenshot: call stack]

Thanks! I appreciate it.

Which line of lv_btnm_signal?

I’d like to know what happens when it does think the button is LV_BTNM_BTN_NONE.

This is line 758 of lv_btnm.c. I never explicitly mentioned it in this post, but I am using version 6.0.2 according to lv_version.h.

At this line of the btnm function it thinks that the pressed button is not equal to LV_BTNM_BTN_NONE, but when I call lv_btnm_get_active_btn_text(obj) in my callback I receive a NULL pointer.

lv_btnm_get_active_btn_text will return NULL if the ext->btn_id_act is equal to LV_BTNM_BTN_NONE. I just added a breakpoint at line 488 of lv_btnm.c and verified that this is being hit.
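To illustrate the behaviour described above, here is a simplified stand-alone model. It is not the library source: model_btnm_t, MODEL_BTN_NONE and model_get_active_btn_text are hypothetical stand-ins for the button matrix state, LV_BTNM_BTN_NONE and lv_btnm_get_active_btn_text. It only shows that if the active id is reset between the event being sent and the text being fetched, the getter has nothing to return but NULL.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_BTN_NONE 0xFFFFu   /* stand-in for LV_BTNM_BTN_NONE */

/* Toy stand-in for the button matrix's internal state. */
typedef struct {
    const char **texts;     /* button labels, without "\n" separators */
    uint16_t btn_cnt;       /* number of entries in texts */
    uint16_t btn_id_act;    /* active button id, or MODEL_BTN_NONE */
} model_btnm_t;

/* Return the active button's text, or NULL when no button is active. */
static const char *model_get_active_btn_text(const model_btnm_t *m)
{
    if (m->btn_id_act == MODEL_BTN_NONE || m->btn_id_act >= m->btn_cnt) {
        return NULL;
    }
    return m->texts[m->btn_id_act];
}
```

A caller that cannot rule out the id being reset has to treat NULL as a valid return value.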

Just a quick FYI, I am out next week (Thanksgiving in the US) so I probably will not be checking here often if at all. I’ll try to take a peek though!

No problem! Thanks for the information. I will see if I can find the issue once I have some time.

@embeddedt just curious if you had a chance to look into this last week? I am back in the office this week and can probably try to help dig into this if needed.

My workaround has gotten rid of the random crashes meaning that this is not a dire need for me, but I want to make sure you aren’t stuck needing anything from my end.

Otherwise I’ll be working on figuring out why adding one character of text to a table cell is causing my screen draw to blow up despite it working fine in the simulator… :slight_smile:

Sorry; I didn’t have time to look at it yet. I appreciate the offer of help; I’ll let you know if I need any other information.

Possibly something to do with the heap? Adding a character would increase the required buffer size by one byte. I’ve noted that heap issues are particularly common when stack sizes aren’t large enough, because the stack smashes downward onto the heap. :wink: Just a thought.

Yeah, that was my first thought as well. I’m running an embedded Linux platform with 256 MB of RAM and fairly resource-lean software. If it is a heap deficiency then I have bigger problems!

I figured it had something to do with me using too much space inside my container, but there is absolutely enough room for the text without it causing a wrap or anything like that…

It’s just strange to me that this works on the simulator, and not in my embedded environment. Perhaps I need to see if any of my configuration parameters are different aside from enabling the touchscreen and platform-specific things like that. Either way, I’ll write up another forum post if I can’t figure it out. Thanks!

Drat. I know exactly what this problem is, and it’s something I looked at previously, temporarily worked around, and am now getting bitten by again. It has to do with the way I package and send the data to my display board. Thankfully it’s not an OS or LittlevGL issue (the latter of which I could not accept, as it worked fine in the simulator).

Whew!
