Micropython Display Drivers

bdbarnett · October 9, 2023, 7:31pm

This is a continuation from a post in another thread: Maintainer wanted - #13 by kdschlosser

I am developing a GUI framework for LVGL Micropython. See https://github.com/bdbarnett/lvmp_panels/blob/main/assets/README.md. I primarily develop on Windows Subsystem for Linux using the Master branch of lv_micropython. I am testing on an assortment of ESP32-S3 based boards along with several displays, including GC9A01, ST7789, ILI9341 and HX8357D. Currently, as far as I know, there aren’t any fast lv_micropython drivers for any display on the ESP32-S3 due to the incompatibility between the ESP-IDF versions used in the lv_micropython drivers versus what is required to work with the ESP32-S3. I am currently using Micropython-only drivers for the boards above, and they are nowhere near as responsive as the SDL driver under Linux.

I would like to assist with the effort to overhaul display drivers in lv_micropython. I don’t know C or C++, but I consider myself above average in Micropython. I like the way display drivers are implemented in Circuitpython, and recommend something similar as the path forward, but will be happy with any progress, no matter which direction it is in. The heavy lifting in CP is a C library called DisplayIO. It is the “common API for all of the drivers” as @kdschlosser put it. Display drivers in CP are a Python class wrapping the Display class in DisplayIO. The driver simply sets up the pin mapping and contains the init script, possibly with some timing values. Doing it that way, there would actually be only one “base” driver providing the API (the equivalent of DisplayIO), and each display would then have its own driver containing only the information that makes it unique.
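A minimal Python sketch of that split, for illustration only: `BaseDisplay`, `ST7789` and `RecordingBus` are hypothetical names, not classes from lv_micropython or CircuitPython, and the bus is assumed to expose a `send(cmd, data)` method. The base class holds the shared code (replaying the init script, and windowing a blit with the standard MIPI DCS column/row/memory-write commands), while the per-display subclass is nothing but data. The four-command init script is illustrative and much shorter than what a real panel needs.

```python
import time


class BaseDisplay:
    """Hypothetical common driver API: the part that would play DisplayIO's role.

    `bus` is assumed to provide send(cmd, data); this is not an existing
    lv_micropython or CircuitPython class.
    """

    WIDTH = 0
    HEIGHT = 0
    # Each entry: (command byte, parameter bytes, delay in ms after sending)
    INIT_SEQUENCE = ()

    def __init__(self, bus):
        self.bus = bus
        self.init()

    def init(self):
        # Every display replays its own script through the same shared code.
        for cmd, data, delay_ms in self.INIT_SEQUENCE:
            self.bus.send(cmd, data)
            if delay_ms:
                time.sleep(delay_ms / 1000)

    def blit(self, x, y, w, h, buf):
        # MIPI DCS window commands: CASET (0x2A), RASET (0x2B), then RAMWR (0x2C).
        x2, y2 = x + w - 1, y + h - 1
        self.bus.send(0x2A, bytes([x >> 8, x & 0xFF, x2 >> 8, x2 & 0xFF]))
        self.bus.send(0x2B, bytes([y >> 8, y & 0xFF, y2 >> 8, y2 & 0xFF]))
        self.bus.send(0x2C, buf)


class ST7789(BaseDisplay):
    # The per-display "driver" is only data: size plus init script (illustrative).
    WIDTH = 240
    HEIGHT = 320
    INIT_SEQUENCE = (
        (0x01, b"", 150),    # SWRESET
        (0x11, b"", 120),    # SLPOUT
        (0x3A, b"\x55", 0),  # COLMOD: 16 bits per pixel
        (0x29, b"", 0),      # DISPON
    )


class RecordingBus:
    """Stand-in bus that records commands, for trying the classes without hardware."""

    def __init__(self):
        self.sent = []

    def send(self, cmd, data):
        self.sent.append((cmd, bytes(data)))
```

Adding another panel would then mean writing one more data-only subclass; pin setup and rotation handling are left out of this sketch.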

I also like the way this ST7789 driver for Micropython (without LVGL) is implemented. GitHub - russhughes/st7789_mpy: Fast MicroPython driver for ST7789 display module written in C. It has lots of methods that we don’t need in LVGL, such as drawing lines, so it could be trimmed down, possibly to just the framebuffer and blit method. If we could somehow pull out the parts of that driver that are common to other displays and leave that as lv_micropython’s DisplayIO, then add to that the init scripts that are contained in CP’s DisplayIO drivers, I think we’d have a very scalable solution that would serve the community for a long time. I’m willing to take this on, even the C portion, but I would need a fair amount of guidance as to what is going on under the hood in C.

Anyway, I’m hoping to hear from some of you who have been a part of LVGL for much longer than I have. I know I’m the newbie.

andrewleech · October 9, 2023, 10:54pm

Thanks for raising this, I’m a regular contributor to micropython and personally quite keen to see LVGL being seen as a standard part of the mpy ecosystem.

I agree that the lack of consolidated ready-to-go drivers is holding this back.

I’d love to see LVGL work as the GUI “creation” side of things, writing into a common micropython layer like framebuf, and then have a standard API for drivers that glue framebuf to the hardware. framebuf does also include some drawing tools like line, basic text etc. which are not helpful here… arguably framebuf isn’t providing anything needed here other than the concept of a width x height x graphics-format buffer that is intended to be drawn somewhere, but this can be seen as a standard interface between the graphics “creation” / drawing side of a library and the hardware / displaying side.

Having a clear separation between the two sides would make it easier to re-use other existing drivers, I believe, as well as clearly articulate what’s needed for writing new drivers, which then don’t necessarily all need to live in the one repo.

CP’s DisplayIO is interesting in that it’s got some driver support already, but it’s also quite integrated into some of their other codebase by the look of things and includes a bunch of graphics creation stuff we don’t need with LVGL.

russhughes has a stack of really high quality graphics drivers which are absolutely worth looking at; his standard repo structure also includes a lot of graphics creation as you say, like lines, widgets, fonts.

Arguably LVGL doesn’t really need to know about the display driver at all if it can just handle filling a framebuffer, other than needing appropriate API hooks to force a refresh (partial / full) and to query any display information that may be required.
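To make that boundary concrete, here is a small pure-Python sketch of a renderer-facing framebuffer with a driver-facing refresh hook. `FrameBufferLayer` and its methods are hypothetical, not part of MicroPython's framebuf or of LVGL; the renderer is assumed to hand over rectangles with inclusive corner coordinates, and the driver is assumed to expose `blit(x, y, w, h, buf)`.

```python
class FrameBufferLayer:
    """Hypothetical glue layer: a width x height x format buffer that a
    renderer writes into and a hardware driver reads from."""

    def __init__(self, width, height, bytes_per_pixel=2):
        self.width = width
        self.height = height
        self.bpp = bytes_per_pixel
        self.buf = bytearray(width * height * bytes_per_pixel)
        self.dirty = None  # (x1, y1, x2, y2) inclusive, or None when clean

    def write_area(self, x1, y1, x2, y2, pixels):
        """Copy a rendered rectangle (inclusive corners) into the framebuffer."""
        row_bytes = (x2 - x1 + 1) * self.bpp
        dst = memoryview(self.buf)
        src = memoryview(pixels)
        for i in range(y2 - y1 + 1):
            start = ((y1 + i) * self.width + x1) * self.bpp
            dst[start:start + row_bytes] = src[i * row_bytes:(i + 1) * row_bytes]
        self._mark_dirty(x1, y1, x2, y2)

    def _mark_dirty(self, x1, y1, x2, y2):
        # Grow a single bounding box that covers every area written so far.
        if self.dirty is None:
            self.dirty = (x1, y1, x2, y2)
        else:
            dx1, dy1, dx2, dy2 = self.dirty
            self.dirty = (min(dx1, x1), min(dy1, y1), max(dx2, x2), max(dy2, y2))

    def refresh(self, blit, full=False):
        """Send the dirty region (or the whole screen if full=True) to blit(x, y, w, h, buf)."""
        if full:
            x1, y1, x2, y2 = 0, 0, self.width - 1, self.height - 1
        elif self.dirty is None:
            return  # nothing changed since the last refresh
        else:
            x1, y1, x2, y2 = self.dirty
        w, h = x2 - x1 + 1, y2 - y1 + 1
        row_bytes = w * self.bpp
        out = bytearray(row_bytes * h)  # contiguous copy of the region's rows
        for i in range(h):
            start = ((y1 + i) * self.width + x1) * self.bpp
            out[i * row_bytes:(i + 1) * row_bytes] = self.buf[start:start + row_bytes]
        blit(x1, y1, w, h, out)
        self.dirty = None


# Usage with a 240 x 320 RGB565 screen and a stand-in blit that reports sizes.
fb = FrameBufferLayer(240, 320)
fb.write_area(0, 0, 9, 9, bytes(10 * 10 * 2))
fb.refresh(lambda x, y, w, h, buf: print(x, y, w, h, len(buf)))  # 0 0 10 10 200
```

The renderer never sees the driver and the driver only sees `blit()` calls, so either side can be swapped. This sketch copies the dirty rows into a temporary buffer for simplicity, which costs RAM and time on a microcontroller.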

This is separate from the input side of course; touchscreen and other controls certainly need to be part of the overall API, but the pure-python implementations of them for lvgl aren’t necessarily a problem in my mind.

SL2021-Dev · October 17, 2023, 2:53pm

But PLEASE PLEASE PLEASE FOCUS on being able to initialize multiple devices on the SAME SPI bus.
I have yet to find a way to share an SPI bus between an SD card, display and touch.
Using the Python drivers ili9XXX.py and xpt2046.py with either the machine.SDCard or sdcard.py driver,
I could not find any combination where these three devices would work together.
I have also tried using lv_spi.py with no success.
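One common way to share a bus is to give every device its own chip-select pin and reconfigure the SPI peripheral whenever a different device takes over, since an SD card, a touch controller and a display usually want very different clock speeds. The sketch below is hypothetical (`SharedSpiBus` is not an existing driver) and assumes `spi` behaves like MicroPython's `machine.SPI` (with `init()` and `write()`) and that each chip select is an active-low pin object with a `value()` method.

```python
class SharedSpiBus:
    """Hypothetical arbiter for several devices sharing one SPI bus.

    `spi` is any object with init(**kwargs) and write(buf), such as
    machine.SPI on MicroPython. Each device has its own chip-select pin
    object with a value(v) method; chip selects are treated as active-low.
    """

    def __init__(self, spi):
        self.spi = spi
        self.devices = {}
        self.active = None  # name of the device the bus is currently configured for

    def add_device(self, name, cs, baudrate, polarity=0, phase=0):
        cs.value(1)  # park every chip select high (deselected) before any traffic
        self.devices[name] = (cs, baudrate, polarity, phase)

    def write(self, name, buf):
        cs, baudrate, polarity, phase = self.devices[name]
        if self.active != name:
            # Reconfigure only when a different device takes over the bus,
            # e.g. a 40 MHz display followed by a 2 MHz touch controller.
            self.spi.init(baudrate=baudrate, polarity=polarity, phase=phase)
            self.active = name
        cs.value(0)  # select this device only
        try:
            self.spi.write(buf)
        finally:
            cs.value(1)  # always release, even if the write raises
```

On an ESP32 it might be set up as `SharedSpiBus(machine.SPI(1, sck=machine.Pin(19), mosi=machine.Pin(18), miso=machine.Pin(5)))` with one `add_device()` call per device (pin numbers taken from the ili9XXX.py defaults listed below). It only helps when every driver goes through it: drivers that talk to the SPI host directly, or a display driver whose DMA transfer is still running in the background, would bypass or race with this arbiter.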

Most new micropython coders are unaware (as I was) of these additional settings that are buried in the
ili9XXX.py and xpt2046.py driver code, with no documentation to help with configuring their devices.

example ili9XXX.py
needed…
miso=5, mosi=18, clk=19, cs=13, spihost=esp.HSPI_HOST, spimode=0, mhz=40,

could be simplified…
factor=4, hybrid=True, width=240, height=320, start_x=0, start_y=0,
invert=False, double_buffer=True, half_duplex=True, display_type=0,
asynchronous=False, initialize=True, color_format=None

unnecessary…
dc=12, rst=4, power=14, backlight=15, backlight_on=0, power_on=0,

And having preconfigured pin assignments is not ideal!!!

I know this DIATRIBE is outside of the framework of display drivers,
but I think the bigger conversation is compatibility with other devices

Possibly a thread for the maintainers should be started for this exact topic
as you are doing for display drivers.

kdschlosser · October 22, 2023, 5:29am

OK, so I am going to touch base on a couple of things here. MicroPython will never be as fast as the SDL driver on Linux. There is a very large amount of overhead from MicroPython; Python in general is not exceedingly fast, some 200 times slower than C code, so keep that in mind. There are things that can be done in MicroPython that will speed things up, but it will still not be as fast as C code.

Some of the things that can be done are using the viper code emitter in MicroPython, tweaking the indev and refresh timers, those kinds of things.

The problem with leveraging the framebuffer that is built into MicroPython is that there is no way to specify where to allocate the buffer. You would not get the benefits of DMA or of being able to place the buffer in internal RAM instead of SPIRAM.

Now onto the issue of being able to have more than one device on the same SPI bus. The big issue here is going to be bandwidth, and there is not going to be enough of it. The second issue is that, for whatever reason, quite a few of the lower-cost displays will not function properly because they don’t switch the chip select for the display correctly. My suggestion is to connect the CS line from the display to ground and connect the MOSI wire and the SCK wire; those are the only 2 you will need to connect between the MCU and the display. This is a tad bit more palatable to do. If you want other features like power control and backlight control, you will have to use additional pins for that, no way around it.

There is also the problem of MicroPython not supporting the full set of features available to a specific board when using SPI. The bare minimum is what is given, and that is done to keep the API the same across the board. In order to get the best performance, the SPI that is available in the MicroPython API cannot be used on some boards, one of those being the ESP32. To get the best performance, the IDF API for SPI is exposed so it can be used from inside MicroPython. It is not user friendly, which is why there is a need for the really large number of parameters in the constructor for a driver.

The biggest issue with the display drivers is the lack of an API that is the same across all of the drivers. You also have all of the drivers dropped into a single file, which uses up quite a bit of additional memory when all but one of the drivers are not even used. There needs to be a base class that driver classes would subclass, and that would give the common API. Now, to throw a wrench into the works, there was talk of having drivers written in C for LVGL. These drivers would be board specific and implement whatever is needed for them to work on a specific MCU. It is a good idea, but I feel it is not a good idea for MicroPython. This is because MicroPython is a runtime language, not a compile-time one. Specifying a board at compile time is one thing, but having to compile specifically for a display is a completely different animal. I believe the drivers for MicroPython should remain how they are, but they need to be cleaned up.

The way LVGL exposes the SPI functions and enumerations to MicroPython should remain the same. This allows the drivers to be written in Python, which makes them easier to debug and develop. Right now part of the driver resides in Python code and another portion of it resides in C code. It all can be written in Python, and if any byte positioning changes need to be made, this can be done in a function written to use the viper code emitter, so that close to the speed of C code is achieved.
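As an example of such a byte-positioning change, many SPI panels expect each RGB565 pixel high byte first, while the pixel data may be stored little-endian. The plain-Python loop below swaps every pair of bytes in place; on MicroPython the same loop could be moved into a `@micropython.viper` function indexing `ptr8(buf)` for a large speed-up, which is not shown here because viper code does not run under CPython. `swap_rgb565_bytes` and `rgb565` are illustrative helpers, not existing driver functions.

```python
def swap_rgb565_bytes(buf):
    """Swap the two bytes of every 16-bit pixel in place (little <-> big endian).

    Plain Python version; a viper version would do the same swaps through
    a ptr8 view of the buffer.
    """
    for i in range(0, len(buf) - 1, 2):
        buf[i], buf[i + 1] = buf[i + 1], buf[i]


def rgb565(r, g, b):
    """Pack 8-bit R, G, B values into a 16-bit RGB565 integer."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


# Usage: a pure red pixel stored little-endian, then swapped to high byte first.
pixel = rgb565(255, 0, 0)                      # 0xF800
buf = bytearray(pixel.to_bytes(2, "little"))   # b'\x00\xf8'
swap_rgb565_bytes(buf)                         # b'\xf8\x00'
```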

bdbarnett · November 27, 2023, 6:43am

I have a HUGE announcement! I have written ESP32 display drivers in C based on Russ Hughes’s S3LCD drivers. They are not linked to LVGL, yet they work with LVGL fantastically. Since they aren’t linked to LVGL, there is no need for lv_micropython_bindings to be linked to ESP-IDF, overcoming the problem with compiling LVGL for ESP32 targets. They use DMA and are non-blocking. I have tested them with several displays and ESP32-S3 boards, including the WT32-SC01-Plus, which has an Intel 8080 bus (8 bit) to an ST7796 chip, and a Seeed Studio Round Display for Xiao with an SPI bus GC9A01 chip connected to an Adafruit QT Py ESP32-S3. The driver should work on all ESP32 targets, but I have only tested on ESP32-S3. I have not tested on lv_micropython with Micropython 1.21, but it works flawlessly on lv_micropython with Micropython 1.20. If I get good feedback from the community, I plan to add RP2 and STM32 targets, as well as others that I can get the hardware to test on. The documentation is good enough for someone familiar with compiling Micropython to follow, but I haven’t had the time to learn Markdown yet, so it is formatted poorly. You can find them at GitHub - bdbarnett/mpdisplay: C Display Drivers for Micropython including LV_Micropython.

I created a file lvmp_devices.py that makes it VERY simple to use the display drivers with lv_micropython. It serves a similar purpose as display_utils.py, but is much more powerful. It also makes it easy to use generic (not-LVGL specific) Micropython drivers for touch and encoders too! Because it isn’t tied only to my mpdisplay drivers, I put it in its own repo at GitHub - bdbarnett/lvmp_devices: Micropython device manager wrapper for LVGL display and touchscreen drivers.

Please give me a few days to make sure it’s working correctly with lv_micropython on Micropython 1.21. However, I would appreciate you testing it out with lv_micropython 1.20 and giving me any feedback either here or on the discussion boards at those 2 repos. I’m very excited about releasing these and hope you are too!

kdschlosser · November 27, 2023, 6:49am

Already got it all up and running.

It compiles using ESP-IDF 5.0+ and it uses the latest version of MicroPython. The bus drivers are written and work with it, and they are exposed to the Python side of things.

kdschlosser · November 27, 2023, 6:54am

How do you compile that thing considering LVGL will not currently compile using the latest version of micropython?

bdbarnett · November 27, 2023, 6:59am

The drivers are strictly written using ESP-IDF and Micropython, so they will compile on bare Micropython. I commented out the ESP specific code in mkrules.cmake here: https://github.com/bdbarnett/mpdisplay/tree/main/examples/lv_bindings

The magic is done by the lvmp_devices.py that I linked to earlier. Take a look to see what I mean.

The work you have done on MP 1.21 support is fantastic!!! I’d like to chat with you offline if you’re open to it.

kdschlosser · November 27, 2023, 7:35am

Thanks, not sure what you mean by offline.

bdbarnett · November 27, 2023, 7:38am

The forum chat, discord, anything off the forum boards. I’m super excited about you getting LVGL to compile as a USER_C_MODULE for ESP32. I’m willing to chip in and help if you need anything.

kdschlosser · November 27, 2023, 7:39am

I do have a question, though. How do you go about making the frame buffers? Since you have removed the code from the build that is reliant on the IDF, you have no way to specify creating a frame buffer in DMA memory or in PSRAM.

bdbarnett · November 27, 2023, 7:50am

The module is mpdisplay and it has 4 classes under it: Display, I80_bus, Spi_bus, and CAPS. There is also a function in mpdisplay called allocate_buffer. You can see the code here:

Excerpt from github.com/bdbarnett/mpdisplay/blob/6b12ae5448014fec198cc59980096e11e598429d/src/mpdisplay_esp.c#L342:

        self->bus_handle.i80 = NULL;
    } else if (mp_obj_is_type(self->bus, &mpdisplay_spi_bus_type)) {
        mpdisplay_spi_bus_obj_t *config = MP_OBJ_TO_PTR(self->bus);
        spi_bus_free(config->spi_host);
        self->bus_handle.spi = NULL;
    }

    return mp_const_none;
}

/// .allocate_buffer(size, cap)
/// Create a buffer using heap_caps_malloc and return it as a bytearray
/// required parameters:
///  -- size: size of buffer
/// optional parameters:
///  -- caps: DMA capability (default=MALLOC_CAP_DMA)
mp_obj_t mpdisplay_allocate_buffer(size_t n_args, const mp_obj_t *args) {
    mp_int_t size = mp_obj_get_int(args[0]);
    mp_int_t caps = (n_args == 2) ? mp_obj_get_int(args[1]) : MALLOC_CAP_DMA;

    if (size > 65536) {

The mpdisplay.allocate_buffer function is passed as a parameter (alloc_buf_func) to the Devices class in lvmp_devices.py. Devices looks for that parameter, and if it doesn’t see it, it just uses bytearray. If it does see it, it uses it. In this case, since mpdisplay.allocate_buffer uses heap_caps_malloc with MALLOC_CAP_DMA as a parameter, we end up with DMA-addressable buffers.
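The fallback described above can be sketched in a few lines. This is not the lvmp_devices.py source; `make_draw_buffers` is a hypothetical helper that only mirrors the described behaviour: use the supplied allocator if there is one, otherwise plain `bytearray`. On hardware, `mpdisplay.allocate_buffer` would be passed as `alloc_buf_func`.

```python
def make_draw_buffers(size, alloc_buf_func=None, count=2):
    """Return `count` separate draw buffers of `size` bytes each.

    Uses the supplied allocator (for example one that calls heap_caps_malloc
    with MALLOC_CAP_DMA) when given, otherwise plain bytearrays from the
    MicroPython heap.
    """
    alloc = alloc_buf_func if alloc_buf_func is not None else bytearray
    return [alloc(size) for _ in range(count)]


# Usage without a special allocator: two buffers of
# 320 x 480 pixels x 2 bytes / factor 6 = 51,200 bytes each.
buf1, buf2 = make_draw_buffers(320 * 480 * 2 // 6)
```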

I hope you’ll give the mpdisplay_simpletest.py a whirl. It’s pretty impressive. I’ve followed your work online for quite a while and value your feedback.

kdschlosser · November 27, 2023, 8:02am

OK, so there are some issues. First, DMA is not going to work properly.

This is incorrect

            .max_transfer_sz = 65536,

This is a large waste of memory

            .trans_queue_depth = 10,

You have doubled the memory use because of storing the function parameters in a structure and then placing those into another structure when the display is initialized. That consumes twice as much memory.

It’s hard to keep a strong focus on this running on something that has a very limited amount of resources, so doing things like this


mpdisplay_display_rotation_t ROTATIONS_320x480[4] = {
    {320, 480, 0, 0, false, true,  false},
    {480, 320, 0, 0, true,  false, false},
    {320, 480, 0, 0, false, false, true},
    {480, 320, 0, 0, true,  true,  true}
};

mpdisplay_display_rotation_t ROTATIONS_240x320[4] = {
    {240, 320, 0, 0, false, false, false},
    {320, 240, 0, 0, true,  true,  false},
    {240, 320, 0, 0, false, true,  true},
    {320, 240, 0, 0, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_170x320[4] = {
    {170, 320, 35, 0, false, false, false},
    {320, 170, 0, 35, true,  true,  false},
    {170, 320, 35, 0, false, true,  true},
    {320, 170, 0, 35, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_240x240[4] = {
    {240, 240, 0, 0, false, false, false},
    {240, 240, 0, 0, true,  true,  false},
    {240, 240, 0, 80, false, true,  true},
    {240, 240, 80, 0, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_135x240[4] = {
    {135, 240, 52, 40, false, false, false},
    {240, 135, 40, 53, true,  true,  false},
    {135, 240, 53, 40, false, true,  true},
    {240, 135, 40, 52, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_128x160[4] = {
    {128, 160, 0, 0, false, false, false},
    {160, 128, 0, 0, true,  true,  false},
    {128, 160, 0, 0, false, true,  true},
    {160, 128, 0, 0, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_80x160[4] = {
    {80, 160, 26, 1, false, false, false},
    {160, 80, 1, 26, true, true, false},
    {80, 160, 26, 1, false, true,  true},
    {160, 80, 1, 26, true,  false, true}
};

mpdisplay_display_rotation_t ROTATIONS_128x128[4] = {
    {128, 128, 2, 1, false, false, false},
    {128, 128, 1, 2, true,  true,  false},
    {128, 128, 2, 3, false, true,  true},
    {128, 128, 3, 2, true,  false, true}
};

mpdisplay_display_rotation_t *ROTATIONS[] = {
    ROTATIONS_240x320,              // default if no match
    ROTATIONS_320x480,
    ROTATIONS_170x320,
    ROTATIONS_240x240,
    ROTATIONS_135x240,
    ROTATIONS_128x160,
    ROTATIONS_80x160,
    ROTATIONS_128x128,
    NULL
};

consumes a lot of resources, and those resources never get released because the allocation is done at global scope. The above consumes 384 bytes of memory when it is only going to get used a single time.

Returning the frame buffer as a bytearray is going to consume quite a bit more memory as well. Returning it as a memoryview or an array.array would be better: something whose size is not able to expand or contract. Giving that ability has a cost, and the frame buffer can stay a fixed size because it is never going to change size.

kdschlosser · November 27, 2023, 8:04am

It’s good, just a little hard to follow the data path through the code. I am sure you will improve upon it; it is your first release. My intention is not to criticize your work, it is to give some pointers from things I noticed with a quick look.

bdbarnett · November 27, 2023, 8:33am

Good point about returning it as a bytearray rather than memoryview or array.array. I’ll check that out. I can’t find where setting max_transfer_sz to 65536 makes an impact, but I’m not saying you are wrong. I left .trans_queue_depth at the value it was set to in Russ Hughes’s S3LCD, but am open to changing it.

I had at one point removed all the display rotations that Russ had included as well, but realized they come in very handy for people adding a new display chip. It makes it so they only have to provide an init string if their display’s rotations match up with these. They are easily taken out and the only impact is they have to be provided in display drivers that wouldn’t otherwise need them. That’s not a big deal to me to remove.

I know there are better ways to profile than just relying on gc.mem_free(), but here’s a quick and dirty test. display_driver.py:


""" WT32-SC01 Plus 320x480 ST7796 display """
import gc

gc.collect()
print(f"Starting Memory:  {gc.mem_free()}")

import mpdisplay
from st7796 import init_sequence, rotations
from backlight import Backlight
from machine import Pin, I2C
from lvmp_devices import Devices

reset=Pin(4, Pin.OUT, value=1)

bus = mpdisplay.I80_bus(
    (9, 46, 3, 8, 18, 17, 16, 15),
    dc=0,
    wr=47,
    cs=6,
    pclk=20_000_000,
    swap_color_bytes=True,
    reverse_color_bits=False,
)

display_drv = mpdisplay.Display(
    bus,
    width=320,
    height=480,
    bpp=16,
    reset=-1,
    rotation=1,
    bgr=True,
    invert_color=True,
    init_sequence=init_sequence,
    rotations=rotations,
)

backlight=Backlight(Pin(45, Pin.OUT), on_high=True)

gc.collect()
print(f"Memory after bus, display and backlight:  {gc.mem_free()}")

devices = Devices(
    display_drv = display_drv,
    bgr = True,
    factor = 6,
    blit_func = display_drv.blit,
    alloc_buf_func = mpdisplay.allocate_buffer,
    register_ready_cb_func = display_drv.register_cb,
    )

gc.collect()
print(f"Memory after allocating two 51,200 byte buffers and attaching to LVGL:  {gc.mem_free()}")

>>> import display_driver
Starting Memory:  2044192
Memory after bus, display and backlight:  2035168
Memory after allocating two 51,200 byte buffers and attaching to LVGL:  2030320
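Working through those readings (copied from the output above; nothing is re-measured here):

```python
# gc.mem_free() readings copied from the output above.
start = 2044192
after_driver = 2035168
after_lvgl = 2030320

driver_cost = start - after_driver       # bus, display and backlight objects
lvgl_cost = after_driver - after_lvgl    # Devices wrapper and LVGL registration
buffers = 2 * 320 * 480 * 2 // 6         # two 16-bit draw buffers with factor=6

print(driver_cost, lvgl_cost, buffers)   # 9024 4848 102400
```

The driver objects account for 9,024 bytes and the LVGL hookup for 4,848 bytes, yet the two draw buffers (102,400 bytes together) do not appear in the difference at all. That is consistent with `mpdisplay.allocate_buffer` using `heap_caps_malloc`, which allocates outside the MicroPython GC heap that `gc.mem_free()` reports on, so these numbers do not capture the buffers' cost.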

matt.trentini · November 27, 2023, 11:08am

I think it might be brutally early for you @bdbarnett (5am if I’m getting my time zones correct?), but maybe you’d be interested in joining in with a chat tomorrow to discuss LVGL/MicroPython?

bdbarnett · November 27, 2023, 6:02pm

I’ll be there! Thanks for the invitation. Yes, it is 5:00am for me, but that’s OK.

kdschlosser · November 27, 2023, 7:51pm

As far as max_transfer_sz is concerned: this gets set to the size of the buffer being used when using DMA memory; otherwise it gets set to SOC_SPI_MAXIMUM_BUFFER_SIZE. That is for SPI. For I8080 it always gets set to the size of the buffer.

If you take a look at the drivers I have written for the ESP32 you should be able to use that to add I2C support and also RGB support to your drivers. I8080 supports up to 24 lanes, how many are supported is dependent on the MCU being used. If you look at the I8080 driver I wrote you will be able to see how I handled the additional lanes.

Because the bus and the display are 2 separate components, and you have kept them that way in the drivers you wrote as well, I would recommend not allowing the user to define the memory allocation. Only allow them to optionally supply a buffer size, and if they do not provide one, set it to 1/10th of the screen size, which seems to be optimal for the ESP32. The RGB driver for the most part does its own thing as far as allocating the buffer, and the buffer can be collected and passed to LVGL without any issues.

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/spi_master.html#_CPPv4N16spi_bus_config_t15max_transfer_szE

It tells you what it defaults to when using the different memory types. There is a hard limit when using non DMA memory and that hard limit is SOC_SPI_MAXIMUM_BUFFER_SIZE. For DMA because the actual memory that has been allocated is what is being used in the transfer there is no hard limit. BUT there still needs to be a cap set.

So say I have a display that is 800 x 600 and it is a 24-bit display. That is a total screen size of 1,440,000 bytes. So if I set the frame buffer to the optimal 1/10th, I need to be able to transfer 144,000 bytes in a single shot. The setting you have is 65,536 bytes, so the transfer would fail. There are displays with that high a resolution that use SPI as the transport mechanism (most use i8080, RGB or something else), but there should not be a hard limit coded in like that. It would even fail using a 640x480x24 display with a 1/10th frame buffer size.
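The same arithmetic as a small helper, assuming the 1/10th-of-screen default suggested above and treating 65,536 as the hard-coded `max_transfer_sz` under discussion. `frame_buffer_size` is a hypothetical name; the point is that the transfer limit should follow the buffer size rather than a constant.

```python
DEFAULT_FRACTION = 10     # 1/10th of the screen, the default suggested above
HARD_CODED_MAX = 65_536   # the fixed max_transfer_sz value being discussed


def frame_buffer_size(width, height, bits_per_pixel, size=None):
    """Bytes in one draw buffer: the caller's size, else 1/10th of the screen."""
    if size is not None:
        return size
    return width * height * bits_per_pixel // 8 // DEFAULT_FRACTION


for w, h, bpp in ((320, 480, 16), (640, 480, 24), (800, 600, 24)):
    needed = frame_buffer_size(w, h, bpp)
    verdict = "fits" if needed <= HARD_CODED_MAX else "exceeds 65,536"
    print(w, h, bpp, needed, verdict)
# 320 480 16 30720 fits
# 640 480 24 92160 exceeds 65,536
# 800 600 24 144000 exceeds 65,536
```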

You have the same hard limit also coded for I8080 as well.

bdbarnett · November 27, 2023, 9:15pm

I’ll definitely start with your code for I2C and RGB.

Thank you very much for the explanation of max_transfer_sz. I’ll have to rethink how I set that value. Do you know if there is a reason to not set it arbitrarily large?

I want my drivers to work on lv_micropython as well as any other Micropython graphics engine that can blit. I need the developer to be able to decide what size and how many buffers they need. For instance, in mpdisplay_simpletest.py I pre-allocate 64 32x32 buffers. In this case they are just filled with random colors and randomly placed on the screen, but they could be 64 sprites in a retro game. A single screen-size buffer in SPIRAM could be used in places where framebuf.FrameBuffer would normally be used. I want to allow creating as many buffers as resources allow, without requiring them all to be the same size, giving the developer full control over those details. Just as I don’t want a hard limit on the buffer size via max_transfer_sz, I also don’t want a hard limit on the quantity.

If max_transfer_sz can be changed on the fly, I may check each time a new buffer is allocated and increase the value if the new buffer is larger than the old value, keeping a high-water mark. I’ll have to play around with it. I’m open to suggestions.
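A sketch of the high-water-mark idea in plain Python. `TransferSizeTracker` is hypothetical, and it assumes (to be verified against the ESP-IDF documentation) that `max_transfer_sz` is only read when the bus is set up with `spi_bus_initialize()`, so growing it would mean freeing and re-initialising the bus; the tracker therefore only flags when that is needed instead of changing anything itself.

```python
class TransferSizeTracker:
    """Hypothetical high-water-mark bookkeeping for max_transfer_sz.

    Records the largest buffer handed out so far and sets needs_reinit
    whenever that mark grows, leaving the bus re-initialisation to the caller.
    """

    def __init__(self, initial=0):
        self.max_transfer_sz = initial
        self.needs_reinit = False

    def allocate(self, size, alloc=bytearray):
        buf = alloc(size)
        if size > self.max_transfer_sz:
            self.max_transfer_sz = size  # new high-water mark
            self.needs_reinit = True
        return buf


# Usage: 64 sprite buffers of 32 x 32 pixels at 2 bytes per pixel.
tracker = TransferSizeTracker()
sprites = [tracker.allocate(32 * 32 * 2) for _ in range(64)]
print(tracker.max_transfer_sz)  # 2048
```

With the 64 sprite buffers from mpdisplay_simpletest.py the mark settles at 2,048 bytes, and only a later, larger allocation (such as a full-screen buffer) would raise it again.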

kdschlosser · November 27, 2023, 9:24pm

You need to do a memory free check before importing any of the display modules and then once the display is fully initialized. Make sure you do a garbage collection prior to checking the amount of free memory.

From your example, without frame buffers, just the driver is using 9,024 bytes of memory. That is actually a lot of memory use for a device that only has 320K of memory that can be used as heap memory. That’s almost 3% of the total available memory. IDK what percentage it is of the memory available after MicroPython is loaded, but considering MicroPython also has to reside in that same 320K, that percentage is only going to go up. For some reason I want to say it’s around 100K. I do know that its initial allocation is 64K, so the best-case scenario is you would have 256K available for user use.

A single 128x128 PNG is going to take up 7,415 bytes of RAM. Once loaded into LVGL it is going to take up 65,536 bytes of memory. Every single byte that can be spared should be spared.
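The 65,536-byte figure is what 128 x 128 pixels at 4 bytes per pixel works out to. A quick check with a hypothetical helper, also showing the 2-bytes-per-pixel case and how much larger the decoded image is than the 7,415-byte file:

```python
def decoded_image_bytes(width, height, bytes_per_pixel):
    """RAM needed for an image once it has been decoded to raw pixels."""
    return width * height * bytes_per_pixel


print(decoded_image_bytes(128, 128, 4))            # 65536: 4 bytes/pixel, e.g. ARGB8888
print(decoded_image_bytes(128, 128, 2))            # 32768: 2 bytes/pixel, e.g. RGB565
print(round(decoded_image_bytes(128, 128, 4) / 7415, 1))  # 8.8 times the PNG file size
```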

So here is the memory use for the structures involved in setting up an SPI bus along with the size of the structure.

esp_lcd_panel_io_t = esp_lcd_panel_io_handle_t = 20 bytes
esp_lcd_panel_t = 36 bytes
esp_lcd_panel_io_spi_config_t = 56 bytes
spi_bus_config_t = 48 bytes

Total: 160 bytes

That’s a really far cry from 9k. I don’t believe that the esp_lcd component is going to use up 9k of that either. I know there is a lot that will get chewed up because of MicroPython. Need to see what differences can be made to trim that amount down…

Getting rid of the large number of integer constants used in the setting of the buffer sizes is one place to cut some bytes off.

    {MP_ROM_QSTR(MP_QSTR_EXEC), MP_ROM_INT(MALLOC_CAP_EXEC)},
    {MP_ROM_QSTR(MP_QSTR_32BIT), MP_ROM_INT(MALLOC_CAP_32BIT)},
    {MP_ROM_QSTR(MP_QSTR_8BIT), MP_ROM_INT(MALLOC_CAP_8BIT)},
    {MP_ROM_QSTR(MP_QSTR_DMA), MP_ROM_INT(MALLOC_CAP_DMA)},
    {MP_ROM_QSTR(MP_QSTR_SPIRAM), MP_ROM_INT(MALLOC_CAP_SPIRAM)},
    {MP_ROM_QSTR(MP_QSTR_INTERNAL), MP_ROM_INT(MALLOC_CAP_INTERNAL)},
    {MP_ROM_QSTR(MP_QSTR_DEFAULT), MP_ROM_INT(MALLOC_CAP_DEFAULT)},
    {MP_ROM_QSTR(MP_QSTR_IRAM_8BIT), MP_ROM_INT(MALLOC_CAP_IRAM_8BIT)},
    {MP_ROM_QSTR(MP_QSTR_RETENTION), MP_ROM_INT(MALLOC_CAP_RETENTION)},
    {MP_ROM_QSTR(MP_QSTR_RTCRAM), MP_ROM_INT(MALLOC_CAP_RTCRAM)},

A single integer in MicroPython takes up 32 bytes of memory. That right there is using up 320 bytes by itself. I am not 100% sure on this, but I believe that the memory is not going to get used unless the CAPS attribute is accessed. But the question becomes: is that allocation to the module’s global namespace, or is it to the locals namespace where it might get freed if the call was made from inside a function or method? I do not have an answer to that question, and you would need to run some tests to find out. But that is a hell of a lot of memory use, I can say that.

I ran a test to see the sizes of the different types that can be used for frame buffer objects.

MicroPython v1.21.0-dirty on 2023-11-26; linux [GCC 11.4.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import gc
>>> import array
>>> gc.collect()
20
>>> gc.mem_free()
2069568
>>> test = array.array('B', [0] * 1024)
>>> gc.collect()
29
>>> gc.mem_free()
2068480
>>>

an array.array takes up 1,088 total bytes for a buffer that is 1024 bytes in size
so there is 64 bytes of overhead just for the array itself.

and here is for a bytearray

MicroPython v1.21.0-dirty on 2023-11-26; linux [GCC 11.4.0] version
Use Ctrl-D to exit, Ctrl-E for paste mode
>>> import gc
>>> gc.collect()
12
>>> gc.mem_free()
2069600
>>> test = bytearray([0] * 1024)
>>> gc.collect()
29
>>> gc.mem_free()
2068480
>>>

1,120 bytes used for a bytearray that is 1024 bytes. That’s 96 bytes of overhead, so 32 additional bytes are being used compared to an array.array.
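Those REPL measurements can be wrapped in a small helper so the same procedure (collect, read `gc.mem_free()`, allocate, collect, read again) is applied every time. `heap_cost` is a hypothetical name; `gc.mem_free()` only exists on MicroPython, so under CPython the helper returns `None` for the cost instead of guessing.

```python
import array
import gc


def heap_cost(factory):
    """Measure how many GC-heap bytes the object built by factory() costs.

    Returns (obj, bytes_used). bytes_used is None on ports without
    gc.mem_free(), such as CPython. The object is returned so it stays
    alive during the second reading.
    """
    if not hasattr(gc, "mem_free"):
        return factory(), None
    gc.collect()
    before = gc.mem_free()
    obj = factory()
    gc.collect()
    return obj, before - gc.mem_free()


# Same comparison as the REPL sessions above, without the temporary [0] * 1024 list.
_, array_cost = heap_cost(lambda: array.array("B", bytes(1024)))
_, bytearray_cost = heap_cost(lambda: bytearray(1024))
print(array_cost, bytearray_cost)  # byte counts on MicroPython, None None on CPython
```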

I know it’s not a lot, but there is an amount being used. Whenever possible you want to reuse buffers, or keep variables that don’t get used all that often inside a function, where they are not part of the global namespace and can be garbage collected.

If you look at the current drivers that are available for a display, you will see that the initialization commands are all stored on the driver object (self.init_cmds). That means that once the driver has been initialized, those parameters just sit there never being used again, taking up memory.

class ili9341(ili9XXX):

    def __init__(self,
        miso=5, mosi=18, clk=19, cs=13, dc=12, rst=4, power=14, backlight=15, backlight_on=0, power_on=0,
        spihost=esp.HSPI_HOST, spimode=0, mhz=40, factor=4, hybrid=True, width=240, height=320, start_x=0, start_y=0,
        colormode=COLOR_MODE_BGR, rot=PORTRAIT, invert=False, double_buffer=True, half_duplex=True,
        asynchronous=False, initialize=True, color_format=lv.COLOR_FORMAT.NATIVE_REVERSED
    ):

        # Make sure Micropython was built such that color won't require processing before DMA

        if lv.color_t.__SIZE__ != 2:
            raise RuntimeError('ili9341 micropython driver requires defining LV_COLOR_DEPTH=16')

        self.display_name = 'ILI9341'

        self.init_cmds = [
            {'cmd': 0xCF, 'data': bytes([0x00, 0x83, 0X30])},
            {'cmd': 0xED, 'data': bytes([0x64, 0x03, 0X12, 0X81])},
            {'cmd': 0xE8, 'data': bytes([0x85, 0x01, 0x79])},
            {'cmd': 0xCB, 'data': bytes([0x39, 0x2C, 0x00, 0x34, 0x02])},
            {'cmd': 0xF7, 'data': bytes([0x20])},
            {'cmd': 0xEA, 'data': bytes([0x00, 0x00])},
            {'cmd': 0xC0, 'data': bytes([0x26])},               # Power control
            {'cmd': 0xC1, 'data': bytes([0x11])},               # Power control
            {'cmd': 0xC5, 'data': bytes([0x35, 0x3E])},	        # VCOM control
            {'cmd': 0xC7, 'data': bytes([0xBE])},               # VCOM control

            {'cmd': 0x36, 'data': bytes([
                self.madctl(colormode, rot, ORIENTATION_TABLE)])},  # MADCTL

            {'cmd': 0x3A, 'data': bytes([0x55])},               # Pixel Format Set
            {'cmd': 0xB1, 'data': bytes([0x00, 0x1B])},
            {'cmd': 0xF2, 'data': bytes([0x08])},
            {'cmd': 0x26, 'data': bytes([0x01])},
            {'cmd': 0xE0, 'data': bytes([0x1F, 0x1A, 0x18, 0x0A, 0x0F, 0x06, 0x45, 0X87, 0x32, 0x0A, 0x07, 0x02, 0x07, 0x05, 0x00])},
            {'cmd': 0XE1, 'data': bytes([0x00, 0x25, 0x27, 0x05, 0x10, 0x09, 0x3A, 0x78, 0x4D, 0x05, 0x18, 0x0D, 0x38, 0x3A, 0x1F])},
            {'cmd': 0x2A, 'data': bytes([0x00, 0x00, 0x00, 0xEF])},
            {'cmd': 0x2B, 'data': bytes([0x00, 0x00, 0x01, 0x3f])},
            {'cmd': 0x2C, 'data': bytes([0])},
            {'cmd': 0xB7, 'data': bytes([0x07])},
            {'cmd': 0xB6, 'data': bytes([0x0A, 0x82, 0x27, 0x00])},
            {'cmd': 0x11, 'data': bytes([0]), 'delay':100},
            {'cmd': 0x29, 'data': bytes([0]), 'delay':100}
        ]

Those init commands use 5,856 bytes of RAM the entire time the program is running, when they only get used a single time when the program first starts up.

Just by changing the way things are stored in those init commands the memory use changes to 4,064 bytes. A savings of 1,792 bytes.

import array

init_cmds = (
    array.array('B', [0xCF, 0x00, 0x83, 0X30, 0]),
    array.array('B', [0xED, 0x64, 0x03, 0X12, 0X81, 0]),
    array.array('B', [0xE8, 0x85, 0x01, 0x79, 0]),
    array.array('B', [0xCB, 0x39, 0x2C, 0x00, 0x34, 0x02, 0]),
    array.array('B', [0xF7, 0x20, 0]),
    array.array('B', [0xEA, 0x00, 0x00, 0]),
    array.array('B', [0xC0, 0x26, 0]),
    array.array('B', [0xC1, 0x11, 0]),
    array.array('B', [0xC5, 0x35, 0x3E, 0]),
    array.array('B', [0xC7, 0xBE, 0]),
    array.array('B', [0x36, 0x00, 0]),
    array.array('B', [0x3A, 0x55, 0]),
    array.array('B', [0xB1, 0x00, 0x1B, 0]),
    array.array('B', [0xF2, 0x08, 0]),
    array.array('B', [0x26, 0x01, 0]),
    array.array('B', [0xE0, 0x1F, 0x1A, 0x18, 0x0A, 0x0F, 0x06, 0x45, 0X87, 0x32, 0x0A, 0x07, 0x02, 0x07, 0x05, 0x00, 0]),
    array.array('B', [0XE1, 0x00, 0x25, 0x27, 0x05, 0x10, 0x09, 0x3A, 0x78, 0x4D, 0x05, 0x18, 0x0D, 0x38, 0x3A, 0x1F, 0]),
    array.array('B', [0x2A, 0x00, 0x00, 0x00, 0xEF, 0]),
    array.array('B', [0x2B, 0x00, 0x00, 0x01, 0x3f, 0]),
    array.array('B', [0x2C, 0x00, 0]),
    array.array('B', [0xB7, 0x07, 0]),
    array.array('B', [0xB6, 0x0A, 0x82, 0x27, 0x00, 0]),
    array.array('B', [0x11, 0x00, 100]),
    array.array('B', [0x29, 0x00, 100])
)
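Going one step further, the table can be built inside a function so that nothing keeps it alive after initialization. The sketch below is hypothetical (`load_init_cmds`, `send_init` and `init_display` are illustrative names, and `write_cmd(cmd, data)` stands in for whatever the bus driver provides); it keeps the same [command, data..., delay] layout as the table above and shows only a few entries. Because `array.array` is mutable, the MADCTL byte that the original driver computed with `self.madctl(...)` can be patched in at runtime instead of staying the 0x00 placeholder.

```python
import array
import time


def load_init_cmds(madctl):
    """Build the init table on demand so it can be garbage collected after use.

    Entry layout: [command, data bytes..., delay_ms]. Only a few entries shown.
    """
    cmds = (
        array.array('B', [0x36, 0x00, 0]),    # MADCTL (placeholder byte, patched below)
        array.array('B', [0x3A, 0x55, 0]),    # Pixel Format Set: 16 bpp
        array.array('B', [0x11, 0x00, 100]),  # Sleep Out, then wait 100 ms
        array.array('B', [0x29, 0x00, 100]),  # Display On, then wait 100 ms
    )
    cmds[0][1] = madctl  # arrays are mutable, so the runtime value can be written in
    return cmds


def send_init(write_cmd, cmds):
    """Send each entry through write_cmd(cmd, data), honouring the trailing delay."""
    for entry in cmds:
        write_cmd(entry[0], memoryview(entry)[1:-1])  # data bytes without copying
        if entry[-1]:
            time.sleep(entry[-1] / 1000)


def init_display(write_cmd, madctl):
    # The table only exists for the duration of this call.
    send_init(write_cmd, load_init_cmds(madctl))
```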