QSPI to parallel converter with small FPGA

Hello.

I built a proof of concept that uses QSPI to feed display data to a parallel interface through a small FPGA converter.

I tested ILI9481 support with an IPS display (~ $7, 3.5" 480x320px) over standard SPI: “ESP32 -> { MOSI, SCLK, CS, D/C, RESET } -> ILI9481 (DBI Type C 8-bit)”. It works only up to 16 MHz (~2MB/s), and the benchmark gives 6FPS :roll_eyes:.

So I took an FPGA and built a converter: “ESP32 -> { SCLK, CS, 4x DATA } -> FPGA -> { 16x DATA, CS, D/C, RESET } -> ILI9481 (DBI Type B 16-bit)”. QSPI runs at 40 MHz (~20MB/s) and the parallel interface runs at 10 MHz (compliant with the datasheet, no overclocking). Much better now: the benchmark gives 53FPS :smiley: !
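
The throughput figures above follow from clock rate times bus width. Here is a minimal sketch of that arithmetic, assuming 2 bytes per pixel (RGB565) and ignoring command overhead; the function names are only for illustration. The full-frame ceiling it computes is a rough upper bound, not the weighted benchmark score, which mostly redraws partial areas.

```python
# Back-of-the-envelope bus throughput and full-frame refresh ceiling.
# Assumption: RGB565 pixels (2 bytes each), no protocol overhead.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 480, 320, 2


def bus_bytes_per_second(clock_hz, data_lines):
    """Raw payload rate of a bus that moves `data_lines` bits per clock."""
    return clock_hz * data_lines / 8


def max_full_frames_per_second(bytes_per_second):
    """Upper bound on full-screen refreshes per second at a given byte rate."""
    frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
    return bytes_per_second / frame_bytes


spi = bus_bytes_per_second(16_000_000, 1)      # plain SPI, 1 data line
qspi = bus_bytes_per_second(40_000_000, 4)     # QSPI, 4 data lines
par16 = bus_bytes_per_second(10_000_000, 16)   # 16-bit parallel bus

for name, rate in (("SPI", spi), ("QSPI", qspi), ("16-bit parallel", par16)):
    fps = max_full_frames_per_second(rate)
    print(f"{name}: {rate / 1e6:.0f} MB/s, at most {fps:.1f} full frames/s")
```

QSPI at 40 MHz and the 16-bit parallel bus at 10 MHz carry the same byte rate, so neither side has to wait for the other during pixel transfers.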
Benchmark (CONFIG_ESP32_DEFAULT_CPU_FREQ_MHZ=240, CONFIG_COMPILER_OPTIMIZATION_PERF=y, CONFIG_FREERTOS_HZ=1000):

Weighted FPS: 53
Opa. speed: 48%

Rectangle [w: 30]: 377 opa: 15  
Rectangle rounded [w: 20]: 94 opa: 13  
Circle [w: 10]: 48 opa: 12  
Border [w: 20]: 113 opa: 97  
Border rounded [w: 30]: 97 opa: 91  
Circle border [w: 10]: 28 opa: 24  
Border top [w: 3]: 97 opa: 97  
Border left [w: 3]: 70 opa: 69  
Border top + left [w: 3]: 49 opa: 29  
Border left + right [w: 3]: 48 opa: 29  
Border top + bottom [w: 3]: 97 opa: 94  
Shadow small [w: 3]: 19 opa: 18  
Shadow small offset [w: 5]: 12 opa: 11  
Shadow large [w: 5]: 7 opa: 7  
Shadow large offset [w: 3]: 5 opa: 5  
Image RGB [w: 20]: 94 opa: 47  
Image ARGB [w: 20]: 29 opa: 24  
Image chorma keyed [w: 5]: 45 opa: 26  
Image indexed [w: 5]: 29 opa: 23  
Image alpha only [w: 5]: 48 opa: 28  
Image RGB recolor [w: 5]: 26 opa: 19  
Image ARGB recolor [w: 20]: 21 opa: 19  
Image chorma keyed recolor [w: 3]: 29 opa: 23  
Image indexed recolor [w: 3]: 27 opa: 20  
Image RGB rotate [w: 3]: 20 opa: 14  
Image RGB rotate anti aliased [w: 3]: 8 opa: 7  
Image ARGB rotate [w: 5]: 15 opa: 13  
Image ARGB rotate anti aliased [w: 5]: 7 opa: 7  
Image RGB zoom [w: 3]: 20 opa: 16  
Image RGB zoom anti aliased [w: 3]: 8 opa: 7  
Image ARGB zoom [w: 5]: 15 opa: 13  
Image ARGB zoom anti aliased [w: 5]: 7 opa: 7  
Text small [w: 20]: 15 opa: 15 *
Text medium [w: 30]: 17 opa: 15 *
Text large [w: 20]: 16 opa: 15 *
Text small compressed [w: 3]: 13 opa: 12  
Text medium compressed [w: 5]: 9 opa: 8  
Text large compressed [w: 10]: 4 opa: 4 *
Line [w: 10]: 47 opa: 30  
Arc think [w: 10]: 34 opa: 28  
Arc thick [w: 10]: 33 opa: 28  
Substr. rectangle [w: 10]: 22 opa: 22  
Substr. border [w: 10]: 94 opa: 97  
Substr. shadow [w: 10]: 9 opa: 9 *
Substr. image [w: 10]: 25 opa: 22  
Substr. line [w: 10]: 47 opa: 47  
Substr. arc [w: 10]: 29 opa: 29  
Substr. text [w: 10]: 14 opa: 14 *

I am trying to optimize the project for:

  • iCE40LP384-SG32 (~ $1.5, with an external latch and fixed functionality)
  • iCE5LP1K-SG48 (~ $3, more programmable features, including free in/out pins)
  • iCE40HX1K-VQ100 (~ $3.9, more free pins)
  • iCE40HX1K-TQ144 (~ $4.4, even more free pins)

Testing:

QSPI/parallel datastream with LogicSniffer:

FPGA with ICEstudio:


Sounds like a great result!

As far as I understand it, you’re using the FPGA only as a converter from QSPI to 16bit 8080 interface, right? Wouldn’t it be possible to achieve the same with serial-in-parallel-out shift registers, instead of an FPGA?

Not exactly. For QSPI (4-bit) you would need at least three 4-bit D-latches, a shift register to select which latch gets clocked, and glue logic to cycle the shift register and generate WRCLK. The ESP32 would also need more software signaling (CS, D/C, and a new signal to choose between 8-bit and 16-bit data latching for color transfers) … try to build it :slight_smile:.
There are also some challenges:

  • setup/hold times for data versus clock and additional glue logic
  • you must change the QSPI speed (16-bit color data vs. 8-bit parameter transfers) to respect the datasheet
  • PCB space
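
To make the latching problem concrete, here is a small software model of what the converter has to do with the nibble stream. It is only a sketch: the most-significant-nibble-first order is an assumption, and the function name is hypothetical. The first three nibbles of each word must be held somewhere while the fourth arrives, and the write strobe fires once per four QSPI clocks.

```python
def nibbles_to_words(nibbles):
    """Pack a stream of 4-bit values into 16-bit words.

    Assumption: the most significant nibble is sent first.
    """
    if len(nibbles) % 4:
        raise ValueError("need a multiple of 4 nibbles per 16-bit word")
    words = []
    for i in range(0, len(nibbles), 4):
        word = 0
        for nib in nibbles[i:i + 4]:
            # Shift in one nibble per QSPI clock.
            word = (word << 4) | (nib & 0xF)
        # One parallel write per four QSPI clocks.
        words.append(word)
    return words


print([hex(w) for w in nibbles_to_words([0xF, 0x8, 0x0, 0x0])])
```

With 8-bit parameters only two nibbles make up one write, so the same QSPI clock produces writes twice as often. That is the reason for the speed change mentioned in the list above.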

I added a programmable maximum WRCLK rate (set by a specific SPI command) with a FIFO buffer, plus programmable automatic detection of data transfer commands (ILI9481_CMD_MEMORY_WRITE, ILI9481_CMD_WRITE_MEMORY_CONTINUE), and prepared for HW color conversions (byte swapping, RGB888->RGB565, XRGB8888->RGB565, …)…
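
For readers unfamiliar with these color conversions, here is a software reference of what they compute. It is a sketch of the usual bit layouts (5 bits red, 6 bits green, 5 bits blue), not the FPGA implementation; the function names are hypothetical.

```python
def rgb888_to_rgb565(r, g, b):
    """Keep the top 5/6/5 bits of each 8-bit channel."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | ((b & 0xFF) >> 3)


def xrgb8888_to_rgb565(pixel):
    """Drop the unused top byte, then convert as RGB888."""
    r = (pixel >> 16) & 0xFF
    g = (pixel >> 8) & 0xFF
    b = pixel & 0xFF
    return rgb888_to_rgb565(r, g, b)


def swap_bytes16(word):
    """Exchange the high and low byte of a 16-bit word."""
    return ((word & 0xFF) << 8) | ((word >> 8) & 0xFF)


print(hex(rgb888_to_rgb565(255, 0, 0)))
print(hex(swap_bytes16(0xF800)))
```

Doing this in the bridge would let the host send pixels in whatever layout it renders in, at the cost of more QSPI bytes per pixel for the 24-bit and 32-bit formats.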

Example for FPGA (ILI9481_CMD_GAMMA_SETTING). Per the datasheet, “twc - Write cycle - 100ns” => 10MHz:

I also tested 12.5MHz, 16.6MHz and the maximum of 20MHz (limited by the 40MHz QSPI data clock rate).
It works (for parameters), but it is outside the specification.
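
The relation between these rates and the datasheet limit can be checked with a few lines. This is a sketch assuming only the 100 ns write cycle quoted above and one nibble per QSPI clock; the helper names are hypothetical.

```python
TWC_MIN_NS = 100.0   # minimum write cycle quoted from the datasheet
QSPI_CLOCK_MHZ = 40.0


def write_cycle_ns(wrclk_mhz):
    """Write cycle period in nanoseconds for a given WRCLK rate."""
    return 1000.0 / wrclk_mhz


def within_spec(wrclk_mhz):
    return write_cycle_ns(wrclk_mhz) >= TWC_MIN_NS


def max_wrclk_mhz(bits_per_write):
    """Fastest possible write rate: each write needs bits/4 QSPI clocks."""
    return QSPI_CLOCK_MHZ / (bits_per_write / 4)


for mhz in (10.0, 12.5, 16.6, 20.0):
    status = "within spec" if within_spec(mhz) else "outside spec"
    print(f"{mhz} MHz -> {write_cycle_ns(mhz):.1f} ns ({status})")

print(max_wrclk_mhz(8), max_wrclk_mhz(16))
```

An 8-bit parameter arrives in two QSPI clocks, which gives the 20MHz ceiling, while a 16-bit pixel takes four clocks and lands exactly on the 10MHz limit.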

The simplest solution would probably use 8-bit-only transfers (DBI Type B 8-bit); I think it would need 5x D-flops. But it would limit QSPI to 20MHz (10MB/s). I will try it sometime.

Hello mcerveny,

What QSPI device did you use?
I know apmemory has some QSPI memories. Are there any other manufacturers?

It is an FPGA, not a memory chip. An FPGA is a programmable chip that can do whatever you want, and very fast. In this case I use it as a smart QSPI -> 8080 parallel bridge for the LCD panel. FPGAs usually have some small embedded memories (I use them for a FIFO queue between the different clock domains).

If you are asking about QSPI PSRAM, there are a few manufacturers, and it is supported by the ESP32.