Hardware reference for a generic AliExpress ESP32-S3 board sold with a 1.83-inch LCD, two microphones, speaker output, three buttons, and one RGB LED. The board is commonly identified in firmware examples as xingzhi-cube 1.83 2mic.
Use this as the firmware integration baseline, but verify every pin against the exact board revision before ordering PCBs or making irreversible hardware assumptions.
| Item | Spec |
|---|---|
| SoC | ESP32-S3, dual core, 240 MHz |
| Flash | 16 MB octal flash |
| PSRAM | 8 MB octal PSRAM |
| Display | 1.83-inch SPI LCD, 284x240, ST7789V-like controller with vendor init |
| Speaker codec | ES8311 DAC, I2C address 0x18 |
| Microphone codec | ES7210 ADC, I2C address 0x40, 4-channel device with 2 mics connected |
| Buttons | 3 active-low buttons |
| RGB LED | WS2812-compatible LED |
| Speaker PA | GPIO-controlled amplifier enable |

GPIO assignments:

| Function | GPIO | Direction | Notes |
|---|---|---|---|
| I2C SDA | 12 | bidirectional | Shared by ES8311 and ES7210 |
| I2C SCL | 11 | output | 400 kHz recommended |
| I2S MCLK | 5 | output | Shared TX/RX, 256x sample-rate multiplier |
| I2S BCLK | 15 | output | Shared TX/RX |
| I2S LRCLK / WS | 16 | output | Shared TX/RX |
| I2S DOUT | 6 | output | ESP32-S3 to ES8311 speaker DAC |
| I2S DIN | 7 | input | ES7210 mic ADC to ESP32-S3 |
| Speaker PA enable | 4 | output | Active high; must be high for speaker output |
| LCD SPI CLK | 9 | output | SPI2_HOST, 40 MHz tested |
| LCD SPI MOSI | 10 | output | Display data |
| LCD CS | 14 | output | Display chip select |
| LCD DC | 8 | output | Display command/data select |
| LCD reset | 18 | output | Active low hardware reset |
| LCD backlight | 13 | output | LEDC PWM, timer 0, channel 0 |
| Button WAKE | 0 | input | Top button, active low; tracks press and release |
| Button MUTE | 39 | input | Left button, active low |
| Button VOLUME | 40 | input | Right button, active low |
| WS2812 LED | 48 | output | Single addressable RGB LED |
- ES8311 handles speaker playback.
- ES7210 handles microphone input.
- Both codecs share the same I2C bus.
- I2S TX and RX share MCLK, BCLK, and LRCLK.
- Speaker output is silent unless the PA enable pin is driven high.
Required components:

```yaml
dependencies:
  espressif/esp_codec_dev: "^1.3.0"
```

Required ESP-IDF drivers:

- esp_driver_i2c
- esp_driver_i2s
- esp_driver_gpio

The initialization order matters: esp_codec_dev_new() needs a valid data interface, and the data interface needs an I2S channel handle. Follow these steps in sequence:
- Configure speaker PA GPIO as output and start low.
- Initialize I2C master bus on SDA GPIO12 and SCL GPIO11 at 400 kHz.
- Probe codec I2C addresses 0x18 and 0x40.
- Create an I2S TX and RX channel pair on the same I2S port.
- Configure TX for standard Philips mode, 16 kHz, 16-bit, stereo.
- Configure RX for TDM mode with 4 slots for ES7210.
- Use MCLK multiple 256 for a 4.096 MHz MCLK at 16 kHz.
- Use DMA settings of 6 descriptors and 240 frames per descriptor.
- Create TX and RX I2S data interfaces with audio_codec_new_i2s_data().
- Create codec control interfaces over I2C.
- Create ES8311 codec device using the TX data interface.
- Create ES7210 codec device using the RX data interface.
- Call esp_codec_dev_open() for both codec devices.
- Set the output volume; the tested comfortable value is 60 out of 100.
After esp_codec_dev_open(), the I2S channel is already enabled. Do not call i2s_channel_enable() again.
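The clock and DMA figures in the steps above follow from simple arithmetic, which is worth keeping next to the configuration so the numbers stay consistent if the sample rate changes. The sketch below is plain C with no ESP-IDF dependency; the function names are illustrative and not part of any driver API.

```c
#include <stdint.h>

/* MCLK frequency in Hz for a given sample rate and MCLK multiple. */
uint32_t i2s_mclk_hz(uint32_t sample_rate_hz, uint32_t multiple)
{
    return sample_rate_hz * multiple;
}

/* Audio time held by one DMA descriptor, in microseconds. */
uint32_t i2s_dma_desc_us(uint32_t frames_per_desc, uint32_t sample_rate_hz)
{
    return (uint32_t)(((uint64_t)frames_per_desc * 1000000u) / sample_rate_hz);
}

/* Audio time buffered across all DMA descriptors, in microseconds. */
uint32_t i2s_dma_total_us(uint32_t desc_num, uint32_t frames_per_desc,
                          uint32_t sample_rate_hz)
{
    return desc_num * i2s_dma_desc_us(frames_per_desc, sample_rate_hz);
}
```

With the values from this section (16 kHz, multiple 256, 6 descriptors of 240 frames), MCLK works out to 4.096 MHz, each descriptor holds 15 ms of audio, and the whole DMA ring holds 90 ms.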
Use esp_codec_dev_write() instead of raw i2s_channel_write(). The codec abstraction handles codec format setup, volume, and I2S interaction.
Mono playback buffers must be expanded to stereo interleaved 16-bit samples before writing to the ES8311 path.
```c
esp_codec_dev_handle_t output = board_audio_get_output();

board_speaker_pa_set(true);
esp_codec_dev_write(output, stereo_samples, stereo_bytes);
board_speaker_pa_set(false);
```

Suggested initial settings:

| Setting | Value |
|---|---|
| Sample rate | 16000 Hz |
| Bits per sample | 16 |
| Playback channels | 2 interleaved |
| Codec volume | 60 / 100 |
| Test tone amplitude | 10000 / 32767 |
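Because the ES8311 path expects stereo interleaved 16-bit samples, mono audio has to be duplicated into both slots before it is written. A minimal sketch of that expansion follows; the function name mono_to_stereo_s16 is hypothetical, and the caller is assumed to provide an output buffer with room for twice as many samples as the input.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Duplicate each mono sample into the left and right slots.
 * stereo must have room for 2 * frame_count samples and must not
 * overlap mono. Returns the number of bytes written to stereo.
 */
size_t mono_to_stereo_s16(const int16_t *mono, size_t frame_count,
                          int16_t *stereo)
{
    for (size_t i = 0; i < frame_count; i++) {
        stereo[2 * i]     = mono[i];  /* left slot */
        stereo[2 * i + 1] = mono[i];  /* right slot */
    }
    return frame_count * 2 * sizeof(int16_t);
}
```

The return value is the byte count to pass as the length argument of esp_codec_dev_write().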
Use esp_codec_dev_read() instead of raw i2s_channel_read().
The ES7210 is a 4-channel TDM codec. On this board only two microphones are populated, and firmware typically reads stereo interleaved samples then extracts one channel for mono speech input.
```c
esp_codec_dev_handle_t input = board_audio_get_input();

esp_codec_dev_read(input, stereo_buf, stereo_bytes);
for (int i = 0; i < frame_count; i++) {
    mono[i] = stereo_buf[i * 2];
}
```

Display configuration:

| Item | Value |
|---|---|
| Interface | SPI |
| SPI host | SPI2_HOST |
| Clock | 40 MHz tested |
| Resolution | 284x240 |
| Pixel format | RGB565 |
| Backlight | GPIO13 LEDC PWM |
| Reset | GPIO18 active low |
| Controller | ST7789V-like with vendor-specific init |
This panel is not initialized correctly by a plain esp_lcd_new_panel_st7789() flow. It needs a vendor-specific command sequence before the ST7789 panel driver is used. Without that sequence, the screen can stay blank.
The known-good init sequence was recovered from an ESPHome configuration for the xingzhi-cube board family. Keep that command table with the display driver source, not only in application code.
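One way to keep the command table with the driver source is a small array of entries plus a walker that sends them in order. The sketch below shows that shape only. The three entries are standard MIPI DCS commands used as placeholders, NOT the vendor sequence for this panel, and every type and function name here is hypothetical. The transport is passed in as a callback so the walker can be exercised off-target.

```c
#include <stddef.h>
#include <stdint.h>

/* One init table entry: command, parameter bytes, delay after sending. */
typedef struct {
    uint8_t        cmd;
    const uint8_t *params;
    size_t         param_len;
    uint16_t       delay_ms;
} lcd_init_cmd_t;

/* Transport callbacks, so the walker does not depend on a driver. */
typedef int  (*lcd_tx_fn)(void *ctx, uint8_t cmd, const uint8_t *params, size_t len);
typedef void (*lcd_delay_fn)(uint32_t ms);

/* PLACEHOLDER entries: standard DCS commands, not the vendor table. */
static const uint8_t k_colmod_rgb565[] = { 0x55 };
const lcd_init_cmd_t k_placeholder_init[] = {
    { 0x11, NULL,            0, 120 },  /* sleep out, then wait */
    { 0x3A, k_colmod_rgb565, 1, 0   },  /* 16-bit pixel format */
    { 0x29, NULL,            0, 20  },  /* display on */
};

/* Send every entry in order; stop at and return the first non-zero error. */
int lcd_send_init_table(const lcd_init_cmd_t *table, size_t count,
                        lcd_tx_fn tx, lcd_delay_fn delay, void *ctx)
{
    for (size_t i = 0; i < count; i++) {
        int err = tx(ctx, table[i].cmd, table[i].params, table[i].param_len);
        if (err != 0) {
            return err;
        }
        if (table[i].delay_ms > 0 && delay != NULL) {
            delay(table[i].delay_ms);
        }
    }
    return 0;
}

/* Dry-run transport: counts commands and parameter bytes instead of sending. */
typedef struct {
    size_t cmds;
    size_t param_bytes;
} lcd_dry_run_t;

int lcd_dry_run_tx(void *ctx, uint8_t cmd, const uint8_t *params, size_t len)
{
    lcd_dry_run_t *stats = (lcd_dry_run_t *)ctx;
    (void)cmd;
    (void)params;
    stats->cmds++;
    stats->param_bytes += len;
    return 0;
}
```

On hardware, the transport callback would wrap esp_lcd_panel_io_tx_param() and the delay callback would wrap a task delay. The dry-run transport is only there to check the table shape without a panel attached.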
- Initialize LEDC PWM for backlight on GPIO13.
- Initialize SPI bus on SPI2_HOST with CLK GPIO9 and MOSI GPIO10.
- Create panel IO with CS GPIO14 and DC GPIO8.
- Hardware-reset the panel: reset low for 20 ms, then high for 120 ms.
- Send the vendor-specific LCD init command table with esp_lcd_panel_io_tx_param().
- Create the ST7789 panel driver with reset GPIO disabled, because hardware reset already ran.
- Call esp_lcd_panel_init().
- Enable color inversion.
- Set display gap to x=36, y=0.
- Apply orientation: swap_xy=true, mirror_x=false, mirror_y=true.
- Set backlight to a comfortable default, tested at 80 percent.
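The backlight percentage has to be converted into a PWM duty value before it reaches the LEDC peripheral. The sketch below shows one way to do that conversion. The timer resolution is an assumption, since this reference does not state it, and the function name backlight_duty is hypothetical.

```c
#include <stdint.h>

/*
 * Convert a brightness percentage into a PWM duty value.
 * resolution_bits is the PWM timer resolution. ASSUMPTION: it is chosen
 * by the firmware (for example 10 bits) and is below 24.
 * Percentages above 100 are clamped to 100.
 */
uint32_t backlight_duty(uint32_t percent, uint32_t resolution_bits)
{
    if (percent > 100u) {
        percent = 100u;
    }
    uint32_t max_duty = (1u << resolution_bits) - 1u;
    return (max_duty * percent) / 100u;
}
```

With a 10-bit timer, the tested 80 percent default maps to a duty of 818 out of 1023. The result would then be applied with ledc_set_duty() followed by ledc_update_duty().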
```c
#define BOARD_LCD_WIDTH 284
#define BOARD_LCD_HEIGHT 240
#define BOARD_LCD_SPI_HOST SPI2_HOST
#define BOARD_LCD_SPI_HZ (40 * 1000 * 1000)
#define BOARD_LCD_GAP_X 36
#define BOARD_LCD_GAP_Y 0
```

RGB565 values are byte-swapped for big-endian SPI writes:

| Color | Value |
|---|---|
| Black | 0x0000 |
| White | 0xFFFF |
| Red | 0x00F8 |
| Green | 0xE007 |
| Blue | 0x1F00 |
| Yellow | 0xE0FF |
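The table values can be derived rather than hard-coded. The sketch below packs 8-bit color components into RGB565 and then swaps the two bytes, which reproduces the values above on a little-endian CPU. The function names are illustrative.

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into standard RGB565 (red in the top 5 bits). */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Swap the two bytes so the high byte is transmitted first over SPI. */
uint16_t rgb565_swapped(uint8_t r, uint8_t g, uint8_t b)
{
    uint16_t c = rgb565(r, g, b);
    return (uint16_t)((c >> 8) | (c << 8));
}
```

For example, pure red is 0xF800 in standard RGB565 and becomes 0x00F8 after the swap, matching the table.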

Required ESP-IDF drivers:

- esp_lcd
- esp_driver_ledc

All buttons are active low. Enable internal pull-ups.
| Button | GPIO | Behavior |
|---|---|---|
| WAKE | 0 | Track press and release; suitable for push-to-talk |
| MUTE | 39 | Press event is usually enough |
| VOLUME | 40 | Press event is usually enough |
Recommended architecture:

```text
GPIO ISR on any edge
  -> FreeRTOS queue
  -> debounce task, 50 ms window
  -> application callback
```

Use a 4096-byte stack for the debounce task if it logs with ESP_LOGI(). A 2048-byte stack can overflow when logging from the task.
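The debounce step itself can be written as a small state machine that is independent of FreeRTOS, which makes it easy to test off-target. The sketch below is one possible shape, assuming a 50 ms window and a millisecond timestamp supplied by the caller. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_MS 50u

typedef enum { BTN_EVT_NONE, BTN_EVT_PRESS, BTN_EVT_RELEASE } btn_event_t;

typedef struct {
    bool     stable_pressed;   /* last debounced state */
    bool     candidate;        /* most recent raw state */
    uint32_t candidate_since;  /* time the raw state last changed, in ms */
} btn_debounce_t;

/* Convert an active-low GPIO level (0 = pressed) into a pressed flag. */
bool btn_level_to_pressed(int gpio_level)
{
    return gpio_level == 0;
}

/*
 * Feed one raw sample. Returns PRESS or RELEASE once the raw state has
 * held for DEBOUNCE_MS, otherwise NONE.
 */
btn_event_t btn_debounce_update(btn_debounce_t *b, bool raw_pressed,
                                uint32_t now_ms)
{
    if (raw_pressed != b->candidate) {
        b->candidate = raw_pressed;
        b->candidate_since = now_ms;  /* restart the window on every bounce */
    }
    if (b->candidate != b->stable_pressed &&
        (uint32_t)(now_ms - b->candidate_since) >= DEBOUNCE_MS) {
        b->stable_pressed = b->candidate;
        return b->stable_pressed ? BTN_EVT_PRESS : BTN_EVT_RELEASE;
    }
    return BTN_EVT_NONE;
}
```

Because the ISR only fires on edges, the debounce task would need to call the update function both when an edge arrives and again after a queue receive timeout, so that the window can expire once the contacts stop bouncing. For the WAKE button both event types are forwarded; for MUTE and VOLUME the release event can simply be ignored.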
The board has a WS2812-compatible addressable LED on GPIO48. Use an RMT-backed LED strip driver where possible.
Consolidated board definitions:

```c
#define BOARD_I2C_SDA_GPIO 12
#define BOARD_I2C_SCL_GPIO 11
#define BOARD_I2S_MCLK_GPIO 5
#define BOARD_I2S_BCLK_GPIO 15
#define BOARD_I2S_WS_GPIO 16
#define BOARD_I2S_DOUT_GPIO 6
#define BOARD_I2S_DIN_GPIO 7
#define BOARD_SPEAKER_PA_GPIO 4
#define BOARD_LCD_SPI_CLK_GPIO 9
#define BOARD_LCD_SPI_MOSI_GPIO 10
#define BOARD_LCD_CS_GPIO 14
#define BOARD_LCD_DC_GPIO 8
#define BOARD_LCD_RST_GPIO 18
#define BOARD_LCD_BACKLIGHT_GPIO 13
#define BOARD_BUTTON_WAKE_GPIO 0
#define BOARD_BUTTON_MUTE_GPIO 39
#define BOARD_BUTTON_VOLUME_GPIO 40
#define BOARD_WS2812_GPIO 48
#define BOARD_AUDIO_SAMPLE_RATE 16000
#define BOARD_AUDIO_BITS 16
#define BOARD_AUDIO_MONO_CHANNELS 1
#define BOARD_I2S_DMA_DESC_NUM 6
#define BOARD_I2S_DMA_FRAME_NUM 240
#define BOARD_LCD_WIDTH 284
#define BOARD_LCD_HEIGHT 240
```

Board initialization calls:

```c
board_audio_init();      // I2C, I2S, ES8311, ES7210
board_display_init();    // SPI LCD, vendor init, backlight
board_buttons_init(cb);  // GPIO buttons and debounce task
board_led_init();        // WS2812/RMT
```

Validation checklist:

- I2C scan finds ES8311 at 0x18 and ES7210 at 0x40.
- Speaker PA GPIO4 toggles high during playback.
- A 1 kHz sine tone plays at comfortable volume.
- Microphone capture returns non-zero samples while speaking.
- Captured stereo/TDM data can be reduced to usable mono from channel 0.
- LCD lights, exits reset, and displays a solid color after vendor init.
- LCD orientation is correct with swap_xy=true, mirror_y=true, and gap_x=36.
- Backlight PWM changes visible brightness.
- All three buttons report active-low presses.
- WAKE button reports both press and release.
- WS2812 on GPIO48 can show red, green, and blue.
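For the 1 kHz tone check, a 16 kHz sample rate gives exactly 16 samples per period, so the tone can come from a small lookup table with no floating-point math. The sketch below fills a stereo interleaved buffer at the suggested amplitude of 10000. The table and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* One period of a 1 kHz sine at 16 kHz: round(10000 * sin(2*pi*k/16)). */
static const int16_t k_tone_1khz_16k[16] = {
        0,   3827,   7071,   9239,  10000,   9239,   7071,   3827,
        0,  -3827,  -7071,  -9239, -10000,  -9239,  -7071,  -3827,
};

/*
 * Fill a stereo interleaved buffer with frame_count frames of the tone.
 * phase carries the table position between calls so that consecutive
 * buffers join without a discontinuity. Returns the bytes written.
 */
size_t tone_fill_stereo(int16_t *stereo, size_t frame_count, unsigned *phase)
{
    for (size_t i = 0; i < frame_count; i++) {
        int16_t s = k_tone_1khz_16k[*phase];
        stereo[2 * i]     = s;  /* left slot */
        stereo[2 * i + 1] = s;  /* right slot */
        *phase = (*phase + 1u) % 16u;
    }
    return frame_count * 2 * sizeof(int16_t);
}
```

Filling one DMA-sized buffer of 240 frames per call and writing it in a loop gives a continuous tone for as long as the speaker PA is enabled.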
| Area | Risk | Mitigation |
|---|---|---|
| LCD | Plain ST7789 init gives a blank screen | Send vendor command table before normal panel init |
| Audio | Creating codec before I2S data interface fails | Create I2S channels and data interfaces first |
| Audio | Calling i2s_channel_enable() after codec open returns an error | Let esp_codec_dev_open() manage channel enable |
| Speaker | Valid I2S data but no sound | Drive speaker PA GPIO4 high |
| Mic | ES7210 data shape is not mono | Read interleaved/TDM data and extract one channel |
| Buttons | Debounce task stack overflow | Use a 4096-byte task stack if logging |
| Board revisions | AliExpress listings may reuse names across revisions | Verify pins, display controller, and codec addresses on the actual board |
For speech-to-text, the board has enough PSRAM to buffer a short 16 kHz, 16-bit mono recording, prepend a 44-byte WAV header, and upload it as multipart form data to a LAN transcription server.
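The 44-byte header is the canonical PCM WAV layout: a RIFF tag, a 16-byte fmt chunk, and a data chunk header, with all multi-byte fields little-endian. A minimal sketch of writing it follows; the function and macro names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_BYTES 44u

/* Store 16-bit and 32-bit values in little-endian byte order. */
static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)(v >> 24);
}

/* Write a canonical 44-byte PCM WAV header into out[0..43]. */
void wav_write_header(uint8_t *out, uint32_t sample_rate_hz, uint16_t channels,
                      uint16_t bits_per_sample, uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits_per_sample / 8u));
    uint32_t byte_rate   = sample_rate_hz * block_align;

    memcpy(out + 0, "RIFF", 4);
    put_le32(out + 4, 36u + data_bytes);  /* RIFF size: file size minus 8 */
    memcpy(out + 8, "WAVE", 4);
    memcpy(out + 12, "fmt ", 4);
    put_le32(out + 16, 16u);              /* fmt chunk size for PCM */
    put_le16(out + 20, 1u);               /* audio format 1 = PCM */
    put_le16(out + 22, channels);
    put_le32(out + 24, sample_rate_hz);
    put_le32(out + 28, byte_rate);
    put_le16(out + 32, block_align);
    put_le16(out + 34, bits_per_sample);
    memcpy(out + 36, "data", 4);
    put_le32(out + 40, data_bytes);       /* PCM payload length */
}
```

One option is to reserve the first 44 bytes of the recording buffer for the header, record PCM after it, and fill in the header once the final data length is known, so the upload can be sent from a single contiguous buffer.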
For text-to-speech, reserve a PSRAM response buffer. A 512 KB buffer is a reasonable starting point for short WAV responses. Validate the returned sample rate; some TTS servers return 22050 Hz audio, which will play at the wrong speed if the I2S clock remains fixed at 16 kHz.
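The sample-rate check can be done directly on the response buffer before any audio is written. The sketch below reads the rate field from the header and compares it with the expected playback rate. It assumes the fmt chunk immediately follows the RIFF/WAVE tag, as in a canonical 44-byte header; servers that emit extra chunks first would need a proper chunk walk. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Read the sample rate from a WAV response.
 * ASSUMPTION: the "fmt " chunk starts at byte 12, as in a canonical header.
 * Returns false if the buffer is too short or the tags do not match.
 */
bool wav_get_sample_rate(const uint8_t *buf, size_t len, uint32_t *rate_hz)
{
    if (len < 44 ||
        memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0 ||
        memcmp(buf + 12, "fmt ", 4) != 0) {
        return false;
    }
    *rate_hz = get_le32(buf + 24);
    return true;
}

/* True when the response can be played without changing the I2S clock. */
bool wav_rate_matches(const uint8_t *buf, size_t len, uint32_t expected_hz)
{
    uint32_t rate = 0;
    return wav_get_sample_rate(buf, len, &rate) && rate == expected_hz;
}
```

When the rates differ, the firmware has to either resample the audio or reconfigure the output path for the returned rate before playback. Which one is appropriate depends on the application.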
Approximate PSRAM budget:
| Buffer | Size |
|---|---|
| 10 second STT recording, 16 kHz mono PCM | 320 KB |
| TTS response buffer | 512 KB |
| LCD DMA row buffer | 1 KB |
| Total | 833 KB |
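The recording and row-buffer figures in the budget can be recomputed when the recording length, sample format, or display width changes. A small sketch of that arithmetic follows; the function names are illustrative.

```c
#include <stdint.h>

/* Bytes of PCM audio for a recording of the given length and format. */
uint32_t pcm_bytes(uint32_t seconds, uint32_t sample_rate_hz,
                   uint32_t channels, uint32_t bits_per_sample)
{
    return seconds * sample_rate_hz * channels * (bits_per_sample / 8u);
}

/* Bytes for one RGB565 display row of the given width in pixels. */
uint32_t lcd_row_bytes(uint32_t width_px)
{
    return width_px * 2u;
}
```

Ten seconds of 16 kHz, 16-bit mono audio comes to 320000 bytes, and one 284-pixel RGB565 row is 568 bytes, which fits inside the 1 KB allowance in the table.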
https://aliexpress.com/item/1005010442374066.html?spm=a2g0o.order_list.order_list_main.32.6645180249Waj2&gatewayAdapt=glo2isr