@niradler
Created May 16, 2026 00:32
Xiaozhi Ai Robot Deepseek Conversational Voice Dialogue PCBA Development Intelligent Electronic Toy For Fun And Learning

Generic AliExpress ESP32-S3 1.83-inch 2-Mic Board Spec

Hardware reference for a generic AliExpress ESP32-S3 board sold with a 1.83-inch LCD, two microphones, speaker output, three buttons, and one RGB LED. The board is commonly identified in firmware examples as xingzhi-cube 1.83 2mic.

Use this as the firmware integration baseline, but verify every pin against the exact board revision before ordering PCBs or making irreversible hardware assumptions.

Board Summary

| Item | Spec |
|---|---|
| SoC | ESP32-S3, dual core, 240 MHz |
| Flash | 16 MB octal flash |
| PSRAM | 8 MB octal PSRAM |
| Display | 1.83-inch SPI LCD, 284x240, ST7789V-like controller with vendor init |
| Speaker codec | ES8311 DAC, I2C address 0x18 |
| Microphone codec | ES7210 ADC, I2C address 0x40, 4-channel device with 2 mics connected |
| Buttons | 3 active-low buttons |
| RGB LED | WS2812-compatible LED |
| Speaker PA | GPIO-controlled amplifier enable |

Pin Map

| Function | GPIO | Direction | Notes |
|---|---|---|---|
| I2C SDA | 12 | bidirectional | Shared by ES8311 and ES7210 |
| I2C SCL | 11 | output | 400 kHz recommended |
| I2S MCLK | 5 | output | Shared TX/RX, 256x sample-rate multiplier |
| I2S BCLK | 15 | output | Shared TX/RX |
| I2S LRCLK / WS | 16 | output | Shared TX/RX |
| I2S DOUT | 6 | output | ESP32-S3 to ES8311 speaker DAC |
| I2S DIN | 7 | input | ES7210 mic ADC to ESP32-S3 |
| Speaker PA enable | 4 | output | Active high; must be high for speaker output |
| LCD SPI CLK | 9 | output | SPI2_HOST, 40 MHz tested |
| LCD SPI MOSI | 10 | output | Display data |
| LCD CS | 14 | output | Display chip select |
| LCD DC | 8 | output | Display command/data select |
| LCD reset | 18 | output | Active-low hardware reset |
| LCD backlight | 13 | output | LEDC PWM, timer 0, channel 0 |
| Button WAKE | 0 | input | Top button, active low; tracks press and release |
| Button MUTE | 39 | input | Left button, active low |
| Button VOLUME | 40 | input | Right button, active low |
| WS2812 LED | 48 | output | Single addressable RGB LED |

Audio Hardware

Codec Topology

  • ES8311 handles speaker playback.
  • ES7210 handles microphone input.
  • Both codecs share the same I2C bus.
  • I2S TX and RX share MCLK, BCLK, and LRCLK.
  • Speaker output is silent unless the PA enable pin is driven high.

ESP-IDF Components

Required managed components (idf_component.yml):

```yaml
dependencies:
  espressif/esp_codec_dev: "^1.3.0"
```

Required ESP-IDF drivers:

```text
esp_driver_i2c
esp_driver_i2s
esp_driver_gpio
```

Audio Init Order

The order matters because esp_codec_dev_new() needs a valid data interface, and the data interface needs an I2S channel handle.

  1. Configure speaker PA GPIO as output and start low.
  2. Initialize I2C master bus on SDA GPIO12 and SCL GPIO11 at 400 kHz.
  3. Probe the codec I2C addresses: 0x18 for ES8311 and 0x40 for ES7210 (7-bit addresses).
  4. Create I2S TX and RX channel pair on the same I2S port.
  5. Configure TX for standard Philips mode, 16 kHz, 16-bit, stereo.
  6. Configure RX for TDM mode with 4 slots for ES7210.
  7. Use MCLK multiple 256 for a 4.096 MHz MCLK at 16 kHz.
  8. Use DMA settings of 6 descriptors and 240 frames per descriptor.
  9. Create TX and RX I2S data interfaces with audio_codec_new_i2s_data().
  10. Create codec control interfaces over I2C.
  11. Create ES8311 codec device using the TX data interface.
  12. Create ES7210 codec device using the RX data interface.
  13. Call esp_codec_dev_open() for both codec devices.
  14. Set the output volume; 60 out of 100 tested as a comfortable level.
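The clock, buffer, and address figures in steps 3 and 5 to 8 can be sanity-checked off-target. This is a minimal host-side sketch in plain C, not ESP-IDF code, and every helper name in it is hypothetical. It assumes the ESP-IDF I2S rule that one DMA descriptor holds dma_frame_num frames of slot count x bytes per sample, and it assumes that esp_codec_dev's I2C control config expects the 8-bit (left-shifted) form of the codec address, which is how its default address macros such as ES8311_CODEC_DEFAULT_ADDR appear to be defined; confirm both against the IDF and component versions you build with.

```c
#include <stdint.h>

/* Audio settings taken from the init-order list. */
#define SAMPLE_RATE_HZ   16000u
#define MCLK_MULTIPLE      256u
#define BYTES_PER_SAMPLE     2u   /* 16-bit samples */
#define DMA_DESC_NUM         6u
#define DMA_FRAME_NUM      240u

/* MCLK = sample rate x multiplier: 16000 x 256 = 4,096,000 Hz. */
uint32_t audio_mclk_hz(void) {
    return SAMPLE_RATE_HZ * MCLK_MULTIPLE;
}

/* Bytes in one DMA descriptor: frames x slots x bytes per sample.
 * TX is standard stereo (2 slots); RX is TDM with 4 slots. */
uint32_t audio_dma_desc_bytes(uint32_t slots) {
    return DMA_FRAME_NUM * slots * BYTES_PER_SAMPLE;
}

/* Audio time covered by one descriptor, in milliseconds. */
uint32_t audio_dma_desc_ms(void) {
    return DMA_FRAME_NUM * 1000u / SAMPLE_RATE_HZ;
}

/* Audio time covered by the whole descriptor ring, in milliseconds. */
uint32_t audio_dma_ring_ms(void) {
    return DMA_DESC_NUM * audio_dma_desc_ms();
}

/* ASSUMPTION: esp_codec_dev's I2C control config takes the 8-bit form
 * (7-bit address shifted left by one), e.g. ES8311 0x18 -> 0x30. */
uint8_t codec_i2c_addr_8bit(uint8_t addr_7bit) {
    return (uint8_t)(addr_7bit << 1);
}
```

With these values, MCLK comes out at 4.096 MHz, a stereo TX descriptor holds 960 bytes, a 4-slot RX descriptor holds 1920 bytes, and each descriptor covers 15 ms of audio (90 ms across all six).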

After esp_codec_dev_open(), the I2S channel is already enabled. Do not call i2s_channel_enable() again.

Playback Notes

Use esp_codec_dev_write() instead of raw i2s_channel_write(). The codec abstraction handles codec format setup, volume, and I2S interaction.

Mono playback buffers must be expanded to stereo interleaved 16-bit samples before writing to the ES8311 path.

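```c
// board_audio_get_output() and board_speaker_pa_set() are this firmware's board helpers.
esp_codec_dev_handle_t output = board_audio_get_output();

board_speaker_pa_set(true);                                 // PA on (GPIO4 high)
esp_codec_dev_write(output, stereo_samples, stereo_bytes);  // interleaved 16-bit L/R
board_speaker_pa_set(false);                                // PA off when idle
```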
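Expanding a mono buffer for that write path amounts to duplicating each sample into the left and right slots. A minimal sketch; the function name is hypothetical, and the caller owns both buffers (the stereo one must hold twice as many samples as the mono one).

```c
#include <stddef.h>
#include <stdint.h>

/* Duplicate each mono sample into an interleaved L/R pair.
 * `stereo` must have room for 2 * frames samples.
 * Returns the number of stereo bytes produced. */
size_t mono_to_stereo(const int16_t *mono, size_t frames, int16_t *stereo) {
    for (size_t i = 0; i < frames; i++) {
        stereo[2 * i]     = mono[i];  /* left  */
        stereo[2 * i + 1] = mono[i];  /* right */
    }
    return frames * 2 * sizeof(int16_t);
}
```

The returned byte count is what would be passed as the length argument of esp_codec_dev_write().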

Suggested initial settings:

| Setting | Value |
|---|---|
| Sample rate | 16000 Hz |
| Bits per sample | 16 |
| Playback channels | 2, interleaved |
| Codec volume | 60 / 100 |
| Test tone amplitude | 10000 / 32767 |
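The 1 kHz validation tone follows directly from these settings: at 16000 Hz one period is exactly 16 samples, so a quarter-wave lookup table is enough and no math library is needed. A sketch under that assumption (1 kHz at 16 kHz only; the names are hypothetical), producing mono samples that would still be expanded to stereo before playback.

```c
#include <stddef.h>
#include <stdint.h>

/* 10000 * sin(k * pi / 8) for k = 0..4, rounded: one quarter of a period of
 * a 1 kHz tone sampled at 16 kHz (16 samples per period). */
static const int16_t TONE_QUARTER[5] = {0, 3827, 7071, 9239, 10000};

/* Sample i of the tone, rebuilt from the quarter wave by symmetry. */
int16_t test_tone_sample(size_t i) {
    unsigned k = (unsigned)(i % 16u);
    if (k <= 4u)  return TONE_QUARTER[k];                  /* 0 up to +peak   */
    if (k <= 8u)  return TONE_QUARTER[8u - k];             /* +peak down to 0 */
    if (k <= 12u) return (int16_t)-TONE_QUARTER[k - 8u];   /* 0 down to -peak */
    return (int16_t)-TONE_QUARTER[16u - k];                /* -peak back to 0 */
}

/* Fill a mono buffer; expand to stereo before writing to the ES8311 path. */
void test_tone_fill(int16_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = test_tone_sample(i);
    }
}
```

Because the period is a whole number of samples, the buffer can be looped indefinitely without a phase jump.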

Recording Notes

Use esp_codec_dev_read() instead of raw i2s_channel_read().

The ES7210 is a 4-channel TDM codec. On this board only two microphones are populated, so firmware typically reads stereo interleaved samples and then extracts one channel for mono speech input.

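```c
esp_codec_dev_handle_t input = board_audio_get_input();

// stereo_buf holds frame_count interleaved L/R pairs of int16_t samples.
int stereo_bytes = frame_count * 2 * (int)sizeof(int16_t);
esp_codec_dev_read(input, stereo_buf, stereo_bytes);

// Keep channel 0 (every other sample) as the mono speech signal.
for (int i = 0; i < frame_count; i++) {
    mono[i] = stereo_buf[i * 2];
}
```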
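If the input is instead opened with all four TDM slots, the same indexing generalizes to any interleaved channel count. A sketch; the function name is hypothetical, and which slot carries which physical microphone is an assumption to confirm by speaking into each mic in turn.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy channel `ch` out of an interleaved buffer with `channels` slots per
 * frame. `bytes` is the buffer length that was passed to the read call; a
 * trailing partial frame is ignored. Returns the number of mono samples. */
size_t extract_channel(const int16_t *interleaved, size_t bytes,
                       unsigned channels, unsigned ch, int16_t *mono) {
    if (channels == 0 || ch >= channels) {
        return 0;  /* invalid channel selection */
    }
    size_t frames = bytes / (channels * sizeof(int16_t));
    for (size_t i = 0; i < frames; i++) {
        mono[i] = interleaved[i * channels + ch];
    }
    return frames;
}
```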

Display Hardware

Display Summary

| Item | Value |
|---|---|
| Interface | SPI |
| SPI host | SPI2_HOST |
| Clock | 40 MHz tested |
| Resolution | 284x240 |
| Pixel format | RGB565 |
| Backlight | GPIO13, LEDC PWM |
| Reset | GPIO18, active low |
| Controller | ST7789V-like with vendor-specific init |

Important LCD Behavior

This panel is not initialized correctly by a plain esp_lcd_new_panel_st7789() flow. It needs a vendor-specific command sequence before the ST7789 panel driver is used. Without that sequence, the screen can stay blank.

The known-good init sequence was recovered from an ESPHome configuration for the xingzhi-cube board family. Keep that command table with the display driver source, not only in application code.
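One way to keep that sequence with the driver is a data table of command byte, parameters, and post-command delay, walked by a small loop. This sketch shows structure only: the three entries are standard MIPI DCS commands (SLPOUT 0x11, COLMOD 0x3A with 0x55 for RGB565, DISPON 0x29) used as placeholders, not the recovered vendor sequence. The send callback stands in for esp_lcd_panel_io_tx_param() followed by a delay on the target, and the dry-run sender lets the table be checked on a host. All type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One init step: command byte, parameter bytes, and a delay afterwards. */
typedef struct {
    uint8_t cmd;
    uint8_t data[16];
    uint8_t len;        /* valid bytes in data */
    uint16_t delay_ms;  /* wait after sending */
} lcd_init_cmd_t;

/* PLACEHOLDER entries (standard DCS commands), NOT the vendor table.
 * Replace with the sequence recovered from the ESPHome configuration. */
static const lcd_init_cmd_t LCD_INIT_CMDS[] = {
    {0x11, {0x00}, 0, 120},  /* SLPOUT: exit sleep, then wait */
    {0x3A, {0x55}, 1, 0},    /* COLMOD: 16 bits per pixel (RGB565) */
    {0x29, {0x00}, 0, 20},   /* DISPON */
};
#define LCD_INIT_CMD_COUNT (sizeof LCD_INIT_CMDS / sizeof LCD_INIT_CMDS[0])

/* Transport hook. On target it would wrap esp_lcd_panel_io_tx_param()
 * plus a delay of delay_ms. Returns 0 on success. */
typedef int (*lcd_send_fn)(void *ctx, uint8_t cmd, const uint8_t *data,
                           size_t len, uint16_t delay_ms);

/* Send every entry in order. Returns -1 when all succeed, otherwise the
 * index of the first failing entry. */
int lcd_run_init_table(const lcd_init_cmd_t *table, size_t count,
                       lcd_send_fn send, void *ctx) {
    for (size_t i = 0; i < count; i++) {
        if (send(ctx, table[i].cmd, table[i].data, table[i].len,
                 table[i].delay_ms) != 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Host-side stand-in for the transport: tallies what would be sent. */
typedef struct {
    size_t cmds;
    size_t param_bytes;
    uint32_t total_delay_ms;
} lcd_dry_run_t;

int lcd_dry_run_send(void *ctx, uint8_t cmd, const uint8_t *data,
                     size_t len, uint16_t delay_ms) {
    lcd_dry_run_t *r = (lcd_dry_run_t *)ctx;
    (void)cmd;
    (void)data;
    r->cmds++;
    r->param_bytes += len;
    r->total_delay_ms += delay_ms;
    return 0;
}
```

Keeping the vendor sequence as data means it can be diffed directly against the ESPHome source when a new board revision behaves differently.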

Display Init Order

  1. Initialize LEDC PWM for backlight on GPIO13.
  2. Initialize SPI bus on SPI2_HOST with CLK GPIO9 and MOSI GPIO10.
  3. Create panel IO with CS GPIO14 and DC GPIO8.
  4. Hardware-reset the panel: drive reset low for 20 ms, then release it high and wait 120 ms before sending commands.
  5. Send the vendor-specific LCD init command table with esp_lcd_panel_io_tx_param().
  6. Create the ST7789 panel driver with reset GPIO disabled, because hardware reset already ran.
  7. Call esp_lcd_panel_init().
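  8. Enable color inversion with esp_lcd_panel_invert_color().
  9. Set the display gap to x=36, y=0 with esp_lcd_panel_set_gap().
  10. Apply orientation: swap_xy=true, mirror_x=false, mirror_y=true (esp_lcd_panel_swap_xy() and esp_lcd_panel_mirror()).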
  11. Set backlight to a comfortable default, tested at 80 percent.

Display Constants

```c
#define BOARD_LCD_WIDTH        284
#define BOARD_LCD_HEIGHT       240
#define BOARD_LCD_SPI_HOST     SPI2_HOST
#define BOARD_LCD_SPI_HZ       (40 * 1000 * 1000)
#define BOARD_LCD_GAP_X        36
#define BOARD_LCD_GAP_Y        0
```
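These constants determine the buffer sizes the display path needs. A host-side sketch of the arithmetic with hypothetical helper names. The ST7789_RAM_COLUMNS value is an assumption (a 240x320 frame memory with x along the 320-pixel side after swap_xy); under it, the 36-pixel x gap places the 284-pixel visible area flush against the far end of controller memory, since 320 - 284 = 36. That reading is an inference, not something the spec states.

```c
#include <stdint.h>

#define BOARD_LCD_WIDTH      284
#define BOARD_LCD_HEIGHT     240
#define BOARD_LCD_GAP_X       36
#define LCD_BYTES_PER_PIXEL    2   /* RGB565 */
#define ST7789_RAM_COLUMNS   320   /* ASSUMPTION: 240x320 RAM, x after swap_xy */

/* One full row of RGB565 pixels. */
uint32_t lcd_row_bytes(void) {
    return BOARD_LCD_WIDTH * LCD_BYTES_PER_PIXEL;
}

/* One full frame of RGB565 pixels. */
uint32_t lcd_frame_bytes(void) {
    return lcd_row_bytes() * BOARD_LCD_HEIGHT;
}

/* Whole rows that fit in a transfer buffer of buf_bytes. */
uint32_t lcd_rows_per_buffer(uint32_t buf_bytes) {
    return buf_bytes / lcd_row_bytes();
}

/* Last controller column written when x = 0 maps to BOARD_LCD_GAP_X. */
uint32_t lcd_last_ram_column(void) {
    return BOARD_LCD_GAP_X + BOARD_LCD_WIDTH - 1;
}
```

One 284-pixel row is 568 bytes, so the 1 KB LCD row buffer in the PSRAM budget holds a single row; a full frame is 136,320 bytes.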

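RGB565 values are byte-swapped for SPI writes: the panel expects the high byte first, but the ESP32-S3 stores uint16_t values little-endian, so each color constant is stored with its two bytes exchanged: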

| Color | Value |
|---|---|
| Black | 0x0000 |
| White | 0xFFFF |
| Red | 0x00F8 |
| Green | 0xE007 |
| Blue | 0x1F00 |
| Yellow | 0xE0FF |
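Those values come from packing 8-bit RGB into RGB565 and exchanging the two bytes. A sketch with hypothetical helper names that reproduces the table:

```c
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565: 5 bits red, 6 bits green, 5 bits blue. */
uint16_t rgb565(uint8_t r, uint8_t g, uint8_t b) {
    return (uint16_t)(((r & 0xF8u) << 8) | ((g & 0xFCu) << 3) | (b >> 3));
}

/* Swap the two bytes so a little-endian uint16_t goes out high byte first. */
uint16_t rgb565_swapped(uint8_t r, uint8_t g, uint8_t b) {
    uint16_t c = rgb565(r, g, b);
    return (uint16_t)((c << 8) | (c >> 8));
}
```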

Required ESP-IDF drivers:

```text
esp_lcd
esp_driver_ledc
```

Buttons

All buttons are active low. Enable internal pull-ups.

| Button | GPIO | Behavior |
|---|---|---|
| WAKE | 0 | Track press and release; suitable for push-to-talk |
| MUTE | 39 | Press event is usually enough |
| VOLUME | 40 | Press event is usually enough |

Recommended architecture:

```text
GPIO ISR on any edge
  -> FreeRTOS queue
  -> debounce task, 50 ms window
  -> application callback
```

Use a 4096-byte stack for the debounce task if it logs with ESP_LOGI(); a 2048-byte stack can overflow when logging from the task.
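The 50 ms rule inside the debounce task can be kept free of FreeRTOS calls, which makes it testable on a host. A sketch under these assumptions: the task calls the update function with the current pin level and a millisecond timestamp each time it wakes, on a queued edge or on a queue-receive timeout so that a pending change can mature; levels follow the active-low wiring (0 = pressed); all names are hypothetical.

```c
#include <stdint.h>

#define DEBOUNCE_MS 50u

enum { BUTTON_NONE = 0, BUTTON_PRESSED = 1, BUTTON_RELEASED = 2 };

typedef struct {
    int stable_level;        /* last reported level; 1 = released (active low) */
    int candidate_level;     /* most recent raw level seen */
    uint32_t changed_at_ms;  /* when candidate_level was first seen */
} debounce_t;

void debounce_init(debounce_t *d, int level, uint32_t now_ms) {
    d->stable_level = level;
    d->candidate_level = level;
    d->changed_at_ms = now_ms;
}

/* Report a press or release only after the level has held for DEBOUNCE_MS. */
int debounce_update(debounce_t *d, int raw_level, uint32_t now_ms) {
    if (raw_level != d->candidate_level) {
        /* Edge or bounce: restart the stability window. */
        d->candidate_level = raw_level;
        d->changed_at_ms = now_ms;
        return BUTTON_NONE;
    }
    if (raw_level != d->stable_level &&
        (uint32_t)(now_ms - d->changed_at_ms) >= DEBOUNCE_MS) {
        d->stable_level = raw_level;
        return raw_level == 0 ? BUTTON_PRESSED : BUTTON_RELEASED;
    }
    return BUTTON_NONE;
}
```

WAKE would act on both BUTTON_PRESSED and BUTTON_RELEASED for push-to-talk, while MUTE and VOLUME can ignore releases.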

RGB LED

The board has a WS2812-compatible addressable LED on GPIO48. Use an RMT-backed LED strip driver where possible.
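WS2812 parts expect 24 bits per LED in green, red, blue order, most significant bit first. RMT-based strip drivers such as Espressif's led_strip component normally take care of this ordering through their pixel-format setting; the sketch below only shows the byte layout, which helps when a clone's colors come out swapped. The helper name and the brightness scaling are illustrative assumptions.

```c
#include <stdint.h>

/* Pack one pixel into WS2812 wire order (G, R, B), scaling each component
 * by brightness (0-255) first. */
void ws2812_pack_grb(uint8_t r, uint8_t g, uint8_t b, uint8_t brightness,
                     uint8_t out[3]) {
    out[0] = (uint8_t)((g * brightness) / 255);  /* green goes first */
    out[1] = (uint8_t)((r * brightness) / 255);
    out[2] = (uint8_t)((b * brightness) / 255);
}
```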

Suggested Firmware Defines

```c
#define BOARD_I2C_SDA_GPIO             12
#define BOARD_I2C_SCL_GPIO             11

#define BOARD_I2S_MCLK_GPIO             5
#define BOARD_I2S_BCLK_GPIO            15
#define BOARD_I2S_WS_GPIO              16
#define BOARD_I2S_DOUT_GPIO             6
#define BOARD_I2S_DIN_GPIO              7

#define BOARD_SPEAKER_PA_GPIO           4

#define BOARD_LCD_SPI_CLK_GPIO          9
#define BOARD_LCD_SPI_MOSI_GPIO        10
#define BOARD_LCD_CS_GPIO              14
#define BOARD_LCD_DC_GPIO               8
#define BOARD_LCD_RST_GPIO             18
#define BOARD_LCD_BACKLIGHT_GPIO       13

#define BOARD_BUTTON_WAKE_GPIO          0
#define BOARD_BUTTON_MUTE_GPIO         39
#define BOARD_BUTTON_VOLUME_GPIO       40

#define BOARD_WS2812_GPIO              48

#define BOARD_AUDIO_SAMPLE_RATE     16000
#define BOARD_AUDIO_BITS               16
#define BOARD_AUDIO_MONO_CHANNELS       1
#define BOARD_I2S_DMA_DESC_NUM          6
#define BOARD_I2S_DMA_FRAME_NUM       240

#define BOARD_LCD_WIDTH               284
#define BOARD_LCD_HEIGHT              240
```

Recommended App Init Order

```c
board_audio_init();      // I2C, I2S, ES8311, ES7210
board_display_init();    // SPI LCD, vendor init, backlight
board_buttons_init(cb);  // GPIO buttons and debounce task
board_led_init();        // WS2812/RMT
```

Validation Checklist

  1. I2C scan finds ES8311 at 0x18 and ES7210 at 0x40.
  2. Speaker PA GPIO4 toggles high during playback.
  3. A 1 kHz sine tone plays at comfortable volume.
  4. Microphone capture returns non-zero samples while speaking.
  5. Captured stereo/TDM data can be reduced to usable mono from channel 0.
  6. LCD lights, exits reset, and displays a solid color after vendor init.
  7. LCD orientation is correct with swap_xy=true, mirror_y=true, and gap_x=36.
  8. Backlight PWM changes visible brightness.
  9. All three buttons report active-low presses.
  10. WAKE button reports both press and release.
  11. WS2812 on GPIO48 can show red, green, and blue.
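Checklist items 4 and 5 are easier to judge with a number than by eye. A minimal sketch, assuming 16-bit mono samples already extracted from channel 0; the function names, and any pass threshold chosen on top of them, are assumptions to tune on the real board.

```c
#include <stddef.h>
#include <stdint.h>

/* Largest absolute sample value; 0 means the capture is pure silence.
 * Widening to int32_t keeps -32768 from overflowing. */
int32_t pcm_peak_abs(const int16_t *s, size_t n) {
    int32_t peak = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t v = s[i] < 0 ? -(int32_t)s[i] : (int32_t)s[i];
        if (v > peak) {
            peak = v;
        }
    }
    return peak;
}

/* Integer mean square (square of the RMS level), avoiding the math library. */
uint64_t pcm_mean_square(const int16_t *s, size_t n) {
    if (n == 0) {
        return 0;
    }
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        int64_t v = s[i];
        acc += (uint64_t)(v * v);
    }
    return acc / n;
}
```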

Known Integration Risks

| Area | Risk | Mitigation |
|---|---|---|
| LCD | Plain ST7789 init gives a blank screen | Send the vendor command table before normal panel init |
| Audio | Creating the codec before the I2S data interface fails | Create I2S channels and data interfaces first |
| Audio | Calling i2s_channel_enable() after codec open returns an error | Let esp_codec_dev_open() manage channel enable |
| Speaker | Valid I2S data but no sound | Drive speaker PA GPIO4 high |
| Mic | ES7210 data shape is not mono | Read interleaved/TDM data and extract one channel |
| Buttons | Debounce task stack overflow | Use a 4096-byte task stack if logging |
| Board revisions | AliExpress listings may reuse names across revisions | Verify pins, display controller, and codec addresses on the actual board |

Optional Voice Client Notes

For speech-to-text, the board has enough PSRAM to buffer a short 16 kHz, 16-bit mono recording, prepend a 44-byte WAV header, and upload it as multipart form data to a LAN transcription server.
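The 44-byte header is the canonical RIFF/WAVE layout: a RIFF chunk, a 16-byte PCM fmt chunk, and a data chunk, with every multi-byte field little-endian. A sketch for 16 kHz, 16-bit mono with a hypothetical function name; the multipart upload itself is not shown.

```c
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_BYTES 44

static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v) {
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Write a PCM WAV header describing data_bytes of sample data into h. */
void wav_write_header(uint8_t h[WAV_HEADER_BYTES], uint32_t sample_rate,
                      uint16_t channels, uint16_t bits, uint32_t data_bytes) {
    uint16_t block_align = (uint16_t)(channels * bits / 8);
    memcpy(h + 0, "RIFF", 4);
    put_le32(h + 4, 36u + data_bytes);            /* RIFF chunk size */
    memcpy(h + 8, "WAVE", 4);
    memcpy(h + 12, "fmt ", 4);
    put_le32(h + 16, 16u);                        /* fmt chunk size for PCM */
    put_le16(h + 20, 1u);                         /* audio format 1 = PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, sample_rate);
    put_le32(h + 28, sample_rate * block_align);  /* byte rate */
    put_le16(h + 32, block_align);
    put_le16(h + 34, bits);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);
}
```

For a 10-second recording, data_bytes is 320,000 and the RIFF size field is 320,036.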

For text-to-speech, reserve a PSRAM response buffer. A 512 KB buffer is a reasonable starting point for short WAV responses. Validate the returned sample rate; some TTS servers return 22050 Hz audio, which will play at the wrong speed if the I2S clock remains fixed at 16 kHz.
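Checking the rate means reading the sample-rate field of the returned WAV header before playback. A sketch with hypothetical names; it assumes the fmt chunk directly follows the RIFF header at offset 12, as in the canonical layout, so a response that inserts other chunks first would need a proper chunk walk. Clocking 22050 Hz audio out at 16000 Hz plays it at about 72.6 percent of normal speed, and correspondingly lower in pitch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit field. */
static uint32_t get_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Sample rate from a canonical WAV header, or 0 if the buffer is too short
 * or does not start with RIFF/WAVE followed by a "fmt " chunk. */
uint32_t wav_sample_rate(const uint8_t *buf, size_t len) {
    if (len < 44 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0 || memcmp(buf + 12, "fmt ", 4) != 0) {
        return 0;
    }
    return get_le32(buf + 24);
}

/* Playback speed in whole percent when audio recorded at source_hz is
 * clocked out at i2s_hz; below 100 means slower and lower-pitched. */
uint32_t playback_speed_percent(uint32_t source_hz, uint32_t i2s_hz) {
    return source_hz ? (i2s_hz * 100u) / source_hz : 0;
}
```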

Approximate PSRAM budget:

| Buffer | Size |
|---|---|
| 10-second STT recording, 16 kHz mono PCM | 320 KB |
| TTS response buffer | 512 KB |
| LCD DMA row buffer | 1 KB |
| Total | 833 KB |
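The table uses rounded kilobytes; recomputing it in bytes makes it easy to adjust the recording length. A sketch that treats the TTS and row buffers as binary kilobytes (an assumption about how the original sizes were meant), includes the 44-byte WAV header in the STT buffer, and uses hypothetical names.

```c
#include <stdint.h>

#define STT_SAMPLE_RATE      16000u
#define STT_BYTES_PER_SAMPLE     2u            /* 16-bit mono */
#define WAV_HEADER_BYTES        44u
#define TTS_BUFFER_BYTES   (512u * 1024u)      /* ASSUMPTION: KiB */
#define LCD_ROW_BUFFER_BYTES (1u * 1024u)      /* ASSUMPTION: KiB */
#define PSRAM_BYTES        (8u * 1024u * 1024u)

/* STT recording buffer for `seconds` of audio, including the WAV header. */
uint32_t stt_buffer_bytes(uint32_t seconds) {
    return seconds * STT_SAMPLE_RATE * STT_BYTES_PER_SAMPLE + WAV_HEADER_BYTES;
}

/* Sum of the three PSRAM buffers in the budget table. */
uint32_t psram_budget_bytes(uint32_t stt_seconds) {
    return stt_buffer_bytes(stt_seconds) + TTS_BUFFER_BYTES + LCD_ROW_BUFFER_BYTES;
}

/* Share of the 8 MB PSRAM used, in whole percent (rounded down). */
uint32_t psram_budget_percent(uint32_t stt_seconds) {
    return (uint32_t)((uint64_t)psram_budget_bytes(stt_seconds) * 100u / PSRAM_BYTES);
}
```

For a 10-second recording this comes to 845,356 bytes, roughly 10 percent of the 8 MB PSRAM.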