ESPHome config - Onju Voice/Home as a voice assistant satellite in Home Assistant


Ever since voice satellites were introduced to Home Assistant, people have wanted to use good microphones and speakers for this purpose, but few have really been available.

In a valiant attempt to free a Google Nest Mini (2nd generation) from its privacy-ignoring overlords, Justin Alvey created Onju Voice, a drop-in replacement PCB for the Mini, with an ESP32-S3 at its heart, capable of some pretty funky stuff.

The purpose of this ESPHome config is to be able to use such a modded Nest Mini as a voice satellite in Home Assistant.

Features

  • wake word, push to talk, on-demand and continuous conversation support
  • response playback
  • audio media player
  • service exposed in HA to start and stop the voice assistant from another device/trigger
  • visual feedback of the wake word listening/audio recording/success/error status via the Mini's onboard top LEDs
  • uses all 3 of the original Mini's touch controls: the left and right pads set the volume and the center pad manually starts the assistant
  • uses the original Mini's microphone mute button to prevent the wake word engine from starting unintentionally
  • automatic continuous touch control calibration
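
As a rough illustration of the exposed service, the automation below is a minimal sketch of starting the assistant from another trigger. It assumes the device keeps the default name onju-voice, in which case the start_va service should show up in Home Assistant as esphome.onju_voice_start_va. The alias and binary_sensor.hallway_button are hypothetical placeholders.

```yaml
# Sketch of a Home Assistant automation. This goes in Home Assistant's
# configuration, not in the ESPHome config.
automation:
  - alias: "Start Onju Voice from a button"       # hypothetical name
    trigger:
      - platform: state
        entity_id: binary_sensor.hallway_button   # hypothetical entity
        to: "on"
    action:
      # Assumed service name: esphome.<device name with underscores>_<service>
      - service: esphome.onju_voice_start_va
```

Calling esphome.onju_voice_stop_va in the same way should stop the assistant again.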


Prerequisites

  • Home Assistant 2023.11.3 or newer
  • A voice assistant configured in HA with STT and TTS in a language of your choice
  • ESPHome 2023.11.6 or newer

Known issues and limitations

  • you have to be able to retrofit an Onju Voice PCB inside a 2nd generation Google Nest Mini.
  • the media_player component in ESPHome does not play raw audio coming from Piper TTS, though it works with any TTS that outputs mp3 by default. This was fixed in HA 2023.12
  • the version for microWakeWord is in BETA and probably full of bugs

Installation instructions

Here is a video explaining how to perform the PCB "transplant". You can find some instructions for disassembly here.

To flash the Onju Voice for the first time, you have to do so BEFORE YOU PUT EVERYTHING BACK TOGETHER in the Google Nest Mini housing. Otherwise, you lose access to the USB port.

So, before connecting the board for the first time, hold down the BOOT switch on it and connect a USB cable to your computer. Use the ESPHome web installer to flash according to the config below.

Double-check the Wi-Fi connection details, API encryption key and device name/friendly name to make sure you use your own.
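
The values to personalize could look roughly like the fragment below. It is only a sketch: the device names are made up, and the secret names (wifi_ssid, wifi_password, api_encryption_key) are assumptions that would have to exist in your own secrets.yaml.

```yaml
substitutions:
  name: "living-room-voice"            # hypothetical device name
  friendly_name: "Living Room Voice"   # hypothetical friendly name

wifi:
  ssid: !secret wifi_ssid              # assumed secret name
  password: !secret wifi_password      # assumed secret name

api:
  encryption:
    key: !secret api_encryption_key    # assumed secret name
```

Keeping these values in secrets.yaml rather than inline makes it easier to share the rest of the config without leaking credentials.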

After the device has been added to ESPHome, if auto discovery is turned on, the device should appear in Home Assistant automatically. Otherwise, check out this guide.


Credits

  • obviously, a huge thanks to Justin Alvey (@justLV) for the excellent Onju Voice project
  • many thanks to Mike Hansen (@synesthesiam) for the relentless work he's put into Year of the Voice at Home Assistant
  • thanks to the ESPHome Discord server members for both creating the most time-saving piece of software ever and for helping out with some kinks in the config, in particular @jesserockz, @ssieb and @Hawwa

Config for the microWakeWord version (BETA)

name: "onju-voice"
friendly_name: "Onju Voice"
wifi_ap_password: ""
name: ${name}
friendly_name: ${friendly_name}
name_add_mac_suffix: false
min_version: 2024.2.0
build_flags: "-DBOARD_HAS_PSRAM"
board_build.arduino.memory_type: qio_opi
- light.turn_on:
id: top_led
effect: slow_pulse
red: 100%
green: 60%
blue: 0%
- wait_until:
- light.turn_on:
id: top_led
effect: pulse
red: 0%
green: 100%
blue: 0%
- wait_until:
- light.turn_on:
id: top_led
effect: none
red: 0%
green: 100%
blue: 0%
- delay: 1s
- script.execute: reset_led
board: esp32-s3-devkitc-1
type: esp-idf
mode: octal
speed: 80MHz
- service: start_va
- voice_assistant.start
- service: stop_va
- voice_assistant.stop
password: "${wifi_ap_password}"
- id: thresh_percent
type: float
initial_value: "0.03"
restore_value: false
- id: touch_calibration_values_left
type: uint32_t[5]
restore_value: false
- id: touch_calibration_values_center
type: uint32_t[5]
restore_value: false
- id: touch_calibration_values_right
type: uint32_t[5]
restore_value: false
- interval: 1s
- script.execute:
id: calibrate_touch
button: 0
- script.execute:
id: calibrate_touch
button: 1
- script.execute:
id: calibrate_touch
button: 2
- i2s_lrclk_pin: GPIO13
i2s_bclk_pin: GPIO18
model: okay_nabu
# model: hey_jarvis
# model: alexa
- voice_assistant.start
- platform: i2s_audio
id: onju_out
dac_type: external
i2s_dout_pin: GPIO12
- platform: i2s_audio
id: onju_microphone
i2s_din_pin: GPIO17
adc_type: external
pdm: false
id: va
microphone: onju_microphone
speaker: onju_out
use_wake_word: false
- light.turn_on:
id: top_led
blue: 100%
red: 100%
green: 100%
brightness: 100%
effect: listening
- light.turn_on:
id: top_led
blue: 100%
red: 0%
green: 20%
brightness: 70%
effect: processing
- light.turn_on:
id: top_led
blue: 0%
red: 20%
green: 100%
effect: speaking
- delay: 500ms
- wait_until:
speaker.is_playing: onju_out
- script.execute: reset_led
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- delay: 200ms
- micro_wake_word.start
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- micro_wake_word.start:
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- voice_assistant.stop:
- micro_wake_word.stop:
- light.turn_on:
id: top_led
blue: 0%
red: 100%
green: 0%
effect: none
- delay: 1s
- script.execute: reset_led
- platform: template
name: "Touch threshold percentage"
id: touch_threshold_percentage
update_interval: never
entity_category: config
initial_value: 1.25
min_value: -1
max_value: 5
step: 0.25
optimistic: true
- lambda: !lambda |-
id(thresh_percent) = 0.01 * x;
setup_mode: false
sleep_duration: 2ms
measurement_duration: 800us
low_voltage_reference: 0.8V
high_voltage_reference: 2.4V
filter_mode: IIR_16
debounce_count: 2
noise_threshold: 0
jitter_step: 0
smooth_mode: IIR_2
denoise_grade: BIT8
denoise_cap_level: L0
- platform: esp32_touch
id: volume_down
pin: GPIO4
threshold: 539000 # 533156-551132
- platform: esp32_touch
id: volume_up
pin: GPIO2
threshold: 580000 # 575735-593064
- platform: esp32_touch
id: action
pin: GPIO3
threshold: 751000 # 745618-767100
- if:
- switch.is_off: use_wake_word
- binary_sensor.is_on: mute_switch
- logger.log:
tag: "action_click"
format: "Voice assistant is running: %s"
args: ['id(va).is_running() ? "yes" : "no"']
- if:
condition: speaker.is_playing
- speaker.stop
- if:
condition: voice_assistant.is_running
- voice_assistant.stop:
- voice_assistant.start:
- logger.log:
tag: "action_click"
format: "Voice assistant was running with wake word detection enabled. Starting continuously"
- if:
condition: speaker.is_playing
- speaker.stop
- voice_assistant.stop
- delay: 1s
- script.execute: reset_led
- script.wait: reset_led
- voice_assistant.start_continuous:
- platform: gpio
id: mute_switch
number: GPIO38
name: Disable wake word
- script.execute: turn_off_wake_word
- script.execute: turn_on_wake_word
- platform: esp32_rmt_led_strip
id: leds
pin: GPIO11
chipset: SK6812
num_leds: 6
rgb_order: grb
rmt_channel: 0
default_transition_length: 0s
gamma_correct: 2.8
- platform: partition
id: left_led
- id: leds
from: 0
to: 0
default_transition_length: 100ms
- platform: partition
id: top_led
- id: leds
from: 1
to: 4
default_transition_length: 100ms
- pulse:
name: pulse
transition_length: 250ms
update_interval: 250ms
- pulse:
name: slow_pulse
transition_length: 1s
update_interval: 2s
- addressable_twinkle:
name: listening_ww
twinkle_probability: 1%
- addressable_twinkle:
name: listening
twinkle_probability: 45%
- addressable_scan:
name: processing
move_interval: 80ms
- addressable_flicker:
name: speaking
intensity: 35%
- platform: partition
id: right_led
- id: leds
from: 5
to: 5
default_transition_length: 100ms
- id: reset_led
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- light.turn_on:
id: top_led
blue: 100%
red: 100%
green: 0%
brightness: 60%
effect: listening_ww
- light.turn_off: top_led
- id: turn_on_wake_word
- if:
- binary_sensor.is_off: mute_switch
- switch.is_on: use_wake_word
- micro_wake_word.start
- if:
- speaker.stop:
- script.execute: reset_led
- logger.log:
tag: "turn_on_wake_word"
format: "Trying to start listening for wake word, but %s"
args: ['id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"']
level: "INFO"
- id: turn_off_wake_word
- micro_wake_word.stop
- script.execute: reset_led
- id: calibrate_touch
button: int
- lambda: |-
    static uint8_t thresh_indices[3] = {0, 0, 0};
    static uint32_t sums[3] = {0, 0, 0};
    static uint8_t qsizes[3] = {0, 0, 0};
    static uint16_t consecutive_anomalies_per_button[3] = {0, 0, 0};

    uint32_t newval;
    uint32_t* calibration_values;
    // Pick the raw reading and the ring buffer of recent samples for this pad.
    switch(button) {
      case 0:
        newval = id(volume_down).get_value();
        calibration_values = id(touch_calibration_values_left);
        break;
      case 1:
        newval = id(action).get_value();
        calibration_values = id(touch_calibration_values_center);
        break;
      case 2:
        newval = id(volume_up).get_value();
        calibration_values = id(touch_calibration_values_right);
        break;
      default:
        ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
        return;
    }

    if(newval == 0) return;

    //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
    //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);

    // With a full buffer, ignore readings that deviate too far from the average
    // (probably a touch) unless 10 of them arrive in a row.
    // Assumed: the counter increment and early return were lost from this listing.
    if(qsizes[button] == 5) {
      float avg = float(sums[button])/float(qsizes[button]);
      if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
        consecutive_anomalies_per_button[button]++;
        //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
        if(consecutive_anomalies_per_button[button] < 10)
          return;
      }
    }

    //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
    consecutive_anomalies_per_button[button] = 0;

    // Replace the oldest sample and keep the running sum in step.
    // Assumed: the qsizes decrement/increment were lost from this listing.
    if(qsizes[button] == 5) {
      //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
      sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
      qsizes[button]--;
    }
    *(calibration_values+thresh_indices[button]) = newval;
    sums[button] += newval;
    qsizes[button]++;
    thresh_indices[button] = (thresh_indices[button] + 1) % 5;

    //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
    uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
    //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);

    // Assumed: the new threshold is applied to the matching touch pad.
    switch(button) {
      case 0:
        id(volume_down).set_threshold(newthresh);
        break;
      case 1:
        id(action).set_threshold(newthresh);
        break;
      case 2:
        id(volume_up).set_threshold(newthresh);
        break;
      default:
        ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
        return;
    }
- platform: template
name: Use Wake Word
id: use_wake_word
optimistic: true
restore_mode: RESTORE_DEFAULT_ON
- script.execute: turn_on_wake_word
- script.execute: turn_off_wake_word
- platform: gpio
id: dac_mute
restore_mode: ALWAYS_OFF
number: GPIO21
inverted: True

Config for the version without microWakeWord

name: "onju-voice"
friendly_name: "Onju Voice"
wifi_ap_password: ""
name: ${name}
friendly_name: ${friendly_name}
name_add_mac_suffix: false
min_version: 2023.11.6
- light.turn_on:
id: top_led
effect: slow_pulse
red: 100%
green: 60%
blue: 0%
- wait_until:
- light.turn_on:
id: top_led
effect: pulse
red: 0%
green: 100%
blue: 0%
- wait_until:
- light.turn_on:
id: top_led
effect: none
red: 0%
green: 100%
blue: 0%
- delay: 1s
- script.execute: reset_led
board: esp32-s3-devkitc-1
type: arduino
- service: start_va
- voice_assistant.start
- service: stop_va
- voice_assistant.stop
password: "${wifi_ap_password}"
- id: thresh_percent
type: float
initial_value: "0.03"
restore_value: false
- id: touch_calibration_values_left
type: uint32_t[5]
restore_value: false
- id: touch_calibration_values_center
type: uint32_t[5]
restore_value: false
- id: touch_calibration_values_right
type: uint32_t[5]
restore_value: false
- interval: 1s
- script.execute:
id: calibrate_touch
button: 0
- script.execute:
id: calibrate_touch
button: 1
- script.execute:
id: calibrate_touch
button: 2
- i2s_lrclk_pin: GPIO13
i2s_bclk_pin: GPIO18
- platform: i2s_audio
name: None
id: onju_out
dac_type: external
i2s_dout_pin: GPIO12
mode: mono
number: GPIO21
inverted: True
# speaker:
# - platform: i2s_audio
# id: onju_out
# dac_type: external
# i2s_dout_pin: GPIO12
# mode: stereo
- platform: i2s_audio
id: onju_microphone
i2s_din_pin: GPIO17
adc_type: external
pdm: false
id: va
microphone: onju_microphone
media_player: onju_out
# speaker: onju_out
use_wake_word: true
- light.turn_on:
id: top_led
blue: 100%
red: 100%
green: 100%
brightness: 100%
effect: listening
- light.turn_on:
id: top_led
blue: 100%
red: 0%
green: 20%
brightness: 70%
effect: processing
- media_player.play_media: !lambda return x;
- light.turn_on:
id: top_led
blue: 0%
red: 20%
green: 100%
effect: speaking
- delay: 100ms
- wait_until:
media_player.is_playing: onju_out
- script.execute: reset_led
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- voice_assistant.start_continuous:
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- voice_assistant.stop:
- light.turn_on:
id: top_led
blue: 0%
red: 100%
green: 0%
effect: none
- delay: 1s
- script.execute: reset_led
- platform: template
name: "Touch threshold percentage"
id: touch_threshold_percentage
update_interval: never
entity_category: config
initial_value: 1.25
min_value: -1
max_value: 5
step: 0.25
optimistic: true
- lambda: !lambda |-
id(thresh_percent) = 0.01 * x;
setup_mode: false
sleep_duration: 2ms
measurement_duration: 800us
low_voltage_reference: 0.8V
high_voltage_reference: 2.4V
filter_mode: IIR_16
debounce_count: 2
noise_threshold: 0
jitter_step: 0
smooth_mode: IIR_2
denoise_grade: BIT8
denoise_cap_level: L0
- platform: esp32_touch
id: volume_down
pin: GPIO4
threshold: 539000 # 533156-551132
- light.turn_on: left_led
- script.execute:
id: set_volume
volume: -0.05
- delay: 0.75s
- while:
binary_sensor.is_on: volume_down
- script.execute:
id: set_volume
volume: -0.05
- delay: 150ms
- light.turn_off: left_led
- platform: esp32_touch
id: volume_up
pin: GPIO2
threshold: 580000 # 575735-593064
- light.turn_on: right_led
- script.execute:
id: set_volume
volume: 0.05
- delay: 0.75s
- while:
binary_sensor.is_on: volume_up
- script.execute:
id: set_volume
volume: 0.05
- delay: 150ms
- light.turn_off: right_led
- platform: esp32_touch
id: action
pin: GPIO3
threshold: 751000 # 745618-767100
- if:
- switch.is_off: use_wake_word
- binary_sensor.is_on: mute_switch
- logger.log:
tag: "action_click"
format: "Voice assistant is running: %s"
args: ['id(va).is_running() ? "yes" : "no"']
- if:
condition: media_player.is_playing
- media_player.stop
- if:
condition: voice_assistant.is_running
- voice_assistant.stop:
- voice_assistant.start:
- logger.log:
tag: "action_click"
format: "Voice assistant was running with wake word detection enabled. Starting continuously"
- if:
condition: media_player.is_playing
- media_player.stop
- voice_assistant.stop
- delay: 1s
- script.execute: reset_led
- script.wait: reset_led
- voice_assistant.start_continuous:
- platform: gpio
id: mute_switch
number: GPIO38
name: Disable wake word
- script.execute: turn_off_wake_word
- script.execute: turn_on_wake_word
- platform: esp32_rmt_led_strip
id: leds
pin: GPIO11
chipset: SK6812
num_leds: 6
rgb_order: grb
rmt_channel: 0
default_transition_length: 0s
gamma_correct: 2.8
- platform: partition
id: left_led
- id: leds
from: 0
to: 0
default_transition_length: 100ms
- platform: partition
id: top_led
- id: leds
from: 1
to: 4
default_transition_length: 100ms
- pulse:
name: pulse
transition_length: 250ms
update_interval: 250ms
- pulse:
name: slow_pulse
transition_length: 1s
update_interval: 2s
- addressable_lambda:
name: show_volume
update_interval: 50ms
lambda: |-
    int int_volume = int(id(onju_out).volume * 100.0f * it.size());
    int full_leds = int_volume / 100;
    int last_brightness = int_volume % 100;
    int i = 0;
    // Fully lit LEDs for the whole part of the volume.
    for(; i < full_leds; i++) {
      it[i] = Color::WHITE;
    }
    // One partially lit LED for the remainder.
    if(i < 4) {
      it[i++] = Color(0,0,0).fade_to_white(last_brightness*256/100);
    }
    // Everything above the current volume stays dark.
    for(; i < it.size(); i++) {
      it[i] = Color::BLACK;
    }
- addressable_twinkle:
name: listening_ww
twinkle_probability: 1%
- addressable_twinkle:
name: listening
twinkle_probability: 45%
- addressable_scan:
name: processing
move_interval: 80ms
- addressable_flicker:
name: speaking
intensity: 35%
- platform: partition
id: right_led
- id: leds
from: 5
to: 5
default_transition_length: 100ms
- id: reset_led
- if:
- switch.is_on: use_wake_word
- binary_sensor.is_off: mute_switch
- light.turn_on:
id: top_led
blue: 100%
red: 100%
green: 0%
brightness: 60%
effect: listening_ww
- light.turn_off: top_led
- id: set_volume
mode: restart
volume: float
- media_player.volume_set:
id: onju_out
volume: !lambda return clamp(id(onju_out).volume+volume, 0.0f, 1.0f);
- light.turn_on:
id: top_led
effect: show_volume
- delay: 1s
- script.execute: reset_led
- id: turn_on_wake_word
- if:
- binary_sensor.is_off: mute_switch
- switch.is_on: use_wake_word
- lambda: id(va).set_use_wake_word(true);
- if:
- media_player.stop:
- if:
- voice_assistant.is_running
- voice_assistant.start_continuous
- script.execute: reset_led
- logger.log:
tag: "turn_on_wake_word"
format: "Trying to start listening for wake word, but %s"
args: ['id(mute_switch).state ? "mute switch is on" : "use wake word toggle is off"']
level: "INFO"
- id: turn_off_wake_word
- voice_assistant.stop
- lambda: id(va).set_use_wake_word(false);
- script.execute: reset_led
- id: calibrate_touch
button: int
- lambda: |-
    static byte thresh_indices[3] = {0, 0, 0};
    static uint32_t sums[3] = {0, 0, 0};
    static byte qsizes[3] = {0, 0, 0};
    static int consecutive_anomalies_per_button[3] = {0, 0, 0};

    uint32_t newval;
    uint32_t* calibration_values;
    // Pick the raw reading and the ring buffer of recent samples for this pad.
    switch(button) {
      case 0:
        newval = id(volume_down).get_value();
        calibration_values = id(touch_calibration_values_left);
        break;
      case 1:
        newval = id(action).get_value();
        calibration_values = id(touch_calibration_values_center);
        break;
      case 2:
        newval = id(volume_up).get_value();
        calibration_values = id(touch_calibration_values_right);
        break;
      default:
        ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
        return;
    }

    if(newval == 0) return;

    //ESP_LOGD("touch_calibration", "[%d] qsize %d, sum %d, thresh_index %d, consecutive_anomalies %d", button, qsizes[button], sums[button], thresh_indices[button], consecutive_anomalies_per_button[button]);
    //ESP_LOGD("touch_calibration", "[%d] New value is %d", button, newval);

    // With a full buffer, ignore readings that deviate too far from the average
    // (probably a touch) unless 10 of them arrive in a row.
    // Assumed: the counter increment and early return were lost from this listing.
    if(qsizes[button] == 5) {
      float avg = float(sums[button])/float(qsizes[button]);
      if((fabs(float(newval)-avg)/avg) > id(thresh_percent)) {
        consecutive_anomalies_per_button[button]++;
        //ESP_LOGD("touch_calibration", "[%d] %d anomalies detected.", button, consecutive_anomalies_per_button[button]);
        if(consecutive_anomalies_per_button[button] < 10)
          return;
      }
    }

    //ESP_LOGD("touch_calibration", "[%d] Resetting consecutive anomalies counter.", button);
    consecutive_anomalies_per_button[button] = 0;

    // Replace the oldest sample and keep the running sum in step.
    // Assumed: the qsizes decrement/increment were lost from this listing.
    if(qsizes[button] == 5) {
      //ESP_LOGD("touch_calibration", "[%d] Queue full, removing %d.", button, id(touch_calibration_values)[thresh_indices[button]]);
      sums[button] -= (uint32_t) *(calibration_values+thresh_indices[button]);// id(touch_calibration_values)[thresh_indices[button]];
      qsizes[button]--;
    }
    *(calibration_values+thresh_indices[button]) = newval;
    sums[button] += newval;
    qsizes[button]++;
    thresh_indices[button] = (thresh_indices[button] + 1) % 5;

    //ESP_LOGD("touch_calibration", "[%d] Average value is %d", button, sums[button]/qsizes[button]);
    uint32_t newthresh = uint32_t((sums[button]/qsizes[button]) * (1.0 + id(thresh_percent)));
    //ESP_LOGD("touch_calibration", "[%d] Setting threshold %d", button, newthresh);

    // Assumed: the new threshold is applied to the matching touch pad.
    switch(button) {
      case 0:
        id(volume_down).set_threshold(newthresh);
        break;
      case 1:
        id(action).set_threshold(newthresh);
        break;
      case 2:
        id(volume_up).set_threshold(newthresh);
        break;
      default:
        ESP_LOGE("touch_calibration", "Invalid button ID (%d)", button);
        return;
    }
- platform: template
name: Use Wake Word
id: use_wake_word
optimistic: true
restore_mode: RESTORE_DEFAULT_ON
- script.execute: turn_on_wake_word
- script.execute: turn_off_wake_word
