@weirded
Created April 24, 2026 02:13
ESPHome Fleet — install the 1.6.2 dev build on Home Assistant OS

Install the ESPHome Fleet 1.6.2 dev build on Home Assistant OS

The fix for the 100% CPU hang you hit is on the develop branch of the ESPHome Fleet repo. 1.6.2 has not been released as stable yet, but every commit on develop publishes a signed Docker image to GitHub Container Registry (GHCR) that Home Assistant Supervisor can install directly. These steps switch your add-on over to that dev channel.

What's changing

  • You'll add a second add-on repository to Home Assistant, pointing at the develop branch. The add-on you have now came from the same GitHub repo's main branch.
  • Supervisor will pull a fresh prebuilt image from GHCR (no local build). The install is a download, not a compile — on a decent connection it takes under a minute.
  • Your existing add-on data — job queue, settings, device cache, git-versioned config history, scheduled jobs — is keyed by the add-on's slug, which is identical between the stable and dev channels. It persists across the swap.
  • When 1.6.2 ships stable you can switch back, or stay on dev permanently if you like the faster release cadence. Your choice.

Before you start

Take a Home Assistant backup. Settings → System → Backups → Create Backup → Partial. Check Home Assistant Core and ESPHome Fleet. This gives you a one-click rollback.

Confirm the current version. Open ESPHome Fleet from the Home Assistant sidebar. The version number shows in the footer. It should read 1.6.1.

Steps

1. Add the dev repository

  1. Settings → Add-ons → Add-on Store.

  2. Top-right three-dot menu → Repositories.

  3. Paste this URL into the box:

    https://github.com/weirded/distributed-esphome#develop
    

    The #develop on the end is important — without it Supervisor will read the stable branch.

  4. Click Add, then Close.

2. Refresh the store

Back in the Add-on Store, three-dot menu → Check for updates. Give it 10–30 seconds. You'll see two "ESPHome Fleet" cards now — one from the stable repo you already had, one from the new dev repo you just added.

3. Switch to the dev version

  1. Click the stable ESPHome Fleet card (the one you were using) → Uninstall. Confirm. Supervisor keeps your add-on data — only the container image is removed.
  2. Click the dev ESPHome Fleet card (from #develop) → Install. The image pulls from ghcr.io/weirded/... — expect 30–60 s on a reasonable connection.
  3. Start the add-on.
  4. Open the Web UI from the sidebar.

4. Verify

The footer should show 1.6.2-dev.30 (or a higher dev.N if more commits landed between now and when you install). Your existing targets, job history, pinned versions, schedules, and settings should all be intact.

Open the Queue tab, kick off a compile for one of your devices, and watch the add-on's CPU usage on Home Assistant's Settings → System → Hardware page (or with docker stats if you have host shell access). The 100% CPU spike you were seeing during every worker poll should be gone.

If something goes wrong

Two ways back:

  • Restore from the backup you took under "Before you start". Settings → System → Backups → your backup → Restore → ESPHome Fleet.
  • Manual roll-back. Uninstall the dev version. If you removed the stable repository URL at any point, re-add https://github.com/weirded/distributed-esphome (no #develop suffix). Install the stable card. Your data is still there.

What this actually fixed (so you know what to expect)

In 1.6.1, every time a build worker polled the add-on for a new job (every 5 seconds by default), the add-on tarred and gzipped the entire /config/esphome/ directory tree synchronously on its main thread: the whole common/ package tree, everything under components/, and anything else in that directory, including files the target didn't reference. On a config the size of yours (49 top-level YAMLs plus the common/ package tree), on a typical HA host, each bundle takes several seconds of 100% CPU. While that's running, the add-on's UI freezes, device-status polls stall, and the worker's next 5-second poll queues up behind it and can trigger another bundle as soon as the first finishes. That's the permanent 100% CPU loop you were seeing.

1.6.2 does two things:

  1. Bundle creation runs off the main thread, so the UI and status polls stay responsive.
  2. The bundle now ships only the files the specific target actually references. Much less data to compress, and — as a side benefit — remote workers no longer receive unrelated devices' YAMLs or secrets, or your repo's .git/ directory.

One other thing worth fixing on your side

While reading your logs I noticed a pattern that'll bite you in subtle ways once you're on 1.6.2.

Several of your common/*.yaml package files (wifi.yaml, adafruit_soil_sensor.yaml, boolean_or.yaml, soil_dry_indicator.yaml) declare fallback substitution values under a top-level defaults: key, like:

# common/wifi.yaml
substitutions:
  dns_domain: sf.aberrant.org

defaults:
  fqdn: ${hostname}.${dns_domain}

wifi:
  use_address: ${fqdn}
  ...

ESPHome's defaults: mechanism only activates when the caller uses !include with a vars: block. Most of your top-level YAMLs use bare !include common/X.yaml (no vars:), so those defaults: blocks are silently ignored, and ${fqdn}, ${temperature_*}, ${moisture_*}, ${update_interval} etc. never get substituted. Your log is full of warning lines like:

WARNING esphome.components.substitutions: The string '${fqdn}' looks like an expression,
  but could not resolve all the variables: 'fqdn' is undefined

~700 of them per full validation. In 1.6.1 that was mostly cosmetic. In 1.6.2 the fleet server now runs the same validation path ESPHome's compiler uses, so wifi.use_address = ${fqdn} on your devices is genuinely not resolving to a hostname — which matters for OTA.

The simplest fix in each common/*.yaml: move the contents of defaults: into a regular substitutions: block. A normal substitutions: block already acts as a fallback — any vars: the caller passes overrides; anything the caller doesn't pass takes the file-local default. Same semantics as what you were going for, but it actually runs.
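As a sketch of that change applied to the common/wifi.yaml excerpt shown earlier: the fallback moves under substitutions: and the top-level defaults: key goes away. This assumes ${hostname} is supplied by the including device YAML, which the excerpt does not show, and it only covers the keys visible in the excerpt.

```yaml
# common/wifi.yaml (sketch of the fix; only the keys from the excerpt are shown)
substitutions:
  dns_domain: sf.aberrant.org
  # Moved here from the old top-level defaults: block, so it applies
  # even when the caller uses a bare !include without vars:
  # Assumption: ${hostname} is defined by the device YAML that includes this file.
  fqdn: ${hostname}.${dns_domain}

wifi:
  use_address: ${fqdn}
  # ... remaining wifi settings unchanged
```

A device that needs a different address could still pass fqdn through vars:, relying on the override behaviour described above. The packages: parent key and the address below are hypothetical, not taken from your config:

```yaml
# Hypothetical device YAML fragment
packages:
  wifi: !include
    file: common/wifi.yaml
    vars:
      fqdn: greenhouse-sensor.example.org   # hypothetical override value
```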

And one small typo to fix in living-room-se-soil-sensor.yaml:

  dry: !include
    file: common/soil_dry_indicator.yaml
    var:     # ← should be "vars:"
      sensor_id: soil_dry
      led_id: neopixel_led

var: (singular) is not a recognized key, so none of those values reach the included file — that's where the ${sensor_id} / ${led_id} warnings come from.
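With the key corrected, the include from the excerpt above would read as follows. This is a sketch: the parent key that encloses dry: is not shown in the excerpt, so it is omitted here too.

```yaml
  dry: !include
    file: common/soil_dry_indicator.yaml
    vars:     # plural, so the values below are passed to the included file
      sensor_id: soil_dry
      led_id: neopixel_led
```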

If the hang comes back on 1.6.2

It shouldn't, but if it does, please capture a fresh thread dump the same way you did the first one:

curl -sSL https://raw.githubusercontent.com/weirded/distributed-esphome/develop/scripts/threaddump-addon.sh | bash

…while the add-on is pinned at 100% CPU, and paste the output into a new gist. Your original dump pointed straight at the problem — much appreciated.
