@davidar
Last active April 11, 2026 00:54

Docker Healthchecks Can't Detect Host-Side Network Failures

If you're using Docker healthchecks with an autoheal sidecar (like willfarrell/autoheal) to auto-restart unhealthy containers, there's a class of failures they'll never catch.

The Setup

A common pattern for self-healing containers:

services:
  myapp:
    image: myapp:latest
    ports:
      - "127.0.0.1:3001:3000"
    healthcheck:
      test: ["CMD", "wget", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3

  autoheal:
    image: willfarrell/autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
      - AUTOHEAL_INTERVAL=60
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

This works great for application-level hangs: if the app stops responding to HTTP requests, the healthcheck fails, Docker marks the container unhealthy, and autoheal restarts it.

The Blind Spot

Docker healthchecks run inside the container. The wget request to localhost:3000/health happens within the container's network namespace, where localhost always resolves to the container itself. The probe never traverses the host's bridge, iptables rules, or port mappings.

This means the healthcheck cannot detect:

  • Host-side routing failures (e.g. Tailscale exit node hijacking Docker bridge traffic)
  • Port mapping issues (Docker's iptables rules getting flushed or corrupted)
  • Bridge network failures (the docker0 bridge going down)
  • Firewall changes blocking host-to-container traffic

In all of these cases:

  • The container is running
  • The app inside is healthy
  • The healthcheck passes
  • But nobody can reach the service from outside the container
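A quick way to see this divergence from the host is to run the same probe from both sides (a sketch — the container name myapp and host port 3001 are taken from the compose file above):

```shell
#!/bin/bash
# Probe the health endpoint from inside and outside the container.
# When the blind spot is in effect, the first probe succeeds and the
# second fails.

# Inside the container's network namespace (what the Docker healthcheck sees):
docker exec myapp wget -q --spider http://localhost:3000/health \
  && echo "inside: ok" || echo "inside: FAIL"

# From the host, through the published port (the path real traffic takes):
curl -sf --max-time 5 http://127.0.0.1:3001/health > /dev/null \
  && echo "host:   ok" || echo "host:   FAIL"
```

"inside: ok" together with "host: FAIL" is the signature of a host-side network failure.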

Real-World Example

On a server running Tailscale with a Mullvad exit node, Tailscale periodically refreshes its policy routing (ip rule) configuration. When this happens, traffic to Docker bridge networks (172.16.0.0/12) gets routed through the exit node instead of staying local. The container keeps passing healthchecks while being completely unreachable from the host.

The fix for that specific issue is ip rule bypasses, but the general lesson applies.
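For reference, that bypass looks something like this (a sketch, not the exact rule from the incident — the priority value is an assumption; it only needs to sort before Tailscale's own policy rules so bridge traffic keeps using the main routing table):

```shell
# Route Docker bridge subnets via the main table before Tailscale's
# policy rules can divert them to the exit node. Priority 500 is an
# assumption; Tailscale installs its rules at later (higher) priorities.
ip rule add to 172.16.0.0/12 priority 500 table main
```

Note that rules added this way don't survive a reboot, so they belong in whatever persistent network configuration the host uses.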

Options for External Health Monitoring

If you need to detect these failures, the healthcheck needs to happen from outside the container:

1. Healthcheck from the host (cron/timer)

#!/bin/bash
# /usr/local/bin/check-services.sh
if ! curl -sf --max-time 5 http://127.0.0.1:3001/health > /dev/null; then
    docker restart myapp
    logger "Restarted myapp - health endpoint unreachable from host"
fi
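To run this on a schedule, a crontab entry is enough (the one-minute interval is an assumption — pick whatever detection latency you can live with):

```shell
# crontab -e, as root so that "docker restart" works:
* * * * * /usr/local/bin/check-services.sh
```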

2. Healthcheck through your reverse proxy / tunnel

Test the full path that users take. If you're using Cloudflare Tunnel:

curl -sf --max-time 10 https://myapp.example.com/health

3. External uptime monitoring

Uptime Kuma (self-hosted) and UptimeRobot probe your public endpoints from outside your infrastructure. Healthchecks.io works the other way around: it alerts you when an expected periodic ping from your own scripts stops arriving.
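With Healthchecks.io, the host-side check script can double as the heartbeat — ping the check URL only when the probe succeeds, and a missed ping becomes your alert. A sketch (the UUID is a placeholder; hc-ping.com is Healthchecks.io's ping endpoint):

```shell
#!/bin/bash
# Ping Healthchecks.io only when the service is reachable from the host.
# A host-side network failure stops the pings, which triggers the alert.
if curl -sf --max-time 5 http://127.0.0.1:3001/health > /dev/null; then
    curl -fsS -m 10 --retry 3 https://hc-ping.com/your-check-uuid > /dev/null
fi
```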

Which Approach to Use

Use both in-container healthchecks and external monitoring:

  • In-container healthcheck + autoheal: Catches application hangs, OOM kills, deadlocks. Fast recovery (a couple of minutes with the intervals above).
  • External monitoring: Catches network/routing/infrastructure failures. Alerts you to problems the container can't see.

They cover complementary failure modes. The in-container check is not redundant — it catches the most common failure (app hang) faster than external polling would.
