@n-WN
Last active December 7, 2025 18:05
Kimi CLI Connectivity Investigation: IPv6 Blackhole And Session IPv4 Fallback Strategy


Date: 2025-12-07

Host: macOS 15.5 (arm64)

Repo: MoonshotAI/kimi-cli (local working copy)

Current HEAD: b2af75a (refactor: enable Pyright strict type checking for kimi_cli/ui/shell (#427))

1. Executive Summary

  • Symptom (#414): “LLM provider error: Connection error” even when network appears fine.
  • Root cause on this host: A TUN-based proxy (sb) hijacks IPv6 default routes via utun6 with MTU 9000. IPv6 TLS handshakes black-hole (write succeeds, read stalls), while IPv4 works.
  • Evidence: IPv6 HTTPS to multiple sites (Cloudflare, Google, ip.gs) fails at TLS handshake; routing shows utun6 gateway fdfe:dcba:9866::1, MTU 9000 from sb config.
  • Remediation (network): Reduce TUN MTU (1280–1460), remove IPv6 TUN address or exclude ::/0 from auto route, or disable strict_route/auto_route.
  • Remediation (app, minimal and elegant): In Kimi CLI (via the kosong ChatProvider), detect the first connection error and switch the session to an IPv4-preferred HTTP transport, keeping it for the rest of the session. Default behavior is unchanged, the patch is tiny, and the CLI stays usable in broken-IPv6 environments.

2. Repository Context & Local Changes

  • We synced local main to upstream commit b2af75a exactly.
  • Earlier experimental changes (session IPv4 fallback) were stashed locally (not pushed) for reference.
  • Issue #414 quick facts:
    • Title: “LLM provider error: Connection error. But my network is working fine.”
    • Observation from reporter: IPv6 broken for a given domain; client tried IPv6 once and aborted without trying IPv4.

3. End-to-End Network Analysis

Commands and salient outputs captured on host.

3.1 DNS Resolution

$ dig +short A api.moonshot.ai
api.moonshot.ai.cdn.cloudflare.net.
104.18.28.136
104.18.29.136

$ dig +short AAAA api.moonshot.ai
api.moonshot.ai.cdn.cloudflare.net.
2606:4700::6812:1c88
2606:4700::6812:1d88

3.2 TCP Connectivity

$ nc -4 -vz -w 4 api.moonshot.ai 443
Connection to api.moonshot.ai port 443 [tcp/https] succeeded!

$ nc -6 -vz -w 4 api.moonshot.ai 443
Connection to api.moonshot.ai port 443 [tcp/https] succeeded!

The IPv6 TCP connection establishes, so the failure lies above TCP connection setup (in the TLS handshake path).

3.3 HTTPS Reachability (Curl)

$ curl -4 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' \
    https://api.moonshot.ai/v1/models
code=401 ip=104.18.28.136   # expected without auth, proves IPv4 good

$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' \
    https://api.moonshot.ai/v1/models
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.moonshot.ai:443
code=000 ip=2606:4700::6812:1c88             # TLS handshake fails on IPv6

Cross-site IPv6 failures (not an edge case specific to one host):

$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.cloudflare.com
curl: (35) SSL_ERROR_SYSCALL ... ip=2606:4700::6810:7b60

$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.google.com
curl: (35) SSL_ERROR_SYSCALL ... ip=2404:6800:4003:c02::67

3.4 OpenSSL TLS Handshake Probe (IPv6)

$ echo | openssl s_client -6 -connect api.moonshot.ai:443 -servername api.moonshot.ai -ign_eof -tls1_2
CONNECTED(00000005)
---
no peer certificate available
...
SSL handshake has read 0 bytes and written 218 bytes
...
Cipher    : 0000

This indicates that the client writes its ClientHello (218 bytes) but reads 0 bytes back, which is consistent with a black hole, a PMTUD failure, or a middlebox dropping traffic.
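The same check can be scripted so that the failing stage (DNS, TCP, or TLS) is reported per address family. The following is a minimal sketch using only the Python standard library; the function name `probe_tls` and its return shape are illustrative choices, not part of Kimi CLI or kosong.

```python
import socket
import ssl


def probe_tls(host, port=443, family=socket.AF_INET6, timeout=4.0):
    """Return ("ok", tls_version) or (stage, error_text).

    stage is "dns", "tcp" or "tls", naming the first step that failed.
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return ("dns", str(exc))

    sockaddr = infos[0][4]
    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        try:
            sock.connect(sockaddr)
        except OSError as exc:
            return ("tcp", str(exc))

        ctx = ssl.create_default_context()
        try:
            # The handshake runs inside wrap_socket(); a black hole shows up
            # here as a timeout after the ClientHello has been written.
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return ("ok", tls.version())
        except OSError as exc:  # covers ssl.SSLError and timeouts
            return ("tls", str(exc))
    finally:
        sock.close()


# Example (requires network access):
#   probe_tls("api.moonshot.ai", family=socket.AF_INET)
#   probe_tls("api.moonshot.ai", family=socket.AF_INET6)
```

On a host with the symptom described here, the IPv6 call would be expected to report the "tls" stage while the IPv4 call reports "ok".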

3.5 Routing & Interfaces

$ route -n get -inet6 2606:4700::6812:1c88
route to: 2606:4700::6812:1c88
... gateway: fdfe:dcba:9866::1
interface: utun6
mtu: 9000

$ ifconfig en0 | egrep 'inet6|inet '
inet6 fe80::...%en0 prefixlen 64 secured scopeid 0xb
inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.3.255

$ scutil --nwi
IPv6 network interface information
  No IPv6 states found

3.6 TUN/VPN/Proxy Processes & Config

$ ps aux | rg -i 'sing[- ]?box|nehelper|tun'
root 1151 ... sb run -c /tmp/sub_dns.json
root 13094 ... /usr/libexec/nehelper
...

sb configuration excerpt:

File: /tmp/sub_dns.json

"inbounds": [
  {
    "type": "tun",
    "address": [
      "172.19.0.1/30",
      "fdfe:dcba:9866::1/126"   # IPv6 ULA, used as utun gateway
    ],
    "auto_route": true,
    "strict_route": true,
    "mtu": 9000                  # very large, prone to PMTUD issues
  },
  ...
],
"dns": { "strategy": "prefer_ipv4", ... }

4. Root Cause Hypothesis

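  • The sb TUN forces the IPv6 default route through utun6 with a very large MTU (9000). IPv6 routers never fragment packets in transit, so delivering oversized packets depends on Path MTU Discovery (ICMPv6 Packet Too Big). On many paths those ICMPv6 messages are filtered or PMTUD is otherwise broken, so large packets are silently dropped and the TLS handshake stalls: the client writes its ClientHello but never receives a ServerHello.
  • IPv4 is less exposed to the same problem: routers can fragment IPv4 packets in transit when the DF bit is clear, and NAT gateways often clamp the TCP MSS. This would explain why IPv4 succeeds while IPv6 fails across many sites.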
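The arithmetic behind the MTU concern can be made concrete. A TCP stack typically derives the MSS it advertises from the outgoing interface MTU minus the fixed IP and TCP headers. The sketch below assumes headers without options or extension headers (40 bytes for IPv6, 20 for IPv4, 20 for TCP); it illustrates the hypothesis and does not prove it.

```python
IPV4_HEADER = 20  # bytes, no options
IPV6_HEADER = 40  # bytes, no extension headers
TCP_HEADER = 20   # bytes, no options


def tcp_mss(mtu: int, ip_version: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ip_version == 6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


# MTU values discussed in this report: the TUN default, plain Ethernet,
# and the IPv6 minimum link MTU.
for mtu in (9000, 1500, 1280):
    print(f"mtu={mtu} ipv6_mss={tcp_mss(mtu, 6)} ipv4_mss={tcp_mss(mtu, 4)}")
```

With an MTU of 9000, the IPv6 MSS comes out at 8940 bytes, far above the 1440 bytes that fit a 1500-byte Ethernet path, so any segment sized to the larger value depends on PMTUD working end to end.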

5. Controlled Application-Layer Test

Using HTTPX to force address family:

# Force IPv6 → fails
AsyncHTTPTransport(local_address='::', retries=0) → ConnectError

# Force IPv4 → succeeds
AsyncHTTPTransport(local_address='0.0.0.0', retries=0) → HTTP 401 (as expected without auth)

This confirms that forcing IPv4 is sufficient to bypass the broken IPv6 path.

6. Remediation Options

6.1 Network-Level (Preferred Long-Term)

  • Reduce TUN MTU: set "mtu": 1280 (or 1400/1460) in the sb TUN inbound.
  • Remove IPv6 TUN address (disable IPv6 hijack): delete "fdfe:dcba:9866::1/126" from inbounds[0].address.
  • Exclude IPv6 default route from auto routing, if supported by your sb version (e.g., use route exclude for ::/0).
  • Relax routing: disable strict_route and/or auto_route and configure explicit, minimal routes instead.
  • Validate after changes:
    • route -n get -inet6 2606:4700::6812:1c88 should NOT show utun6 default with MTU 9000
    • curl -6 https://api.moonshot.ai/v1/models should return a non-000 code (likely 401)
    • openssl s_client -6 ... should present a peer certificate
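Before editing the configuration by hand, the risky settings listed above can be flagged automatically. The sketch below checks only the keys visible in the excerpt from section 3.6 ("type", "address", "auto_route", "strict_route", "mtu"). The function name `audit_tun_inbounds` and the 1460 threshold (the upper end of the range suggested above) are assumptions made for illustration.

```python
import ipaddress


def audit_tun_inbounds(config: dict, max_mtu: int = 1460) -> list[str]:
    """Return human-readable findings for TUN inbounds in a parsed config."""
    findings = []
    for index, inbound in enumerate(config.get("inbounds", [])):
        if inbound.get("type") != "tun":
            continue
        mtu = inbound.get("mtu")
        if isinstance(mtu, int) and mtu > max_mtu:
            findings.append(f"inbounds[{index}]: mtu {mtu} exceeds {max_mtu}")
        has_ipv6 = any(
            ipaddress.ip_interface(address).version == 6
            for address in inbound.get("address", [])
        )
        if has_ipv6 and inbound.get("auto_route"):
            findings.append(f"inbounds[{index}]: IPv6 address with auto_route enabled")
        if inbound.get("strict_route"):
            findings.append(f"inbounds[{index}]: strict_route enabled")
    return findings


# Values copied from the configuration excerpt in section 3.6.
sample = {
    "inbounds": [
        {
            "type": "tun",
            "address": ["172.19.0.1/30", "fdfe:dcba:9866::1/126"],
            "auto_route": True,
            "strict_route": True,
            "mtu": 9000,
        }
    ]
}

for finding in audit_tun_inbounds(sample):
    print(finding)
```

For a real file, the dictionary would come from `json.load()`. The inline `#` annotations in the excerpt above are not valid JSON, so they would need to be absent from the actual file.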

6.2 Application-Level (Minimal, Elegant, Session-Scoped)

  • Implement a kosong ChatProvider wrapper: on the first APIConnectionError/APITimeoutError, swap the provider’s OpenAI HTTP client to an IPv4-preferred HTTPX transport and keep it for the rest of the session.
  • Characteristics:
    • No default behavior change; only triggers on actual connection failure
    • Session-scoped; no config or env required
    • Tiny patch, easy tests, conforms to Kimi/kosong abstractions

7. Proposed Implementation In Kimi CLI (Kosong-Friendly)

  • Wrapper class IPv4FallbackChatProvider(ChatProvider):
    • Delegates to the real Kimi provider for all operations
    • On first connection error in generate(), log info and call _enable_ipv4():
      • Build httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(local_address='0.0.0.0', retries=2, http2=False))
      • Replace provider’s AsyncOpenAI via client.copy(http_client=...)
      • Retry once; if the retry also fails, propagate the error
  • Integration: In create_llm() for Kimi provider, wrap the provider instance in IPv4FallbackChatProvider.
  • Unit tests:
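    • Simulate a connection error on the first call; verify that the wrapper calls _enable_ipv4() and that the retried call either succeeds or fails with a different error
    • Assert transport _pool._local_address == '0.0.0.0' (these are private httpx/httpcore attributes, so this assertion may break across library versions)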
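The control flow of the wrapper can be sketched independently of kosong and the OpenAI client. In the sketch below, `SessionIPv4Fallback`, `FakeProvider`, and the `enable_ipv4` callback are hypothetical stand-ins; the real ChatProvider interface is not reproduced here. In the actual patch, the callback would build the IPv4-bound httpx.AsyncClient and swap it in via client.copy(http_client=...), and the caught exceptions would be APIConnectionError/APITimeoutError.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class SessionIPv4Fallback:
    """Hypothetical stand-in for IPv4FallbackChatProvider (not the kosong API)."""

    def __init__(self, inner, enable_ipv4,
                 connection_errors=(ConnectionError, TimeoutError)):
        self._inner = inner
        self._enable_ipv4 = enable_ipv4        # callback that swaps the transport
        self._connection_errors = connection_errors
        self._switched = False                 # sticky for the whole session

    async def generate(self, *args, **kwargs):
        try:
            return await self._inner.generate(*args, **kwargs)
        except self._connection_errors:
            if self._switched:
                raise                          # already on IPv4: nothing left to try
            logger.info("Using IPv4-preferred transport for this session.")
            self._enable_ipv4(self._inner)
            self._switched = True
            # Retry exactly once; a second failure propagates to the caller.
            return await self._inner.generate(*args, **kwargs)


class FakeProvider:
    """Test double: fails until the (simulated) IPv4 transport is enabled."""

    def __init__(self):
        self.ipv4 = False
        self.calls = 0

    async def generate(self, prompt):
        self.calls += 1
        if not self.ipv4:
            raise ConnectionError("simulated IPv6 black hole")
        return f"ok:{prompt}"


def enable_fake_ipv4(provider):
    provider.ipv4 = True


if __name__ == "__main__":
    wrapped = SessionIPv4Fallback(FakeProvider(), enable_fake_ipv4)
    print(asyncio.run(wrapped.generate("hello")))
```

Injecting the transport swap as a callback keeps the retry logic testable without network access or third-party clients, which matches the unit-test approach outlined above.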

8. Manual Validation Checklist

  • IPv6 baseline:
    • curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models → should fail pre-fix
  • Kimi CLI with wrapper:
    • First request: expect one connection error, then log: “Using IPv4-preferred transport for this session.”
    • Subsequent: requests should proceed without connection errors
  • After network fix (MTU reduced / IPv6 not hijacked):
    • The IPv4 fallback should rarely trigger; IPv6 curl should return codes like 401 on the same endpoint

9. Appendix: Command Transcript (Abbreviated)

# System
uname -a
sw_vers

# DNS
scutil --dns

dig +short A api.moonshot.ai

dig +short AAAA api.moonshot.ai

# Connectivity
nc -4 -vz -w 4 api.moonshot.ai 443
nc -6 -vz -w 4 api.moonshot.ai 443

# HTTPS
curl -4 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.cloudflare.com
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.google.com

# TLS probe
echo | openssl s_client -6 -connect api.moonshot.ai:443 -servername api.moonshot.ai -ign_eof -tls1_2

# Routing
route -n get -inet6 2606:4700::6812:1c88
netstat -rn -f inet6
ifconfig en0
scutil --nwi

# Processes
ps aux | rg -i 'sing[- ]?box|nehelper|tun'

# sb config
sed -n '1,200p' /tmp/sub_dns.json

10. References

  • RFC 8305 (Happy Eyeballs) background
  • httpx/httpcore address family control via local_address
  • Kosong ChatProvider API and Kimi provider composition