Date: 2025-12-07
Host: macOS 15.5 (arm64)
Repo: MoonshotAI/kimi-cli (local working copy)
Current HEAD: b2af75a (refactor: enable Pyright strict type checking for kimi_cli/ui/shell (#427))
- Symptom (#414): “LLM provider error: Connection error” even when the network appears fine.
- Root cause on this host: A TUN-based proxy (sb) hijacks IPv6 default routes via utun6 with MTU 9000. IPv6 TLS handshakes black-hole (write succeeds, read stalls), while IPv4 works.
- Evidence: IPv6 HTTPS to multiple sites (Cloudflare, Google, ip.gs) fails at the TLS handshake; routing shows utun6 gateway fdfe:dcba:9866::1, MTU 9000 from the sb config.
- Remediation (network): Reduce the TUN MTU (1280–1460), remove the IPv6 TUN address or exclude ::/0 from auto route, or disable strict_route/auto_route.
- Remediation (app, minimal and elegant): In Kimi CLI (via kosong ChatProvider), detect the first connection error and switch this session to an IPv4-preferred HTTP transport; keep it for the session. No default behavior change, tiny patch, robust in broken-IPv6 environments.
- We synced local main to upstream commit b2af75a exactly.
- Earlier experimental changes (session IPv4 fallback) were stashed locally (not pushed) for reference.
- Issue #414 quick facts:
  - Title: “LLM provider error: Connection error. But my network is working fine.”
  - Observation from reporter: IPv6 broken for a given domain; client tried IPv6 once and aborted without trying IPv4.
Commands and salient outputs captured on host.
$ dig +short A api.moonshot.ai
api.moonshot.ai.cdn.cloudflare.net.
104.18.28.136
104.18.29.136
$ dig +short AAAA api.moonshot.ai
api.moonshot.ai.cdn.cloudflare.net.
2606:4700::6812:1c88
2606:4700::6812:1d88
$ nc -4 -vz -w 4 api.moonshot.ai 443
Connection to api.moonshot.ai port 443 [tcp/https] succeeded!
$ nc -6 -vz -w 4 api.moonshot.ai 443
Connection to api.moonshot.ai port 443 [tcp/https] succeeded!
IPv6 TCP can establish; issue is above TCP (TLS handshake path).
$ curl -4 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' \
https://api.moonshot.ai/v1/models
code=401 ip=104.18.28.136 # expected without auth, proves IPv4 good
$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' \
https://api.moonshot.ai/v1/models
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to api.moonshot.ai:443
code=000 ip=2606:4700::6812:1c88 # TLS handshake fails on IPv6
Cross-site IPv6 failures (not an edge case specific to one host):
$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.cloudflare.com
curl: (35) SSL_ERROR_SYSCALL ... ip=2606:4700::6810:7b60
$ curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.google.com
curl: (35) SSL_ERROR_SYSCALL ... ip=2404:6800:4003:c02::67
$ echo | openssl s_client -6 -connect api.moonshot.ai:443 -servername api.moonshot.ai -ign_eof -tls1_2
CONNECTED(00000005)
---
no peer certificate available
...
SSL handshake has read 0 bytes and written 218 bytes
...
Cipher : 0000
Behavior indicates the client writes ClientHello but reads 0 bytes (black hole / PMTUD / middlebox issue).
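The same staged check (TCP connect first, then TLS handshake) can be scripted. The following is a minimal stdlib-only sketch, not part of Kimi CLI; the function name probe_tls and its return labels are made up for illustration. It separates a failed TCP connect from a handshake that stalls or fails after the connect succeeded.

```python
import socket
import ssl


def probe_tls(host, port=443, family=socket.AF_INET6, timeout=4.0):
    """Return 'no_address', 'tcp_failed', 'tls_failed', or 'ok' for one address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return "no_address"
    sockaddr = infos[0][4]  # first resolved address of the requested family

    sock = socket.socket(family, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to connect and to the handshake reads
    try:
        sock.connect(sockaddr)
    except OSError:
        sock.close()
        return "tcp_failed"

    context = ssl.create_default_context()
    try:
        # The handshake runs here; a black-holed path shows up as a timeout.
        tls = context.wrap_socket(sock, server_hostname=host)
    except OSError:  # covers ssl.SSLError and handshake timeouts
        sock.close()
        return "tls_failed"
    tls.close()
    return "ok"
```

Calling probe_tls("api.moonshot.ai", family=socket.AF_INET6) and then the same with socket.AF_INET gives a per-family verdict; given the curl and openssl results above, the IPv6 call on the affected host would be expected to come back as 'tls_failed'.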
$ route -n get -inet6 2606:4700::6812:1c88
route to: 2606:4700::6812:1c88
... gateway: fdfe:dcba:9866::1
interface: utun6
mtu: 9000
$ ifconfig en0 | egrep 'inet6|inet '
inet6 fe80::...%en0 prefixlen 64 secured scopeid 0xb
inet 192.168.0.2 netmask 0xffffff00 broadcast 192.168.3.255
$ scutil --nwi
IPv6 network interface information
No IPv6 states found
$ ps aux | rg -i 'sing[- ]?box|nehelper|tun'
root 1151 ... sb run -c /tmp/sub_dns.json
root 13094 ... /usr/libexec/nehelper
...
sb configuration excerpt:
File: /tmp/sub_dns.json
"inbounds": [
{
"type": "tun",
"address": [
"172.19.0.1/30",
"fdfe:dcba:9866::1/126" # IPv6 ULA, used as utun gateway
],
"auto_route": true,
"strict_route": true,
"mtu": 9000 # very large, prone to PMTUD issues
},
...
],
"dns": { "strategy": "prefer_ipv4", ... }
- The sb TUN forces IPv6 default routes through utun6 with a huge MTU (9000). On many paths, IPv6 PMTUD/ICMPv6 is filtered or broken, causing the TLS handshake to stall (client writes but never receives ServerHello).
- IPv4 typically “works” because NAT and fragmentation hide the MTU issue differently: IPv4 routers may fragment oversized packets in transit, whereas IPv6 routers never do and rely on an ICMPv6 Packet Too Big message reaching the sender. Hence: IPv4 succeeds, IPv6 fails widely.
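To put numbers on "huge", the arithmetic below shows the largest TCP payload per packet (MSS) that each MTU implies. It is a worked example only, assuming fixed-size headers with no IPv6 extension headers and no IP or TCP options; the helper name tcp_mss is illustrative.

```python
IPV6_HEADER = 40  # fixed IPv6 header, bytes
IPV4_HEADER = 20  # IPv4 header without options, bytes
TCP_HEADER = 20   # TCP header without options, bytes


def tcp_mss(mtu, ipv6=True):
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


for mtu in (9000, 1500, 1280):
    print(f"MTU {mtu}: IPv6 MSS {tcp_mss(mtu)}")
# MTU 9000: IPv6 MSS 8940
# MTU 1500: IPv6 MSS 1440
# MTU 1280: IPv6 MSS 1220
```

Segments sized for a 9000-byte link cannot cross a typical 1500-byte path unless something fragments them or a Packet Too Big signal gets back to the sender, which is why 1280 (the IPv6 minimum MTU) is the conservative choice.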
Using HTTPX to force address family:
# Force IPv6 → fails
AsyncHTTPTransport(local_address='::', retries=0) → ConnectError
# Force IPv4 → succeeds
AsyncHTTPTransport(local_address='0.0.0.0', retries=0) → HTTP 401 (as expected without auth)
Confirms that “forcing IPv4” is sufficient to bypass the broken IPv6 path.
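For reference, a runnable form of that check might look like the sketch below. It assumes httpx is installed; the helper name probe and the LOCAL_ADDRESS table are illustrative, and the status strings are whatever httpx reports rather than captured output.

```python
import socket

# Binding the local side to the wildcard address of one family pins that family.
LOCAL_ADDRESS = {socket.AF_INET: "0.0.0.0", socket.AF_INET6: "::"}


async def probe(url, family, timeout=10.0):
    """Fetch url over a single address family and return a short status string."""
    import httpx  # third-party; imported lazily so the module loads without it

    transport = httpx.AsyncHTTPTransport(
        local_address=LOCAL_ADDRESS[family],
        retries=0,  # no connection retries, so the first failure is visible
    )
    async with httpx.AsyncClient(transport=transport, timeout=timeout) as client:
        try:
            response = await client.get(url)
        except httpx.TransportError as exc:
            # ConnectError, timeouts and similar failures all derive from TransportError.
            return type(exc).__name__
        return f"HTTP {response.status_code}"
```

It can be driven with asyncio.run(probe("https://api.moonshot.ai/v1/models", socket.AF_INET6)) and again with socket.AF_INET to compare the two families.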
- Reduce TUN MTU: set "mtu": 1280 (or 1400/1460) in the sb TUN inbound.
- Remove IPv6 TUN address (disable IPv6 hijack): delete "fdfe:dcba:9866::1/126" from inbounds[0].address.
- Exclude IPv6 default route from auto routing, if supported by your sb version (e.g., use route exclude for ::/0).
- Relax routing: disable strict_route and/or auto_route and configure explicit, minimal routes instead.
- Validate after changes:
  - route -n get -inet6 2606:4700::6812:1c88 should NOT show utun6 default with MTU 9000
  - curl -6 https://api.moonshot.ai/v1/models should return a non-000 code (likely 401)
  - openssl s_client -6 ... should present a peer certificate
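The first two config changes can be applied programmatically. The sketch below assumes the config has already been loaded into a dict (for example with json.load) and has the shape shown in the excerpt above; the function name harden_tun_inbounds is hypothetical, and it does not touch auto_route or strict_route.

```python
import copy
import ipaddress


def harden_tun_inbounds(config, mtu=1280, drop_ipv6=True):
    """Return a copy of config with a smaller TUN MTU and, optionally, no IPv6 TUN address."""
    patched = copy.deepcopy(config)  # leave the caller's dict untouched
    for inbound in patched.get("inbounds", []):
        if inbound.get("type") != "tun":
            continue
        inbound["mtu"] = mtu
        if drop_ipv6:
            # Keep only IPv4 interface addresses such as 172.19.0.1/30.
            inbound["address"] = [
                addr
                for addr in inbound.get("address", [])
                if ipaddress.ip_interface(addr).version == 4
            ]
    return patched


example = {
    "inbounds": [
        {
            "type": "tun",
            "address": ["172.19.0.1/30", "fdfe:dcba:9866::1/126"],
            "auto_route": True,
            "strict_route": True,
            "mtu": 9000,
        }
    ]
}
print(harden_tun_inbounds(example)["inbounds"][0])
```

Returning a patched copy keeps the original available for comparison before anything is written back to disk.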
- Implement a kosong ChatProvider wrapper: on first APIConnectionError/APITimeoutError, swap the provider’s OpenAI HTTP client to an IPv4-preferred HTTPX transport; keep this for the session.
- Characteristics:
  - No default behavior change; only triggers on actual connection failure
  - Session-scoped; no config or env required
  - Tiny patch, easy tests, conforms to Kimi/kosong abstractions
- Wrapper class IPv4FallbackChatProvider(ChatProvider):
  - Delegates to the real Kimi provider for all operations
  - On first connection error in generate(), log info and call _enable_ipv4():
    - Build httpx.AsyncClient(transport=httpx.AsyncHTTPTransport(local_address='0.0.0.0', retries=2, http2=False))
    - Replace provider’s AsyncOpenAI via client.copy(http_client=...)
    - Retry once; if still fails, bubble up error
- Integration: In create_llm() for Kimi provider, wrap the provider instance in IPv4FallbackChatProvider.
- Unit tests:
  - Simulate first-call connection error; verify wrapper applies _enable_ipv4() and subsequent call succeeds or errors differently
  - Assert transport _pool._local_address == '0.0.0.0'
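The control flow of the wrapper can be sketched without any kosong or OpenAI imports. Everything below is illustrative: the class does not subclass the real ChatProvider (its interface is not reproduced here), the constructor arguments enable_ipv4 and connection_errors are assumptions, and FlakyProvider and switch_to_ipv4 are stand-ins for the Kimi provider and the HTTP client swap.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class IPv4FallbackProvider:
    """Delegate generate() to inner; on the first connection error, switch to IPv4 and retry once."""

    def __init__(self, inner, enable_ipv4, connection_errors=(ConnectionError,)):
        self._inner = inner
        self._enable_ipv4 = enable_ipv4  # callable that swaps the inner HTTP client
        self._connection_errors = connection_errors
        self._ipv4_enabled = False

    async def generate(self, *args, **kwargs):
        try:
            return await self._inner.generate(*args, **kwargs)
        except self._connection_errors:
            if self._ipv4_enabled:
                raise  # already on IPv4: nothing left to try
            logger.info("Using IPv4-preferred transport for this session.")
            self._enable_ipv4(self._inner)
            self._ipv4_enabled = True  # sticky for the rest of the session
            return await self._inner.generate(*args, **kwargs)  # single retry


class FlakyProvider:
    """Stand-in provider that fails until its transport is switched to IPv4."""

    def __init__(self):
        self.transport = "dual-stack"

    async def generate(self, prompt):
        if self.transport != "ipv4":
            raise ConnectionError("simulated broken IPv6 path")
        return f"reply to {prompt!r} over {self.transport}"


def switch_to_ipv4(provider):
    provider.transport = "ipv4"


async def demo():
    provider = IPv4FallbackProvider(FlakyProvider(), switch_to_ipv4)
    return await provider.generate("ping")


print(asyncio.run(demo()))  # reply to 'ping' over ipv4
```

In a real integration, connection_errors would presumably be the OpenAI SDK's APIConnectionError and APITimeoutError, and enable_ipv4 would build the httpx client described in the design above and install it via client.copy(http_client=...).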
- IPv6 baseline:
  - curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models → should fail pre-fix
- Kimi CLI with wrapper:
  - First request: expect one connection error, then log: “Using IPv4-preferred transport for this session.”
  - Subsequent: requests should proceed without connection errors
- After network fix (MTU reduced / IPv6 not hijacked):
  - The IPv4 fallback should rarely trigger; IPv6 curl should return codes like 401 on the same endpoint
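The routing part of the post-fix check can be automated by parsing the output of route -n get. The sketch below is an assumption-laden helper, not a tested tool: parse_route and is_suspicious are hypothetical names, the 1500-byte threshold is an arbitrary cutoff, and the parser accepts both "key: value" lines and a column-style metrics table, since the exact layout of the mtu field is abbreviated in the capture above.

```python
def parse_route(output):
    """Collect fields from `route -n get` output into a dict of strings."""
    fields = {}
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if ":" in line:
            # Split at the first colon only; IPv6 addresses contain more colons.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        elif "mtu" in line.split() and index + 1 < len(lines):
            # Column-style table: header names on this line, values on the next.
            fields.update(zip(line.split(), lines[index + 1].split()))
    return fields


def is_suspicious(fields, max_mtu=1500):
    """True when the route leaves via a utun interface with an MTU above max_mtu."""
    mtu = fields.get("mtu", "")
    return fields.get("interface", "").startswith("utun") and mtu.isdigit() and int(mtu) > max_mtu


sample = """\
   route to: 2606:4700::6812:1c88
    gateway: fdfe:dcba:9866::1
  interface: utun6
        mtu: 9000
"""
print(is_suspicious(parse_route(sample)))  # True
```

After a successful network fix, the same check on fresh route output should report False, either because the interface is no longer utun6 or because the MTU has dropped.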
# System
uname -a
sw_vers
# DNS
scutil --dns
dig +short A api.moonshot.ai
dig +short AAAA api.moonshot.ai
# Connectivity
nc -4 -vz -w 4 api.moonshot.ai 443
nc -6 -vz -w 4 api.moonshot.ai 443
# HTTPS
curl -4 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://api.moonshot.ai/v1/models
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.cloudflare.com
curl -6 -sS -o /dev/null -w 'code=%{http_code} ip=%{remote_ip}\n' https://www.google.com
# TLS probe
echo | openssl s_client -6 -connect api.moonshot.ai:443 -servername api.moonshot.ai -ign_eof -tls1_2
# Routing
route -n get -inet6 2606:4700::6812:1c88
netstat -rn -f inet6
ifconfig en0
scutil --nwi
# Processes
ps aux | rg -i 's[- ]?b|nehelper|tun'
# sb config
sed -n '1,200p' /tmp/sub_dns.json
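When collecting this output for an issue report, a small runner can capture everything in one pass. This is a convenience sketch with a hypothetical function name, run_diagnostics; it takes commands as argv lists, so the piped commands above (ps | rg, echo | openssl) would need to be run separately or through a shell.

```python
import shutil
import subprocess


def run_diagnostics(commands, timeout=15):
    """Run each command (an argv list) and return {command line: output or note}."""
    results = {}
    for argv in commands:
        label = " ".join(argv)
        if shutil.which(argv[0]) is None:
            results[label] = "skipped: command not found"
            continue
        try:
            done = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            results[label] = (done.stdout + done.stderr).strip()
        except subprocess.TimeoutExpired:
            results[label] = f"timed out after {timeout}s"
    return results


COMMANDS = [
    ["dig", "+short", "A", "api.moonshot.ai"],
    ["dig", "+short", "AAAA", "api.moonshot.ai"],
    ["nc", "-4", "-vz", "-w", "4", "api.moonshot.ai", "443"],
    ["nc", "-6", "-vz", "-w", "4", "api.moonshot.ai", "443"],
    ["route", "-n", "get", "-inet6", "2606:4700::6812:1c88"],
]
```

Calling run_diagnostics(COMMANDS) returns a dict that can be printed or dumped to a file; missing tools are reported rather than raising.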
- RFC 8305 (Happy Eyeballs) background
- httpx/httpcore address family control via local_address
- Kosong ChatProvider API and Kimi provider composition