Given I've configured Envoy with LDS serving a TCP proxy listener on some port, When the LDS is updated to remove that listener, Then I expect all subsequent TCP connections to that port to be refused.
Write a bootstrap.yaml like:
---
admin:
  access_log_path: /tmp/admin_access.log
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 9901
node:
  id: some-envoy-node
  cluster: some-envoy-cluster
dynamic_resources:
  lds_config:
    path: /cfg/lds-current.yaml
static_resources:
  clusters:
  - name: example_cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: 93.184.216.3  # IP address of example.com
        port_value: 80
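Note: the IP of example.com may differ in your environment; if the curl check below fails, re-resolve it and substitute the result into the cluster above:

dig +short example.com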
Write an lds-current.yaml file like:
version_info: "0"
resources:
- "@type": type.googleapis.com/envoy.api.v2.Listener
  name: listener_0
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 8080
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: ingress_tcp
        cluster: example_cluster
Launch Envoy (I'm using v1.6.0):
envoy -c /cfg/bootstrap.yaml --v2-config-only --drain-time-s 30
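If you're running this in a container (the /cfg paths suggest one), the invocation might look something like the following; the image tag and volume mount are assumptions on my part, not part of the original setup:

# Assumes the configs are in the current directory and the official image tag exists.
docker run -v "$PWD":/cfg -p 8080:8080 -p 9901:9901 envoyproxy/envoy:v1.6.0 \
  envoy -c /cfg/bootstrap.yaml --v2-config-only --drain-time-s 30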
Confirm that the TCP proxy is working:
curl -v -H 'Host: example.com' 127.0.0.1:8080
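You can also confirm the listener is registered via the admin interface (the exact output format of this endpoint varies by version):

curl 127.0.0.1:9901/listeners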
Next, update the LDS to return an empty set of listeners. This is a two-step process. First, write an empty LDS response file, lds-empty.yaml:
version_info: "1"
resources: []
Second, move that file on top of the file being watched (a move rather than an in-place edit matters here: Envoy's filesystem-based config subscription watches for file moves, since moves are atomic):
mv lds-empty.yaml lds-current.yaml
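To confirm Envoy accepted the update, the listener manager stats should show a successful LDS update (I'm citing the stat prefix from memory, so treat it as approximate):

curl -s 127.0.0.1:9901/stats | grep listener_manager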
In the Envoy stdout logs you'll see a line like:
source/server/lds_api.cc:68] lds: remove listener 'listener_0'
Attempt to connect to the port where the listener used to be:
curl -v -H 'Host: example.com' 127.0.0.1:8080
I would like to see all new TCP connections refused immediately, as if the listener had never been added in the first place. Existing TCP connections should continue to be serviced.
Instead, the port is still open even after the LDS update occurs:
lsof -i
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
envoy 1 root 9u IPv4 30166 0t0 TCP *:9901 (LISTEN)
envoy 1 root 22u IPv4 30171 0t0 TCP *:8080 (LISTEN)
Clients can connect to the port, but the TCP proxying appears to hang (I can't tell where):
curl -H 'Host: example.com' -v 127.0.0.1:8080
* Rebuilt URL to: 127.0.0.1:8080/
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 8080 (#0)
> GET / HTTP/1.1
> Host: example.com
> User-Agent: curl/7.47.0
> Accept: */*
>
^C
This state persists until the --drain-time-s period has elapsed (30 seconds in this example). At that point the port is finally closed, and you see:
curl 127.0.0.1:8080
curl: (7) Failed to connect to 127.0.0.1 port 8080: Connection refused
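For what it's worth, here's a rough way to measure that window (just a sketch I'm adding for illustration, not part of the original repro):

# Poll once a second until connections are actually refused, then report
# how long the socket stayed open after the LDS removal.
start=$(date +%s)
while true; do
  curl -s --max-time 1 -o /dev/null 127.0.0.1:8080
  [ $? -eq 7 ] && break  # curl exit code 7 = connection refused
  sleep 1
done
echo "port started refusing connections after $(( $(date +%s) - start ))s"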
In Cloud Foundry, we have the following setup currently:
Router ====TLS====> Envoy ----TCP----> App
(shared ingress)
Each application instance has a sidecar Envoy which terminates TLS connections from the shared ingress router. Applications may not speak HTTP, so we use basic TCP connectivity checks from the shared Router to the Envoy in order to infer application health and determine if a client connection should be load-balanced to that Envoy. When the upstream Envoy accepts the TCP connection, the Router considers that upstream healthy. When the upstream refuses the TCP connection, the Router considers that upstream unhealthy.
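To illustrate, that check is roughly equivalent to the following (a sketch, not the actual Router code; the address and port are placeholders):

# A successful TCP connect marks the upstream Envoy healthy;
# a refused connection marks it unhealthy.
if nc -z -w 1 127.0.0.1 8080; then
  echo "upstream healthy"
else
  echo "upstream unhealthy"
fi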
During a graceful shutdown, the scheduler ought to be able to drain the Envoy before terminating the application, meaning the Envoy would service any in-flight TCP connections without accepting new ones.
Rather than updating LDS, we would also be OK with using the admin endpoint /healthcheck/fail instead. But we would expect the same behavior: refuse new TCP connections to the listener during the draining period, while continuing to service existing connections (for some duration).
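For reference, that admin call would be something like:

curl -X POST 127.0.0.1:9901/healthcheck/fail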