NLB + Envoy load-balancing verification

TL;DR

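- By default, Envoy keeps established connections open with no time limit, so once the connection count becomes skewed (for example, after the Envoy Deployment is updated), the skew persists for a long time.
- Periodically closing connections with Envoy's max_connection_duration spreads the connection count across the Envoy Pods.
  - The gRPC client reconnects automatically when its connection is closed (at least in the case of the Go client).
- If you want to avoid closing active connections, idle_timeout can be used instead.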

Building the test environment

The environment is built with scalar-terraform-examples at the following commit.

$ git show -q
commit 4e7aad442b3e7ae8da8984df5567edbc54aeff71 (HEAD -> main, origin/main, origin/HEAD)
Author: tei-k <[email protected]>
Date:   Tue Jun 8 16:44:25 2021 +0900

    Fix to use prefix version for azure k8s module (#28)

Network

$ git diff
diff --git a/aws/network/example.tfvars b/aws/network/example.tfvars
index a5519c2..b17090c 100644
--- a/aws/network/example.tfvars
+++ b/aws/network/example.tfvars
@@ -1,13 +1,12 @@
-region = "ap-northeast-1"
+region = "us-west-1"

 base = "default" # bai, chiku, sho

-name = "example-aws" # maximum of 13 characters
+name = "ksuda-nlb-envoy" # maximum of 13 characters

 locations = [
-  "ap-northeast-1a",
-  "ap-northeast-1c",
-  "ap-northeast-1d",
+  "us-west-1b",
+  "us-west-1c"
 ]

 public_key_path = "./example_key.pub"

Kubernetes cluster

$ git diff example.tfvars
diff --git a/aws/kubernetes/example.tfvars b/aws/kubernetes/example.tfvars
index 4ce93c0..a3b4497 100644
--- a/aws/kubernetes/example.tfvars
+++ b/aws/kubernetes/example.tfvars
@@ -1,7 +1,7 @@
-region = "ap-northeast-1"
+region = "us-west-1"

 kubernetes_cluster = {
-  # name                                 = "scalar-kubernetes"
+  name                                 = "ksuda-nlb-envoy"
   # kubernetes_version                   = "1.19"
   # cluster_enabled_log_types            = ""
   # cluster_log_retention_in_days        = "90"

Installing kube-prometheus

To collect and visualize Envoy metrics, install Prometheus using kube-prometheus.

[centos@bastion-1 ~]$ git clone https://github.com/prometheus-operator/kube-prometheus.git
[centos@bastion-1 ~]$ cd kube-prometheus/
[centos@bastion-1 kube-prometheus]$ kubectl create -f manifests/setup
[centos@bastion-1 kube-prometheus]$ kubectl create -f manifests/

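Deploying Envoy and the gRPC application

Here we deploy a simple gRPC application. It is a plain Hello World application, but the client keeps its connection open and sends requests at a fixed interval. The Envoy Deployment has three Pod replicas.

At this point, Envoy is configured to keep established connections with no time limit.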

[centos@bastion-1 ~]$ git clone -b aws-nlb https://github.com/superbrothers-sandbox/try-envoy-grpc.git && cd try-envoy-grpc
[centos@bastion-1 try-envoy-grpc]$ kubectl apply -k deploy/base
[centos@bastion-1 try-envoy-grpc]$ kubectl get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/envoy-7575d7d9fb-j655c    1/1     Running   0          21s
pod/envoy-7575d7d9fb-rxhvp    1/1     Running   0          21s
pod/envoy-7575d7d9fb-xgvpl    1/1     Running   0          21s
pod/server-7bbdd8b697-6qj49   1/1     Running   0          21s
pod/server-7bbdd8b697-9zxh5   1/1     Running   0          21s
pod/server-7bbdd8b697-bb5mf   1/1     Running   0          21s

NAME                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                                     PORT(S)                         AGE
service/envoy        LoadBalancer   172.20.114.181   a3d9812d7b7eb41afb413248e3d9c40e-6cb71b7d839cb127.elb.us-west-1.amazonaws.com   8080:31401/TCP,9901:31241/TCP   21s
service/kubernetes   ClusterIP      172.20.0.1       <none>                                                                          443/TCP                         38m
service/server       ClusterIP      None             <none>                                                                          8888/TCP                        21s

NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/envoy    3/3     3            3           21s
deployment.apps/server   3/3     3            3           21s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/envoy-7575d7d9fb    3         3         3       21s
replicaset.apps/server-7bbdd8b697   3         3         3       21s

The envoy Service is of type LoadBalancer, and a3d9812d7b7eb41afb413248e3d9c40e-6cb71b7d839cb127.elb.us-west-1.amazonaws.com has been assigned as the load balancer address.

Running the gRPC clients

Install Docker so that the gRPC clients can run as containers.

https://docs.docker.com/engine/install/centos/

[centos@bastion-1 ~]$ sudo docker version
Client: Docker Engine - Community
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        f0df350
 Built:             Wed Jun  2 11:58:10 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:56:35 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Run 20 clients as containers. Each client keeps its connection open and sends a request to the server every 3 seconds, so 20 connections are established between the clients and Envoy.

[centos@bastion-1 ~]$ for i in `seq 20`; do sudo docker run -d ghcr.io/superbrothers-sandbox/try-envoy-grpc/hello -client -addr a3d9812d7b7eb41afb413248e3d9c40e-6cb71b7d839cb127.elb.us-west-1.amazonaws.com:8080 -client.interval 3s; done

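Next, restart the Envoy Pods once. Because this replaces the Envoy Pods and forces the clients to reconnect, the connections should become even more skewed toward particular Pods.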

[centos@bastion-1 ~]$ kubectl rollout restart deploy envoy

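Next, update the Envoy configuration to set max_connection_duration to 1s. With this setting, a connection from a client to Envoy is kept for at most 1 second, and a closed connection is re-established automatically by the client. As a result, the connection count is no longer skewed when averaged over a period of time.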

[centos@bastion-1 try-envoy-grpc]$ diff -u deploy/base/config/envoy.yaml deploy/envoy-max-connection-duration/config/envoy.yaml
--- deploy/base/config/envoy.yaml       2021-06-12 02:49:45.482457956 +0000
+++ deploy/envoy-max-connection-duration/config/envoy.yaml      2021-06-12 02:49:45.483457955 +0000
@@ -11,6 +11,9 @@
           "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
           codec_type: AUTO
           stat_prefix: ingress_http
+          common_http_protocol_options:
+            # https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/core/v3/protocol.proto#envoy-v3-api-msg-config-core-v3-httpprotocoloptions
+            max_connection_duration: 1s
           route_config:
             name: local_route
             virtual_hosts:
[centos@bastion-1 try-envoy-grpc]$ kubectl apply -k deploy/envoy-max-connection-duration/
configmap/envoy-config-c4t76h25d4 created
service/envoy unchanged
service/server unchanged
deployment.apps/envoy configured
deployment.apps/server unchanged
servicemonitor.monitoring.coreos.com/envoy unchanged
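
The effect of this setting can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a model of Envoy itself: one tick stands for one second, all 20 clients start on a single Pod (the skewed state after the restart), and a reconnecting client lands on a Pod chosen uniformly at random, which stands in for the NLB and NodePort path. The function name simulate and its parameters are hypothetical.

```python
import random
from collections import Counter


def simulate(num_clients=20, num_pods=3, steps=300,
             max_connection_duration=None, seed=0):
    """Return the average number of connections per pod over `steps` ticks.

    All clients start on pod 0. If max_connection_duration is None,
    connections are never closed. Otherwise a connection whose age reaches
    the limit is closed and the client reconnects to a random pod.
    """
    rng = random.Random(seed)
    pod_of = [0] * num_clients   # pod index each client is connected to
    age = [0] * num_clients      # age of each connection in ticks
    totals = [0] * num_pods      # sum of per-pod connection counts over time

    for _ in range(steps):
        for c in range(num_clients):
            age[c] += 1
            if (max_connection_duration is not None
                    and age[c] >= max_connection_duration):
                # Connection closed by the proxy; the client reconnects.
                pod_of[c] = rng.randrange(num_pods)
                age[c] = 0
        counts = Counter(pod_of)
        for p in range(num_pods):
            totals[p] += counts[p]

    return [t / steps for t in totals]


if __name__ == "__main__":
    print("no limit:", simulate())
    print("1s limit:", simulate(max_connection_duration=1))
```

Without a limit the initial skew never changes, while with a limit the per-Pod average approaches an even split. At any single instant the counts can still be uneven, which is why the result is described as balanced on average.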

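The graph below visualizes the metrics for the period covering this series of changes. In the first half, all 20 connections are concentrated on a single Pod. After the configuration change, the connections are spread across all three Envoy Pods. Note that connections were also being closed at regular intervals before max_connection_duration was set. These are the Envoy Pod health checks. They should have been disabled to get a clean graph, but I only noticed after cleaning up the environment, so they remain in the data.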

https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_conn_man/stats.html
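
The per-Pod connection counts can also be checked without Prometheus. The following is a minimal sketch, assuming the plain-text "name: value" output of the Envoy admin /stats endpoint (presumably what port 9901 on the envoy Service exposes) and the ingress_http stat prefix from the configuration above. The helper names, Pod names, and sample values are made up for illustration, and downstream_cx_active is used here as one suitable stat, not necessarily the one plotted in the graph.

```python
def parse_envoy_stats(text):
    """Parse Envoy admin /stats text output into a dict of integer stats."""
    stats = {}
    for line in text.splitlines():
        name, sep, value = line.partition(": ")
        if not sep:
            continue  # not a "name: value" line
        try:
            stats[name.strip()] = int(value)
        except ValueError:
            continue  # skip histograms and other non-integer values
    return stats


def active_connections(stats_by_pod, stat_prefix="ingress_http"):
    """Return {pod name: active downstream connections} for each pod."""
    key = f"http.{stat_prefix}.downstream_cx_active"
    return {pod: parse_envoy_stats(text).get(key, 0)
            for pod, text in stats_by_pod.items()}


if __name__ == "__main__":
    # Hypothetical /stats excerpts, one per Envoy pod.
    sample = {
        "envoy-a": "http.ingress_http.downstream_cx_active: 20\n"
                   "http.ingress_http.downstream_cx_total: 20\n",
        "envoy-b": "http.ingress_http.downstream_cx_active: 0\n",
        "envoy-c": "http.ingress_http.downstream_cx_active: 0\n",
    }
    print(active_connections(sample))
```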

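Summary

This verification confirmed that periodically closing connections with Envoy's max_connection_duration balances the connection count when averaged over a period of time. If connections are never closed, they are kept indefinitely, so the connection count can become skewed across Envoy Pods. If closing active connections is a concern, idle_timeout can be used instead.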
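
The difference between the two settings can be sketched as follows. This is a simplified model rather than Envoy's implementation: it assumes idle_timeout measures the time during which a connection has no active requests, and that a client sends one request per interval and is idle in between. The function close_reasons and its parameters are hypothetical.

```python
def close_reasons(request_interval, request_duration,
                  idle_timeout=None, max_connection_duration=None):
    """List which timeout settings would eventually close a connection.

    All arguments are in seconds. The client is assumed to send one request
    every `request_interval`, each taking `request_duration` to complete.
    """
    reasons = []
    idle_gap = max(request_interval - request_duration, 0.0)
    # idle_timeout only fires if the connection stays idle for the whole timeout.
    if idle_timeout is not None and idle_gap >= idle_timeout:
        reasons.append("idle_timeout")
    # max_connection_duration fires once the age limit is hit, active or not.
    if max_connection_duration is not None:
        reasons.append("max_connection_duration")
    return reasons


if __name__ == "__main__":
    # Clients in this verification send a request every 3 seconds.
    print(close_reasons(3.0, 0.1, idle_timeout=1.0))
    print(close_reasons(3.0, 0.1, idle_timeout=60.0))
    print(close_reasons(3.0, 0.1, max_connection_duration=1.0))
```

Under this model, clients that send a request every 3 seconds would not be rebalanced by an idle_timeout longer than the gap between their requests, so idle_timeout mainly helps with connections that are genuinely unused.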

How does Service type LoadBalancer work?

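A Service of type LoadBalancer uses the Service's NodePort as its upstream. Traffic is therefore not distributed directly to Pods: the load balancer's upstreams are node IPs, so the load balancer's own balancing algorithm has no direct bearing on which Pod receives a connection (although some implementations do use Pod IPs as upstreams directly). With AWS NLB, the target is the NodePort. Traffic arriving at the NodePort is then distributed, by the same mechanism as ClusterIP load balancing, across the Pod IPs that match the Service's label selector. The selection is roughly even: kube-proxy picks a backend at random with equal probability in iptables mode and uses round robin by default in IPVS mode.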

KubernetesのDiscovery＆LBリソース（その1） | Think IT（シンクイット）

With AWS ALB, traffic can be forwarded directly to Pod IPs as well as to the NodePort.
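
The NodePort path described in this section can be sketched as a two-stage choice. This is an illustrative model only: it assumes the load balancer picks a node at random per connection and that kube-proxy then picks any matching Pod at random, regardless of which node received the connection. The node and Pod names and the function route_connections are hypothetical.

```python
import random
from collections import Counter


def route_connections(num_connections, nodes, pods, seed=0):
    """Route connections through a node (stage 1) and then a pod (stage 2).

    Returns (connections per node, connections per pod).
    """
    rng = random.Random(seed)
    per_node = Counter()
    per_pod = Counter()
    for _ in range(num_connections):
        # Stage 1: the load balancer forwards to the NodePort on some node.
        node = rng.choice(nodes)
        # Stage 2: kube-proxy on that node picks a backend pod, which may
        # run on a different node.
        pod = rng.choice(pods)
        per_node[node] += 1
        per_pod[pod] += 1
    return per_node, per_pod


if __name__ == "__main__":
    per_node, per_pod = route_connections(
        20, ["node-1", "node-2"], ["envoy-a", "envoy-b", "envoy-c"])
    print("per node:", dict(per_node))
    print("per pod: ", dict(per_pod))
```

Because the second stage ignores the result of the first, an even spread across nodes does not by itself guarantee an even spread across Pods, especially with only 20 long-lived connections.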

superbrothers/gist:5fd0dd75878ea4d9b66a1f0fa28d4655