@bojand
Last active July 15, 2024 02:51
gRPC and Load Balancing

Just documenting docs, articles, and discussion related to gRPC and load balancing.

https://github.com/grpc/grpc/blob/master/doc/load-balancing.md

Seems gRPC prefers thin client-side load balancing, where a client gets a list of backend servers and a load balancing policy from a "load balancer" and then performs client-side load balancing based on that information. However, this model could also be useful for traditional load balancing approaches in cloud deployments.
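To make the client-side model concrete, here is a minimal sketch of a per-call round-robin picker. It assumes the backend address list has already been obtained (from DNS or a lookaside load balancer); the RoundRobinPicker class and the example addresses are hypothetical and not part of any gRPC library.

```python
import itertools
from typing import Iterable, List


class RoundRobinPicker:
    """Hypothetical client-side picker: choose a backend for every call.

    The address list stands in for what a resolver or lookaside load
    balancer would hand to the client; the policy is plain round robin.
    """

    def __init__(self, addresses: Iterable[str]) -> None:
        self.update(addresses)

    def update(self, addresses: Iterable[str]) -> None:
        # Called whenever a new server list arrives; restarts the rotation.
        self._addresses: List[str] = list(addresses)
        if not self._addresses:
            raise ValueError("no backends available")
        self._cycle = itertools.cycle(self._addresses)

    def pick(self) -> str:
        # Each RPC gets its own pick, so load spreads per call, not per connection.
        return next(self._cycle)


picker = RoundRobinPicker(["10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"])
print([picker.pick() for _ in range(4)])
# ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051', '10.0.0.1:50051']
```

In grpc-go, the built-in counterpart is selecting the round_robin policy through service config, e.g. passing {"loadBalancingConfig": [{"round_robin": {}}]} to grpc.WithDefaultServiceConfig together with a dns:/// target so the resolver returns every address.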

https://groups.google.com/forum/#!topic/grpc-io/8s7UHY_Q1po

gRPC "works" in AWS. That is, you can run gRPC services on EC2 nodes and have them connect to other nodes, and everything is fine. If you are using AWS for easy access to hardware, then all is fine. What doesn't work is ELB (aka CLB) and ALB. Neither of these supports HTTP/2 (h2c) in the way gRPC needs. ELBs work in TCP mode, but you give up useful health checking and the join-shortest-queue behaviour that makes normal HTTP-mode ELBs good. It also means your cluster may end up poorly balanced, since only individual client connections are balanced rather than individual requests to the backend. If a single client is generating a lot of requests, they will all go to the same backend rather than being balanced across your available instances. This also means that ECS doesn't really work properly, since it only supports ELB and ALB load balancers. If your requirements are not too demanding, TCP-mode ELBs do work, and you can definitely ship stuff that way. It's just not ideal, and it has some fairly major problems as your request rates and general system complexity increase.
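A small simulation illustrates the connection-versus-request problem described above. It assumes three hypothetical backends and plain round robin in both modes; the only difference is whether the backend is chosen once per long-lived connection or once per request.

```python
import itertools
from collections import Counter
from typing import Dict, List

BACKENDS = ["backend-a", "backend-b", "backend-c"]  # hypothetical names


def per_connection(requests_per_client: List[int]) -> Dict[str, int]:
    """TCP-mode (L4) balancing: a backend is chosen once per connection,
    so every request on that long-lived HTTP/2 connection goes to it."""
    counts: Counter = Counter()
    rotation = itertools.cycle(BACKENDS)
    for n in requests_per_client:
        backend = next(rotation)  # picked at connect time only
        counts[backend] += n
    return dict(counts)


def per_request(requests_per_client: List[int]) -> Dict[str, int]:
    """HTTP/2-aware (L7) balancing: every request is routed independently."""
    counts: Counter = Counter()
    rotation = itertools.cycle(BACKENDS)
    for n in requests_per_client:
        for _ in range(n):
            counts[next(rotation)] += 1
    return dict(counts)


# One chatty client and two quiet ones, each holding a single connection.
load = [900, 50, 50]
print(per_connection(load))  # {'backend-a': 900, 'backend-b': 50, 'backend-c': 50}
print(per_request(load))     # {'backend-a': 334, 'backend-b': 333, 'backend-c': 333}
```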

I use gRPC on AWS and it works great. However, I don't believe ALBs support HTTP/2 trailers, which gRPC relies on, so that won't work. Something may have changed since I last looked, but don't count on an HTTP/2 ALB working. I believe it's HTTP/2 to the load balancer's clients but HTTP/1.1 to your backend servers.
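Why trailers matter: gRPC sends each message with a 5-byte prefix (a 1-byte compressed flag and a 4-byte big-endian length) and reports the call's outcome as grpc-status in HTTP/2 trailers after the last message. The sketch below shows that framing; the function names are hypothetical, and real clients do this inside the gRPC library.

```python
import struct
from typing import Dict, List, Tuple

HEADER = struct.Struct(">BI")  # 1-byte compressed flag + 4-byte big-endian length


def encode_message(payload: bytes, compressed: bool = False) -> bytes:
    """Frame one serialized message the way gRPC carries it in HTTP/2 DATA frames."""
    return HEADER.pack(1 if compressed else 0, len(payload)) + payload


def decode_messages(body: bytes) -> List[Tuple[bool, bytes]]:
    """Split a response body back into (compressed, payload) messages."""
    messages, offset = [], 0
    while offset < len(body):
        if len(body) - offset < HEADER.size:
            raise ValueError("truncated message header")
        flag, length = HEADER.unpack_from(body, offset)
        offset += HEADER.size
        payload = body[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated message payload")
        messages.append((flag == 1, payload))
        offset += length
    return messages


def call_status(trailers: Dict[str, str]) -> int:
    """The RPC outcome travels in trailers after the body, not in the body itself."""
    if "grpc-status" not in trailers:
        # Roughly what a client is left with when a hop drops HTTP/2 trailers.
        raise RuntimeError("response ended without a grpc-status trailer")
    return int(trailers["grpc-status"])


body = encode_message(b"hello")
print(body)                               # b'\x00\x00\x00\x00\x05hello'
print(decode_messages(body))              # [(False, b'hello')]
print(call_status({"grpc-status": "0"}))  # 0
```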

Alternatively, use ELB in TCP (Layer 4) mode and put your own HTTP/2-compliant proxy (Envoy, nghttpx, Linkerd, Traefik, ...) behind it. I know Lyft does this in production with Envoy.
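Behind a TCP load balancer, the HTTP/2 proxy terminates each client connection and picks a backend per request, which restores the join-shortest-queue style behaviour mentioned earlier. The sketch below is a hypothetical least-outstanding-requests picker of that kind; it is not the actual algorithm of Envoy, nghttpx, Linkerd, or Traefik.

```python
import random
from typing import Iterable, Optional


class LeastRequestPicker:
    """Join-shortest-queue: route each new request to the backend with the
    fewest in-flight requests, breaking ties at random."""

    def __init__(self, backends: Iterable[str], rng: Optional[random.Random] = None) -> None:
        self.in_flight = {b: 0 for b in backends}
        if not self.in_flight:
            raise ValueError("no backends")
        self._rng = rng or random.Random()

    def start(self) -> str:
        # Call when a request is dispatched; returns the chosen backend.
        fewest = min(self.in_flight.values())
        candidates = [b for b, n in self.in_flight.items() if n == fewest]
        backend = self._rng.choice(candidates)
        self.in_flight[backend] += 1
        return backend

    def finish(self, backend: str) -> None:
        # Call when the response (including trailers) has completed.
        self.in_flight[backend] -= 1


picker = LeastRequestPicker(["backend-a", "backend-b", "backend-c"])
busy = [picker.start() for _ in range(3)]  # one request lands on each backend
picker.finish("backend-b")                 # backend-b frees up first
print(picker.start())                      # backend-b
```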

https://forums.aws.amazon.com/thread.jspa?messageID=749377

We're trying to get the Application Load Balancer cooperating with some ECS-hosted gRPC services. So far it's failing; poking at the server a bit, it looks like requests are coming from the load balancer as HTTP/1.1, while the gRPC server is expecting HTTP/2. The info on the load balancer suggests it supports HTTP/2, but does that only apply to the client side?

Hi. Yes, the requests are sent from the load balancer to the targets as HTTP/1.1. For more information, see http://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-listeners.html#listener-configuration.
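One way to confirm this symptom is to look at the first bytes a plaintext gRPC server receives. A prior-knowledge HTTP/2 (h2c) client opens with the fixed 24-byte connection preface from the HTTP/2 specification, whereas an HTTP/1.1 hop sends an ordinary request line. The helper below is a hypothetical diagnostic sketch, not part of any gRPC server.

```python
# Client connection preface defined by the HTTP/2 spec (RFC 9113, originally RFC 7540).
H2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"


def sniff_protocol(first_bytes: bytes) -> str:
    """Classify the opening bytes of a plaintext connection (hypothetical helper).

    Assumes at least the first line has been read; a real server would
    buffer until it has enough bytes to decide.
    """
    if first_bytes.startswith(H2_PREFACE):
        return "h2c"  # what a gRPC client (or HTTP/2-capable proxy) sends
    request_line = first_bytes.split(b"\r\n", 1)[0]
    if request_line.endswith((b" HTTP/1.1", b" HTTP/1.0")):
        return "http/1"  # e.g. a load balancer forwarding as HTTP/1.1
    return "unknown"


print(len(H2_PREFACE))                                                    # 24
print(sniff_protocol(H2_PREFACE))                                         # h2c
print(sniff_protocol(b"POST /helloworld.Greeter/SayHello HTTP/1.1\r\n"))  # http/1
```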

https://groups.google.com/forum/#!topic/grpc-io/rgJ7QyecPoY

We sort of have this situation, since we use Google App Engine, and its load balancer and URLFetch service only support HTTP/1.1. We used the pRPC implementation described here, which maps simple unary gRPC requests onto an HTTP/1.1 protocol: http://nodir.io/post/138899670556/prpc. We used the Go implementation from the Chrome tools repository and wrote our own client and server, which were relatively simple but absolutely do not support all of gRPC's features. The "better" approach might be to look at the grpc-web work, and possibly just run the grpcwebproxy. See: https://github.com/improbable-eng/grpc-web. I think that will also have the problem that if your clients aren't Go or JavaScript, you will need to implement the protocol yourself.
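grpc-web can run over HTTP/1.1 because it moves the trailers into the response body: after the data messages comes one more length-prefixed frame whose flag byte has the most significant bit (0x80) set, carrying the trailers as HTTP/1-style header lines. A minimal sketch of encoding and parsing such a body follows; the function names are hypothetical, and a real deployment would rely on a grpc-web client library and a proxy such as grpcwebproxy.

```python
import struct
from typing import Dict, Iterator, Tuple, Union

HEADER = struct.Struct(">BI")  # flag byte + 4-byte big-endian length
TRAILER_FLAG = 0x80            # MSB set marks the trailers frame in grpc-web


def encode_frame(payload: bytes, trailers: bool = False) -> bytes:
    return HEADER.pack(TRAILER_FLAG if trailers else 0, len(payload)) + payload


def encode_trailers(fields: Dict[str, str]) -> bytes:
    # Trailers become an HTTP/1-style header block inside the body.
    block = "".join(f"{k.lower()}: {v}\r\n" for k, v in fields.items())
    return encode_frame(block.encode("ascii"), trailers=True)


def split_frames(body: bytes) -> Iterator[Tuple[str, Union[bytes, Dict[str, str]]]]:
    """Yield ("data", payload) for messages and ("trailers", dict) for the trailer frame."""
    offset = 0
    while offset < len(body):
        flag, length = HEADER.unpack_from(body, offset)
        payload = body[offset + HEADER.size: offset + HEADER.size + length]
        if len(payload) != length:
            raise ValueError("truncated frame")
        offset += HEADER.size + length
        if flag & TRAILER_FLAG:
            fields = {}
            for line in payload.decode("ascii").split("\r\n"):
                if line:
                    key, _, value = line.partition(":")
                    fields[key.strip()] = value.strip()
            yield "trailers", fields
        else:
            yield "data", payload


body = encode_frame(b"hello") + encode_trailers({"grpc-status": "0", "grpc-message": ""})
print(list(split_frames(body)))
# [('data', b'hello'), ('trailers', {'grpc-status': '0', 'grpc-message': ''})]
```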

We normally recommend using a proxy that supports HTTP/2 to the backend, like nghttpx or Envoy (and Envoy-based tools such as Istio). If that's not possible, then the solutions tend to involve something that looks like grpc-web. If the proxy you are already using supports HTTP/1.1 trailers, it should be possible to use nghttpx to up-convert back to HTTP/2, but I've not tried that out.

Microservices at Lyst

HTTP load-balancing on gRPC services

Using Envoy to Load Balance gRPC Traffic

nginx now supports gRPC

gRPC Load Balancing with Nginx

DNS Load Balancing in GRPC

gRPC Load Balancing using Kubernetes and Linkerd

Tyk.io supports gRPC

HAProxy now supports gRPC

gRPC + AWS: Some gotchas

On gRPC Load Balancing

gRPC Load Balancing on Kubernetes

gRPC Load Balancing inside Kubernetes

How To Create Load Balancer For GRPC On AWS

Learnings from gRPC on AWS

New – Application Load Balancer Support for End-to-End HTTP/2 and gRPC

Demo for enabling gRPC workloads with end to end HTTP/2 support

Why load balancing gRPC is tricky? - A blog post providing an overview of gRPC load balancing options.

gRPC Client-Side Load Balancing in Go

Load Balancing gRPC services

gRPC load balancing with grpc-go

@SHUFIL
SHUFIL commented Aug 3, 2021

gRPC works with the AWS Network Load Balancer, but when we use SSL (an ACM certificate) on the NLB, it still doesn't work. Does anyone know why?

@comerford

I'll add that I successfully deployed a non-secure gRPC API as an EKS service (TLS is actually not recommended for production in this case) behind an NLB that handles TLS termination. I did so using just the following annotations:

service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "http"
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-ssl-negotiation-policy: "ELBSecurityPolicy-FS-1-2-Res-2020-10"
service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "grpcapi,httpapi"
service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:eu-west-1:1231212341234:certificate/your-arn"
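These annotations sit on a Kubernetes Service of type LoadBalancer. The sketch below builds such a manifest as JSON (which kubectl apply -f accepts) and checks that each ssl-ports entry matches a Service port name or number, assuming the annotation takes names, numbers, or "*". The service name, selector, and port numbers are hypothetical placeholders, and the check is just one way to catch typos, not something the AWS integration provides.

```python
import json
from typing import List

PREFIX = "service.beta.kubernetes.io/"

# Hypothetical Service: name, selector and port numbers are placeholders;
# the annotation values are the ones from the comment above.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "name": "grpc-api",
        "annotations": {
            PREFIX + "aws-load-balancer-backend-protocol": "http",
            PREFIX + "aws-load-balancer-type": "nlb",
            PREFIX + "aws-load-balancer-ssl-negotiation-policy": "ELBSecurityPolicy-FS-1-2-Res-2020-10",
            PREFIX + "aws-load-balancer-ssl-ports": "grpcapi,httpapi",
            PREFIX + "aws-load-balancer-ssl-cert": "arn:aws:acm:eu-west-1:1231212341234:certificate/your-arn",
        },
    },
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "grpc-api"},
        "ports": [
            {"name": "grpcapi", "port": 443, "targetPort": 50051, "protocol": "TCP"},
            {"name": "httpapi", "port": 8443, "targetPort": 8080, "protocol": "TCP"},
        ],
    },
}


def unmatched_ssl_ports(svc: dict) -> List[str]:
    """Return ssl-ports entries that match no Service port name or number."""
    value = svc["metadata"]["annotations"].get(PREFIX + "aws-load-balancer-ssl-ports", "")
    if value.strip() == "*":
        return []  # "*" is assumed to mean every port
    known = set()
    for port in svc["spec"]["ports"]:
        known.add(str(port["port"]))
        if "name" in port:
            known.add(port["name"])
    entries = [e.strip() for e in value.split(",") if e.strip()]
    return [e for e in entries if e not in known]


print(unmatched_ssl_ports(service))          # []
manifest = json.dumps(service, indent=2)     # e.g. pipe to: kubectl apply -f -
```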

Another fun note: I was configuring it with Terraform, and the switch to NLB was instantaneous from Terraform's perspective, but the LB takes a while to actually provision and become available. As a result, anything referencing the LB (like a CNAME pointing at the LB address) did not pick up the change immediately. I had to re-run Terraform once the LB was up for it to realise the target LB had changed (and update the Route53 CNAME).
