Research Notes

Connection Refused Error

When deploying microservices in Kubernetes, one might encounter this error:

upstream connect error or disconnect/reset before headers. reset reason: connection failure, 
transport failure reason: delayed connect error: 111

Error 111 is ECONNREFUSED, the "connection refused" error. In a service mesh environment, this error reveals a lot about how the networking actually works.


Network Stack

Kubernetes Network Model

Kubernetes implements a flat network where every pod can reach every other pod directly.

graph TB
    subgraph "Application Layer"
        App["Application calls:<br/>http://products-api:9090"]
    end
    
    subgraph "DNS Layer"
        DNS["CoreDNS resolves:<br/>products-api to 10.96.1.5"]
    end
    
    subgraph "Service Layer"
        Service["Service (ClusterIP)<br/>10.96.1.5:9090"]
    end
    
    subgraph "Endpoint Layer"
        Endpoints["Endpoints:<br/>10.244.1.5:9090<br/>10.244.1.6:9090"]
    end
    
    subgraph "Pod Layer"
        Pod1["Pod 1<br/>10.244.1.5"]
        Pod2["Pod 2<br/>10.244.1.6"]
    end
    
    App --> DNS
    DNS --> Service
    Service --> Endpoints
    Endpoints --> Pod1
    Endpoints --> Pod2

When your application calls http://products-api:9090:

The name products-api gets resolved by CoreDNS to the Service's ClusterIP address. In Kubernetes, service names follow this pattern: <service-name>.<namespace>.svc.cluster.local. Within the same namespace, you can use just the service name.

The ClusterIP is a virtual IP that doesn't actually exist on any network interface. Instead, kube-proxy (or iptables rules, or IPVS) intercepts packets destined for this IP and rewrites them to point to actual pod IPs.

The Service maintains a list of endpoints - the IP:port combinations of healthy pods matching its selector. Traffic is distributed across these endpoints.

This abstraction is where connection errors can hide. A "connection refused" might mean the pod doesn't exist, the port is wrong, the application isn't listening, or the service mesh is misconfigured.
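
To make the resolution step concrete, here is a small hedged sketch (hypothetical; it assumes it runs inside a pod in the default namespace, so the short name and the FQDN resolve to the same ClusterIP):

package main

import (
	"fmt"
	"net"
)

func main() {
	// CoreDNS search domains expand the short name; both forms should
	// return the Service's ClusterIP (e.g. 10.96.1.5), not a pod IP.
	for _, host := range []string{"products-api", "products-api.default.svc.cluster.local"} {
		ips, err := net.LookupIP(host)
		if err != nil {
			fmt.Printf("%s: lookup failed: %v\n", host, err)
			continue
		}
		fmt.Printf("%s -> %v\n", host, ips)
	}
}

If the lookup succeeds but the connection still fails, the problem is below DNS: the Service, its endpoints, or the mesh.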

Notes on Port Binding and Interfaces

graph LR
    subgraph "Pod Network Namespace"
        subgraph "Interfaces"
            Lo["Loopback<br/>127.0.0.1<br/>localhost"]
            Eth0["eth0<br/>10.244.1.5<br/>Pod IP"]
        end
        
        subgraph "Application"
            Bind["App binds to:<br/>0.0.0.0:9090"]
        end
    end
    
    External["External Traffic<br/>(from other pods)"] --> Eth0
    Eth0 --> Bind
    Lo --> Bind

When an application binds to 0.0.0.0:9090, it's saying "accept connections on port 9090 on any interface."

This means connections can come in through:

  • The pod's IP address (eth0 interface)
  • Localhost (127.0.0.1)
  • Any other interface in the pod's network namespace

If the application binds to 127.0.0.1:9090 instead, it only accepts connections from WITHIN the same pod. This is a mistake that causes "connection refused" errors - the application is running and listening, but NOT on an interface that external callers can reach.

In the HashiCups products-api, the configuration specifies:

{
  "bind_address": ":9090"
}

The leading colon (:9090) is Go's shorthand for 0.0.0.0:9090, meaning "listen on all interfaces on port 9090." This is correct for a service that needs to accept connections from other pods.
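
A minimal Go sketch of the difference (illustrative only, not the HashiCups source; port 9091 is used for the second listener just to avoid a clash):

package main

import (
	"fmt"
	"net"
)

func main() {
	// ":9090" == "0.0.0.0:9090": reachable on the pod IP (eth0) and on
	// loopback, so other pods can connect.
	all, err := net.Listen("tcp", ":9090")
	if err != nil {
		panic(err)
	}
	defer all.Close()

	// Loopback only: the process is up and listening, but callers outside
	// the pod get "connection refused".
	local, err := net.Listen("tcp", "127.0.0.1:9091")
	if err != nil {
		panic(err)
	}
	defer local.Close()

	fmt.Println("listening on", all.Addr(), "and", local.Addr())
}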


Service Mesh

Notes

Without a service mesh, communication between services in Kubernetes is plain HTTP over the network. This is troublesome because:

  1. Traffic between pods is unencrypted. Anyone with access to the network can intercept and read it. There's no strong identity - you can't be certain which service is really calling you.

  2. You can see that your service is getting requests, but you don't know where they're coming from, how long the network transit took, or whether they're part of a larger transaction.

  3. Each service must implement its own retry logic, circuit breakers, and timeout handling. This leads to inconsistent behavior and duplicated code.

  4. If you want to control which services can talk to which, you're stuck with IP-based network policies that break when pods move or scale.

A service mesh addresses all of these by injecting a proxy sidecar into each pod. This proxy handles all network traffic in and out of the application.

Sidecar Pattern

graph TB
    subgraph "Without Service Mesh"
        App1["Application"] -->|"Direct network call"| App2["Application"]
    end
    
    subgraph "With Service Mesh"
        subgraph "Pod A"
            AppA["Application"]
            EnvoyA["Envoy Proxy"]
        end
        
        subgraph "Pod B"
            AppB["Application"]
            EnvoyB["Envoy Proxy"]
        end
        
        AppA -->|"Plain HTTP"| EnvoyA
        EnvoyA -->|"Encrypted mTLS"| EnvoyB
        EnvoyB -->|"Plain HTTP"| AppB
    end

The sidecar proxy (Envoy in the Consul case) is a separate container that runs alongside an application container in the same pod. Because containers in a pod share a network namespace, the proxy can intercept all network traffic without the application knowing.

This interception contributes to the "connection refused" errors in a service mesh. The application thinks it's connecting directly to another service, but it's actually connecting to the LOCAL Envoy proxy, which then connects to the remote service's proxy.


Consul Connect and Transparent Proxy

Notes

Consul Connect's transparent proxy mode is what makes the service mesh "transparent" to applications. The code calls http://products-api:9090 using a Kubernetes service name, and the service mesh handles everything else.

sequenceDiagram
    participant App as Application Process
    participant Kernel as Linux Kernel
    participant IPTables as iptables Rules
    participant Envoy as Envoy Sidecar
    participant Remote as Remote Service
    
    Note over App,Remote: Pod Startup
    App->>App: consul-connect-inject-init runs
    App->>IPTables: Install REDIRECT rules
    Note over IPTables: All outbound TCP → port 15001
    
    Note over App,Remote: Runtime - Making a Request
    App->>Kernel: connect() to products-api:9090
    Kernel->>IPTables: Check OUTPUT chain
    IPTables->>IPTables: Match: REDIRECT to 15001
    IPTables->>Envoy: Connection redirected
    Note over Envoy: Envoy listening on 127.0.0.1:15001
    
    Envoy->>Envoy: Parse destination: products-api:9090
    Envoy->>Envoy: Query Consul for products-api
    Envoy->>Remote: Establish mTLS connection
    Note over Envoy,Remote: Certificate exchange, encryption
    
    Remote->>Envoy: Response (encrypted)
    Envoy->>App: Response (decrypted)

The process involves two phases:

Initialization (Run Once at Pod Startup):

When a pod starts with Consul Connect enabled, an init container called consul-connect-inject-init runs before the application starts. The init container has elevated privileges (NET_ADMIN capability) to modify the pod's network configuration.

It installs iptables rules that look something like this:

iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 15001

Translation into English: "Any outbound TCP connection should be redirected to port 15001." Port 15001 is where the Envoy sidecar listens for outbound connections.

At Runtime (Every Request):

When application code executes http.Get("http://products-api:9090/health"):

  1. The Go HTTP client resolves products-api via DNS, getting back the Service's ClusterIP
  2. The client opens a TCP socket to that IP on port 9090
  3. The Linux kernel's network stack processes this connection
  4. Before the packet leaves the pod, iptables intercepts it
  5. The destination is rewritten: instead of going to products-api's IP, it goes to 127.0.0.1:15001
  6. The Envoy sidecar receives the connection on port 15001
  7. Envoy extracts the original destination (products-api:9090) from the socket metadata
  8. Envoy queries Consul to find healthy instances of products-api
  9. Envoy establishes an mTLS connection to the target service's Envoy sidecar
  10. The request is forwarded, encrypted, through the mesh

This is why it's called "transparent": the application code is unaware any of this is happening.
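
As a hedged illustration of what errno 111 looks like from the caller's side (not the HashiCups code), a Go client can match ECONNREFUSED explicitly when such a request fails:

package main

import (
	"errors"
	"fmt"
	"net/http"
	"syscall"
)

func main() {
	resp, err := http.Get("http://products-api:9090/health")
	if err != nil {
		// The *url.Error wraps a *net.OpError wrapping the syscall error,
		// so errors.Is can match errno 111 directly.
		if errors.Is(err, syscall.ECONNREFUSED) {
			fmt.Println("connection refused: nothing is listening where the connection landed",
				"(often the local sidecar port or the destination pod)")
			return
		}
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}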

Transparent Proxy --> "Connection Refused"

Why one might see error 111 (connection refused) in a service mesh:

Envoy isn't running

If the Envoy sidecar container failed to start, iptables rules are still in place redirecting traffic to port 15001, but nothing is listening there. You get connection refused.

Check with the following (fill in the placeholders):

kubectl get pod public-api-xxx -o jsonpath='{.spec.containers[*].name}'
# Should show: public-api, consul-connect-envoy-sidecar

Envoy is running but can't reach Consul

Envoy needs to connect to Consul to get service catalog information. If it can't reach Consul, it doesn't know where to route traffic.

Check with:

kubectl logs public-api-xxx -c consul-connect-envoy-sidecar
# Look for connection errors to Consul

Service intention blocks the connection

The connection reaches the destination Envoy, but a service intention denies it. The destination Envoy returns connection refused.

Check with:

kubectl get serviceintentions
# Verify that source to destination is allowed

Destination service doesn't exist in Consul catalog

Envoy receives the redirected connection but can't find the destination service in Consul's service catalog. This happens if the destination pod didn't register properly.

Check with:

# View Consul catalog
kubectl port-forward service/consul-ui 8500:80
# Open http://localhost:8500 and check Services

HashiCorp Example - Notes

Components

graph TB
    subgraph "HashiCups Services"
        N["Nginx<br/>Port 80<br/>Edge Proxy"]
        F["Frontend<br/>Port 3000<br/>Next.js UI"]
        P["Public API<br/>Port 8080<br/>Gateway"]
        PR["Products API<br/>Port 9090<br/>Product Service"]
        PA["Payments<br/>Port 1800→8080<br/>Payment Service"]
        DB["Postgres<br/>Port 5432<br/>Database"]
    end
    
    N --> F
    N --> P
    F --> P
    P --> PR
    P --> PA
    PR --> DB
    

Notice that Payments has a port mapping: the Service listens on port 1800, but the container listens on port 8080. This indirection is common in Kubernetes, and can cause connection errors if misconfigured.

Request - User Views Products

What happens when a user visits the HashiCups homepage and sees a list of products?

sequenceDiagram
    participant Browser
    participant Nginx
    participant NE as Nginx Envoy
    participant FE as Frontend Envoy
    participant Frontend
    participant PE as PublicAPI Envoy
    participant PublicAPI
    participant PRE as ProductsAPI Envoy
    participant ProductsAPI
    participant PG as Postgres
    
    Browser->>Nginx: GET / HTTP/1.1
    Note over Nginx: nginx.conf: location /
    Nginx->>Nginx: proxy_pass frontend:3000
    
    Nginx->>NE: Connect to frontend:3000
    Note over NE: iptables redirect to 15001
    NE->>NE: Consul lookup: "frontend"
    NE->>FE: mTLS handshake
    Note over NE,FE: Certificate exchange<br/>Identity verification
    
    FE->>FE: Check Service Intention:<br/>nginx → frontend
    Note over FE: Intention: ALLOW
    FE->>Frontend: Forward to localhost:3000
    
    Frontend->>Frontend: Server-side render<br/>Need product data
    Frontend->>Frontend: Call process.env.NEXT_PUBLIC_PUBLIC_API_URL<br/>Resolves to: /api/products
    
    Note over Frontend: But wait! Frontend is server-side rendering<br/>It can't call /api on itself
    Frontend->>PE: Call http://public-api:8080/products
    Note over Frontend,PE: Via transparent proxy
    
    PE->>PE: Check Service Intention:<br/>frontend → public-api
    Note over PE: Intention: ALLOW
    PE->>PublicAPI: Forward to localhost:8080
    
    PublicAPI->>PublicAPI: os.Getenv("PRODUCT_API_URI")<br/>= "http://products-api:9090"
    PublicAPI->>PRE: GET http://products-api:9090/products
    
    PRE->>PRE: Check Service Intention:<br/>public-api → products-api
    Note over PRE: Intention: ALLOW
    PRE->>ProductsAPI: Forward to localhost:9090
    
    ProductsAPI->>ProductsAPI: Read CONFIG_FILE<br/>= "/config/conf.json"
    ProductsAPI->>ProductsAPI: db_connection = "host=postgres..."
    ProductsAPI->>PG: SQL Query via service mesh
    
    PG-->>ProductsAPI: Result rows
    ProductsAPI-->>PRE: JSON response
    PRE-->>PE: Encrypted response
    PE-->>PublicAPI: Decrypted
    PublicAPI-->>PE: Processed JSON
    PE-->>Frontend: Encrypted
    Frontend-->>Frontend: Render HTML with products
    Frontend-->>FE: HTML response
    FE-->>NE: Encrypted
    NE-->>Nginx: Decrypted
    Nginx-->>Browser: HTML page

Where Connection Errors Can Occur

There are multiple points where "connection refused" can happen:

Between Nginx and Frontend:

The Nginx pod has an Envoy sidecar that needs to connect to the Frontend pod's Envoy sidecar. If the frontend pod isn't running, or if the frontend ServiceAccount doesn't exist, or if Consul hasn't registered the frontend service, Nginx's Envoy can't establish the connection.

IMPORTANT: if the service intention for "nginx to frontend" doesn't exist, the frontend's Envoy will refuse the connection even though both pods are healthy.

Between Frontend and Public API:

The Frontend is a Next.js application that does server-side rendering. When it needs data, it can't call relative URLs like a browser would - it needs to call the actual backend service.

The environment variable NEXT_PUBLIC_PUBLIC_API_URL="/" is misleading here. That variable is for client-side JavaScript running in the browser. Server-side code needs a real URL, which is why the Frontend needs network access to public-api.

If this connection is blocked, server-side rendering fails and users see errors.

Between Public API and Products API:

The public-api reads PRODUCT_API_URI from its environment and makes HTTP calls to that URL. The transparent proxy intercepts these calls and routes them through the service mesh.

The ConfigMap for products-api contains the database connection string:

{
  "db_connection": "host=postgres port=5432 user=postgres password=password dbname=products sslmode=disable"
}

Notice host=postgres - this is a Kubernetes service name. The transparent proxy intercepts this PostgreSQL protocol connection (TCP, not HTTP) and routes it through the mesh.
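
For illustration, a sketch of that connection from the application's point of view (assuming the github.com/lib/pq driver, which may differ from what products-api actually uses). The dial to host=postgres is plain TCP, and the transparent proxy redirects it into the mesh exactly like the HTTP calls:

package main

import (
	"database/sql"
	"fmt"

	_ "github.com/lib/pq" // assumed driver for this sketch
)

func main() {
	// Same connection string as the ConfigMap; "postgres" is the Kubernetes
	// service name, resolved by DNS and then intercepted by iptables.
	dsn := "host=postgres port=5432 user=postgres password=password dbname=products sslmode=disable"

	db, err := sql.Open("postgres", dsn)
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// sql.Open is lazy; Ping forces a real TCP connect, which is where a
	// missing intention or a stopped sidecar surfaces as an error.
	if err := db.Ping(); err != nil {
		fmt.Println("postgres unreachable:", err)
		return
	}
	fmt.Println("connected to postgres through the mesh")
}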


Service Mesh Configuration

Service Definitions & Protocol Declaration

Every service in the HashiCorp example has a ServiceDefaults resource that tells Consul how to handle it:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: products-api
spec:
  protocol: "http"

The protocol field is more important than it looks. It determines whether Consul uses Layer 7 (HTTP) or Layer 4 (TCP) proxying.

Notes:

With protocol: "http":

  • Envoy understands HTTP semantics
  • Can retry based on HTTP status codes
  • Can route based on HTTP headers or paths
  • Collects HTTP-specific metrics (like status codes, methods)
  • Can implement HTTP-level circuit breaking

With protocol: "tcp":

  • Envoy treats traffic as opaque byte streams
  • Simple connection proxying
  • No understanding of request/response boundaries
  • Basic connection-level metrics only

For postgres, the protocol is set to "tcp" because postgres doesn't speak HTTP:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: postgres
spec:
  protocol: "tcp"

This is important for debugging because if you set postgres to "http", Envoy would try to parse PostgreSQL protocol messages as HTTP, which would fail.

Service Intentions = Authorization Layer

Service intentions are where you explicitly allow service-to-service communication:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: products-api
spec:
  sources:
    - name: public-api
      action: allow
  destination:
    name: products-api

This reads as: "Services with identity 'public-api' are allowed to call services with identity 'products-api'."

IMPORTANT - WHEN this check happens:

graph LR
    subgraph "Source Pod (public-api)"
        SrcApp["Application"]
        SrcEnvoy["Envoy"]
    end
    
    subgraph "Network"
        Net["mTLS Connection"]
    end
    
    subgraph "Destination Pod (products-api)"
        DstEnvoy["Envoy"]
        DstApp["Application"]
    end
    
    SrcApp -->|"1. Call products-api:9090"| SrcEnvoy
    SrcEnvoy -->|"2. mTLS (source identity in cert)"| Net
    Net -->|"3. Receive connection"| DstEnvoy
    DstEnvoy -->|"4. Check intention"| DstEnvoy
    DstEnvoy -->|"5. Forward if allowed"| DstApp

The destination's Envoy extracts the source service identity from the mTLS certificate, then queries Consul.

It asks the question: "Is public-api allowed to call me?" If there's no matching intention (and default-deny is in effect), the connection is refused.

This is often where error 111 comes from in an otherwise healthy mesh: every component is working exactly as designed, but you forgot to create an intention.

The HashiCorp example includes a default-deny rule:

apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceIntentions
metadata:
  name: deny-all
spec:
  destination:
    name: '*'
  sources:
    - name: '*'
      action: deny

This means unless you've explicitly allowed a connection, it's blocked. This is the zero-trust security model.

ServiceAccounts and Identity

Each pod runs with a ServiceAccount:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: products-api
automountServiceAccountToken: true

The automountServiceAccountToken: true is important. It tells Kubernetes to mount a JWT token into the pod at /var/run/secrets/kubernetes.io/serviceaccount/token.

Consul's Envoy sidecar uses this token to prove its identity to Consul. Consul then issues a SPIFFE certificate with an identity like:

spiffe://<trust-domain>.consul/ns/default/dc/dc1/svc/products-api

This certificate is what's presented during mTLS handshakes. The destination Envoy looks at this certificate to determine "who is calling me?" and then checks service intentions.

If the ServiceAccount doesn't exist, or if automountServiceAccountToken is false, the Envoy sidecar can't authenticate to Consul and can't get a certificate. The service mesh breaks, and you get connection errors.
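
As a rough sketch of the identity check (illustrative; in the mesh this happens inside Envoy, not in application code), a TLS server can read the caller's SPIFFE ID from the client certificate's URI SAN:

package main

import (
	"crypto/tls"
	"fmt"
)

// spiffeID returns the SPIFFE URI from the peer's leaf certificate, which is
// what the destination proxy compares against service intentions.
func spiffeID(state tls.ConnectionState) (string, error) {
	if len(state.PeerCertificates) == 0 {
		return "", fmt.Errorf("no client certificate presented")
	}
	for _, uri := range state.PeerCertificates[0].URIs {
		if uri.Scheme == "spiffe" {
			return uri.String(), nil
		}
	}
	return "", fmt.Errorf("no SPIFFE URI SAN in client certificate")
}

func main() {
	// Usage sketch: after accepting a connection from a tls.Listener configured
	// with ClientAuth: tls.RequireAndVerifyClientCert, pass
	// conn.(*tls.Conn).ConnectionState() to spiffeID and check the result
	// against the allowed sources.
	_ = spiffeID
}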


Configuration Patterns

ConfigMaps vs Environment Variables

The HashiCorp example demonstrates two different configuration patterns that affect how connection errors manifest.

Products API uses a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: db-configmap
data:
  config: |
    {
      "db_connection": "host=postgres port=5432 user=postgres password=password dbname=products sslmode=disable",
      "bind_address": ":9090",
      "metrics_address": ":9103"
    }

This ConfigMap is mounted as a file:

volumes:
  - name: config
    configMap:
      name: db-configmap
      items:
        - key: config
          path: conf.json

volumeMounts:
  - name: config
    mountPath: /config
    readOnly: true

env:
  - name: CONFIG_FILE
    value: "/config/conf.json"

The application reads the file at startup:

configFile := os.Getenv("CONFIG_FILE") // "/config/conf.json"
data, err := ioutil.ReadFile(configFile)
if err != nil {
	log.Fatal(err) // ConfigMap volume missing or path wrong
}
json.Unmarshal(data, &config) // config is the application's config struct

Why this matters

If the ConfigMap doesn't exist, the volume mount fails and the pod won't start. You see a clear error:

MountVolume.SetUp failed: configmap "db-configmap" not found

If the ConfigMap exists but has the wrong service name (for example: host=postgress with a typo), the application starts successfully but then fails to connect to the database. You see "connection refused" at runtime.

Public API uses environment variables:

env:
  - name: BIND_ADDRESS
    value: ":8080"
  - name: PRODUCT_API_URI
    value: "http://products-api:9090"
  - name: PAYMENT_API_URI
    value: "http://payments:1800"

The application reads these directly:

productsAPIURL := os.Getenv("PRODUCT_API_URI")
resp, err := http.Get(productsAPIURL + "/products")

Why this matters

Environment variables are injected when the pod starts. If you have the wrong value, there's no validation until the application tries to use it. A typo in the service name means connection errors at runtime.

The advantage is that you can see the configuration with kubectl describe pod - it's right there in the pod spec. With ConfigMaps, you need to separately check the ConfigMap contents.
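
One mitigation (a sketch, not part of the HashiCups code) is to validate these variables at startup and fail fast, so a typo surfaces as an obvious boot error instead of connection failures at request time:

package main

import (
	"fmt"
	"net/url"
	"os"
)

// mustEnvURL exits early if the variable is missing or not a usable URL.
func mustEnvURL(name string) string {
	v := os.Getenv(name)
	if v == "" {
		fmt.Fprintf(os.Stderr, "%s is not set\n", name)
		os.Exit(1)
	}
	u, err := url.Parse(v)
	if err != nil || u.Scheme == "" || u.Host == "" {
		fmt.Fprintf(os.Stderr, "%s is not a valid URL: %q\n", name, v)
		os.Exit(1)
	}
	return v
}

func main() {
	productsAPI := mustEnvURL("PRODUCT_API_URI") // e.g. http://products-api:9090
	paymentsAPI := mustEnvURL("PAYMENT_API_URI") // e.g. http://payments:1800
	fmt.Println("config ok:", productsAPI, paymentsAPI)
}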

Port Mapping in Services

The Payments service reveals an important pattern:

apiVersion: v1
kind: Service
metadata:
  name: payments
spec:
  ports:
    - name: http
      protocol: TCP
      port: 1800         # Service port
      targetPort: 8080   # Container port
---
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: payments
          ports:
            - containerPort: 8080

The Service listens on port 1800, but the container listens on port 8080. Here's what happens when public-api calls http://payments:1800:

  1. DNS resolves "payments" to the Service's ClusterIP
  2. Public-api's Envoy connects to ClusterIP:1800
  3. The Service's iptables rules translate this to one of the pod IPs on port 8080
  4. The destination Envoy receives the connection on port 8080
  5. The destination Envoy forwards to the payments container on localhost:8080

NOTES

If you configure the Service with port: 1800 and targetPort: 1800, but the container is listening on 8080, connections fail. The Service successfully routes to the pod IP, but nothing is listening on port 1800 inside the pod.

You see "connection refused" (error 111) because the connection reaches the correct pod, but the port is wrong.

This is confusing with a service mesh because the error might come from Envoy (if Envoy can't connect to the local application) OR from the application (if the ports are misaligned elsewhere).
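
A tiny local reproduction of the port-mismatch case (illustrative only): listen on one port, dial another on the same host, and the kernel answers with ECONNREFUSED because nothing is bound there - the same signature you see when targetPort doesn't match containerPort.

package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

func main() {
	// The "container" listens on 8080...
	ln, err := net.Listen("tcp", "127.0.0.1:8080")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	// ...but the caller was told port 1800 (a Service port / targetPort mismatch).
	if _, err := net.Dial("tcp", "127.0.0.1:1800"); errors.Is(err, syscall.ECONNREFUSED) {
		fmt.Println("connection refused: the host is reachable, but nothing listens on that port")
	}
}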


Debugging/Troubleshooting Connection Failures

Approach

When you see "delayed connect error: 111":

Is the pod running?

kubectl get pods

If the pod is in CrashLoopBackOff or Pending, fix that first. Connection errors are secondary to the pod not running.

Are all containers running?

kubectl get pod products-api-xxx -o jsonpath='{.spec.containers[*].name}'
# Should show: products-api, consul-connect-envoy-sidecar

kubectl get pod products-api-xxx -o jsonpath='{.status.containerStatuses[*].ready}'
# Should show something like: true true

If the Envoy sidecar isn't running, the service mesh is broken.

Check why (fill in the placeholders):

kubectl describe pod products-api-xxx
kubectl logs products-api-xxx -c consul-connect-envoy-sidecar

Is the application listening on the right port?

kubectl exec -it products-api-xxx -c products-api -- netstat -tlnp

You should see something like:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:9090            0.0.0.0:*               LISTEN      1/products-api

If you see 127.0.0.1:9090 instead of 0.0.0.0:9090, the application is only listening on localhost and can't receive connections from the network.

Can you reach the service from within the pod?

kubectl exec -it products-api-xxx -c products-api -- wget -O- http://localhost:9090/health

This bypasses the service mesh entirely. If this fails, the application itself has a problem.

Is DNS working?

kubectl exec -it public-api-xxx -c public-api -- nslookup products-api

Should return the Service's ClusterIP. If DNS fails, the problem is in the cluster's DNS configuration, not the service mesh.

Do service intentions allow the connection?

kubectl get serviceintentions
kubectl describe serviceintentions products-api

Check that the source service is listed in the sources section with action: allow.

Is Consul healthy?

kubectl port-forward service/consul-ui 8500:80
# Open http://localhost:8500

In the Consul UI:

  • Services tab: Is the service registered?
  • Intentions tab: Are the intentions correctly configured?
  • Nodes tab: Are all Consul agents healthy?

Envoy Logs

Envoy logs are confusing, but they're the most direct way to understand what's happening:

kubectl logs products-api-xxx -c consul-connect-envoy-sidecar

Key things:

Upstream connection refused:

upstream connect error or disconnect/reset before headers. reset reason: connection failure

This means Envoy successfully intercepted the outbound connection, looked up the destination in Consul, and tried to connect to the destination's Envoy, but the connection was refused. This usually means:

  • The destination pod isn't running
  • The destination Envoy isn't running
  • A service intention blocks the connection

Unknown cluster:

no cluster match for URL

Envoy doesn't know about the destination service. This means:

  • The service isn't registered in Consul
  • The Envoy configuration hasn't synced from Consul
  • You're calling a service name that doesn't exist

Certificate validation failed:

TLS error: certificate verify failed

The mTLS handshake failed. This usually means:

  • Certificate has expired
  • Clocks are out of sync between pods
  • Consul CA is misconfigured

Checking Envoy Configuration

Envoy exposes an admin interface on port 19000:

kubectl port-forward products-api-xxx 19000:19000

Visit http://localhost:19000 in your browser. Key endpoints:

Config dump: http://localhost:19000/config_dump

Shows the complete Envoy configuration. Look for:

  • Clusters: The upstream services Envoy knows about
  • Listeners: The ports Envoy is listening on
  • Routes: How Envoy routes requests

Stats: http://localhost:19000/stats

Shows real-time statistics.

  • cluster.<service-name>.upstream_cx_connect_fail: Connection failures to this upstream
  • cluster.<service-name>.upstream_cx_active: Active connections
  • listener.0.0.0.0_15001.downstream_cx_total: Total connections received

If upstream_cx_connect_fail is non-zero, Envoy is trying to connect but failing.
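
A quick sketch for pulling just those counters out of the stats dump (assumes the port-forward above is active; purely illustrative):

package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// Requires: kubectl port-forward products-api-xxx 19000:19000
	resp, err := http.Get("http://localhost:19000/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		// Non-zero upstream_cx_connect_fail means Envoy is trying to connect and failing.
		if strings.Contains(line, "upstream_cx_connect_fail") && !strings.HasSuffix(line, ": 0") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}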


Nginx Edge Proxy

NOTES

Nginx sits at the edge of the HashiCups application, handling all incoming traffic from external clients.

graph TB
    Browser["Browser<br/>(External)"]
    
    subgraph "Kubernetes Cluster"
        subgraph "Nginx Pod"
            NC["Nginx Container<br/>Port 80"]
            NE["Envoy Sidecar"]
        end
        
        subgraph "Frontend Pod"
            FC["Frontend Container<br/>Port 3000"]
            FE["Envoy Sidecar"]
        end
        
        subgraph "Public API Pod"
            PC["Public API Container<br/>Port 8080"]
            PE["Envoy Sidecar"]
        end
    end
    
    Browser -->|"HTTP (no encryption)"| NC
    NC -->|"Call frontend:3000"| NE
    NE <-->|"mTLS"| FE
    FE --> FC
    
    NC -->|"Call public-api:8080"| NE
    NE <-->|"mTLS"| PE
    PE --> PC

Nginx caches static assets and reduces load on the frontend service. The cache configuration relies on content-hashed filenames from Next.js, which means cached content never needs invalidation - when the file changes, the filename changes.

Nginx routes traffic based on URL path. Requests to /api go to public-api, everything else goes to the frontend. This provides a single origin for the browser, avoiding CORS issues.

Nginx compresses responses with gzip, reducing bandwidth usage. This is important for JavaScript bundles, which can be large.

Nginx handles HTTP/1.1 and WebSocket upgrades, ensuring compatibility with various client types.

TAKEAWAY MESSAGE - Nginx's traffic to backend services goes through the service mesh. Nginx itself has an Envoy sidecar, and its connections to frontend and public-api are encrypted with mTLS and subject to service intentions.

Nginx Configuration

The Nginx configuration is stored in a ConfigMap:

upstream frontend_upstream {
  server frontend:3000;
}

server {
  listen 80;
  
  location /_next/static {
    proxy_cache STATIC;
    proxy_pass http://frontend_upstream;
  }
  
  location /api {
    proxy_pass http://public-api:8080;
  }
  
  location / {
    proxy_pass http://frontend_upstream;
  }
}

proxy_pass:

When Nginx executes proxy_pass http://frontend:3000, it's making a regular TCP connection to the hostname "frontend" on port 3000. This triggers:

  1. DNS resolution: "frontend" to Service ClusterIP
  2. The Nginx process opens a TCP socket to that IP
  3. Transparent proxy intercepts: iptables redirects to Envoy
  4. Nginx's Envoy establishes mTLS to frontend's Envoy
  5. Service intention check: "nginx-to-frontend"
  6. If allowed, connection proceeds

Mistakes with caching:

The cache path is set to /var/cache/nginx, which is inside the container. This directory is EPHEMERAL - when the pod restarts, the cache is lost. In production, you'd mount a PersistentVolume here to preserve the cache across restarts.

Why separate upstreams:

The upstream frontend_upstream block groups the backend servers. In this case, it's just one server, but you could add multiple servers for load balancing:

upstream frontend_upstream {
  server frontend-v1:3000 weight=3;
  server frontend-v2:3000 weight=1;
}

This would send 75% of traffic to v1 and 25% to v2. This is useful for canary deployments.
