Application Instance Identity and Intro to Envoy in PCF

(The content below is heavily borrowed from Eric Malm's blog post on application identity and Aaron Hurley's CFSummit talk on upcoming changes to the routing tier in CF.)

Timeline

PCF 1.12 - Enabled Application Instance Identity

Cloud Foundry issues a unique certificate for each running app instance. This certificate encodes the identity of the application instance on the platform in several different ways. The certificate is valid for only 24 hours. The platform regenerates it and replaces it in the app instance filesystem automatically, shortly before it expires. Any other service that trusts PCF’s certificate authority can therefore authenticate the application instances running on the platform, and then authorize them based on the application metadata. The pervasive availability of this strong security primitive makes the platform more secure by default and makes it easy for your applications to do the same.
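As an illustration of how an app might consume these credentials, here is a minimal sketch. It assumes the CF_INSTANCE_CERT and CF_INSTANCE_KEY environment variables described in the instance identity documentation referenced below (they are not shown elsewhere in these notes), and the ca_file argument is a hypothetical path to the CA bundle of whatever service the app calls. The function names are made up for this example.

```python
import os
import ssl


def instance_identity_paths(env=os.environ):
    """Return (cert_path, key_path) read from the instance identity variables."""
    try:
        return env["CF_INSTANCE_CERT"], env["CF_INSTANCE_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"instance identity not available: {missing} is unset") from None


def client_context(ca_file, env=os.environ):
    """Build a TLS client context that presents the instance certificate.

    ca_file is a hypothetical CA bundle for the remote service (assumption).
    Build a fresh context for new connections: the platform replaces the
    files roughly every 24 hours, so a long-lived context would go stale.
    """
    cert_path, key_path = instance_identity_paths(env)
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx
```

Re-reading the files per connection (or on a timer shorter than the certificate lifetime) is the simplest way to follow the automatic rotation described above.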

PCF 2.0 - Secure Service-Instance Credential Delivery

Service brokers can deliver service-instance credentials to applications through the CredHub component, instead of passing them back to Cloud Controller in the service-binding response. This helps your applications comply with regulations or internal audits.

PCF 2.1 - Improved Routing Security and Resilience with Envoy

This release ensures that the routers always connect to the app instance they intend to, and that they encrypt the traffic with TLS all the way to the app container itself.

References : https://docs.pivotal.io/pivotalcf/2-2/devguide/deploy-apps/instance-identity.html

PAS UI setting :

Corresponding property in cf yaml :

  - name: rep
    release: diego
    consumes: {}
    provides: {}
    properties:
      containers:
        proxy:
          enabled: true  # <---
          additional_memory_allocation_mb: 32
        trusted_ca_certificates:
        - "((/cf/diego-instance-identity-root-ca.certificate))"
        - |
          -----BEGIN CERTIFICATE-----
          MIIDUTCCAjmgAwIBAgIVAIGoikVSbjpQwLYyjgjpo9OB0FTGMA0GCSqGSIb3DQEB
          CwUAMB8xCzAJBgNVBAYTAlVTMRAwDgYDVQQKDAdQaXZvdGFsMB4XDTE4MDgwNjE4
          NDc0N1oXDTIyMDgwNzE4NDc0N1owHzELMAkGA1UEBhMCVVMxEDAOBgNVBAoMB1Bp
          dm90YWwwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwA

rep config on diego cell

compute/33877ada-710a-41fb-a215-f035d762ac4a:/var/vcap/jobs/rep/config# cat rep.json | jq .|grep proxy
  "enable_container_proxy": true,
  "proxy_memory_allocation_mb": 32,
  "container_proxy_path": "/var/vcap/packages/proxy",
  "container_proxy_config_path": "/var/vcap/data/rep/proxy_config",
compute/33877ada-710a-41fb-a215-f035d762ac4a:/var/vcap/jobs/rep/config# ls -l /var/vcap/data/rep/proxy_config/b1d4075d-c7e2-4ea9-572c-db9c/
total 16
-rw-r--r-- 1 vcap vcap   606 Aug 24 14:25 envoy.yaml
-rw-r--r-- 1 vcap vcap 11493 Aug 28 12:25 listeners.yaml

From OSS docs how to enable instance identity : https://docs.cloudfoundry.org/adminguide/instance-identity.html

With the introduction of Envoy, there is a change in how the data path (traffic to the app) is wired inside Cloud Foundry.

Before Envoy

  1. The app process listens on port 8080 inside its container.

  2. The Diego cell forwards traffic from port 61080 on the host to container port 8080.

  3. The Diego cell registers its 10.0.0.5 IP and the 61080 host port with the router as a backend for the example.com domain.

  4. The router receives an HTTP request for example.com

  5. The router connects to the 10.0.0.5:61080 address and forwards the request.

  6. The Diego cell forwards the request packets to the app in its container, which handles the request.

This routing process requires the router's registrations to be up to date, though. If the system fails to update them, the router can misroute a request to an app instance that no longer exists, or, even worse, to a completely different app instance. To defend against this possibility, the routers expect the cells to broadcast the route registrations for their apps frequently. The routers then intentionally discard registrations that haven't been updated in the last 120 seconds. Cloud Foundry rightly prioritizes security over availability.
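The pruning behaviour can be sketched as a small in-memory table. This is only an illustration of the idea, not the router's actual implementation: the RouteTable class and its method names are invented here, and the second backend address (10.0.0.6) is hypothetical. Only the 120-second threshold comes from the text above.

```python
import time

STALE_AFTER_SECONDS = 120  # threshold quoted in the text above


class RouteTable:
    """Toy route table that forgets backends which stop re-registering."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_seen = {}  # (domain, backend) -> time of last registration

    def register(self, domain, backend):
        # Cells call this repeatedly; each call refreshes the timestamp.
        self._last_seen[(domain, backend)] = self._clock()

    def prune(self):
        # Drop every registration that has not been refreshed recently.
        now = self._clock()
        stale = [key for key, seen in self._last_seen.items()
                 if now - seen > STALE_AFTER_SECONDS]
        for key in stale:
            del self._last_seen[key]
        return stale

    def backends(self, domain):
        # Prune first so a stale backend is never returned.
        self.prune()
        return sorted(b for d, b in self._last_seen if d == domain)
```

The trade-off is visible in the sketch: if the registration messages stop arriving for any reason, healthy backends disappear from the table along with dead ones.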

After Envoy

  1. The app process listens on port 8080 inside its container.

  2. Envoy listens on port 8443 inside the container, terminates TLS with the instance credentials that contain the instance ID a7c, and forwards that traffic to port 8080.

  3. The Diego cell forwards traffic from port 61443 on the host to container port 8443.

  4. The Diego cell registers its 10.0.0.5 IP and the 61443 host port with the router as a TLS backend for the example.com domain, along with the instance ID a7c.

  5. The router receives an HTTP request for example.com.

  6. The router connects via TLS to the 10.0.0.5:61443 address, verifies the a7c instance ID, and only then forwards the request.

  7. The Diego cell forwards the request payload to Envoy, which in turn forwards it to the app itself for processing.

Now if the router connects to the wrong app instance because of a route registration that is out of date, its TLS handshake fails, and it backs out and tries a different instance. As a result, the routers also no longer need to drop out-of-date TLS registrations so aggressively. The routers can maintain app availability during extended failures of the route-registration system.
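The verify-then-retry behaviour can be modelled without any real TLS. In this sketch the handshake is replaced by a dictionary lookup of the instance ID each address would present; the function names, the second backend (10.0.0.6:61443) and the IDs other than a7c are all hypothetical.

```python
class WrongInstance(Exception):
    """Raised when a backend presents an unexpected instance ID."""


def connect(backend, expected_instance_id, actual_instances):
    # Stand-in for the TLS handshake: succeed only if the certificate
    # presented at `backend` carries the instance ID the router has on record.
    presented = actual_instances.get(backend)
    if presented != expected_instance_id:
        raise WrongInstance(
            f"{backend} presented {presented!r}, expected {expected_instance_id!r}"
        )
    return backend


def route_request(registrations, actual_instances):
    # Try registered backends in order, skipping any that fail verification.
    for backend, instance_id in registrations:
        try:
            return connect(backend, instance_id, actual_instances)
        except WrongInstance:
            continue
    return None  # no backend could be verified
```

Because a stale registration now fails safely at connect time, keeping old entries around costs a retry rather than a misrouted request.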

What about when the instance certificates expire? In that case, because the Diego cell already knows it has issued new credentials, it also uses Envoy's dynamic configuration capabilities to update the credentials there as well. On subsequent connections, Envoy then uses the new set of credentials for TLS termination without skipping a beat.

Peek inside how the "data path" is wired

Using the instructions here, connect to the app-instance/container

  1. Check for listening ports :
root@7cdd109d-2feb-43eb-6c00-720d:/# netstat -anp | grep LISTEN
tcp        0      0 0.0.0.0:61001           0.0.0.0:*               LISTEN      86/envoy
tcp        0      0 0.0.0.0:61002           0.0.0.0:*               LISTEN      86/envoy
tcp        0      0 127.0.0.1:61004         0.0.0.0:*               LISTEN      86/envoy
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN      -
  1. The app listens on port 8080, and the corresponding port configured on the Envoy side is 61001. Port 2222 is used for SSH, and the corresponding port on the Envoy side is 61002.

  2. port 61004 is the port for the envoy api

  3. envoy is configured using the following configuration file /etc/cf-assets/envoy_config/envoy.yaml:

admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 61004
static_resources:
  clusters:
  - name: 0-service-cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: 10.255.247.3
        port_value: 8080  # <---
  - name: 1-service-cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: 10.255.247.3
        port_value: 2222  # <---
dynamic_resources:
  lds_config:
    path: /etc/cf-assets/envoy_config/listeners.yaml
  1. echo $CF_INSTANCE_PORTS to list the corresponding envoy ports

  2. Check for envoy processes inside of the app container

root@7cdd109d-2feb-43eb-6c00-720d:/# ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 23:25 ?        00:00:00 /tmp/garden-init
vcap          13       0  0 23:25 ?        00:00:00 web
vcap          16       0  0 23:25 ?        00:00:00 /tmp/lifecycle/diego-sshd --allowedKeyExchanges= --address=0.0.0.0:2222 --allowUnauthenticatedClients=fa
root          55       0  0 23:25 ?        00:00:00 sh -c trap 'kill -9 0' TERM; /etc/cf-assets/envoy/envoy -c /etc/cf-assets/envoy_config/envoy.yaml --serv
root          86      55  0 23:25 ?        00:00:01 /etc/cf-assets/envoy/envoy -c /etc/cf-assets/envoy_config/envoy.yaml --service-cluster proxy-cluster --s
root          93       0  0 23:25 ?        00:00:00 /etc/cf-assets/healthcheck/healthcheck -port=8080 -timeout=1000ms -liveness-interval=30s
root         144       0  1 23:50 pts/0    00:00:00 /bin/bash
  1. iptables configuration for DNAT to the container IP on the diego cell :
iptables -t nat -nL | grep 61001
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61079 to:10.255.247.28:61001 <---
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61065 to:10.255.247.25:61001
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61012 to:10.255.247.8:61001

Note the IP address of the app container to find the corresponding DNAT rule for incoming traffic to the app via the diego cell
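Matching a container IP to its DNAT rule by eye gets tedious on a busy cell. The following sketch parses the iptables output shown above and looks up the host port for a given container address. The regular expression is written against the exact column layout in this sample and may need adjusting for other iptables versions; the function names are invented for this example.

```python
import re

# Sample lines copied from the iptables output above.
SAMPLE = """\
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61079 to:10.255.247.28:61001 <---
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61065 to:10.255.247.25:61001
DNAT       tcp  --  0.0.0.0/0            10.193.68.33         tcp dpt:61012 to:10.255.247.8:61001
"""

DNAT_RULE = re.compile(
    r"^DNAT\s+tcp\s+--\s+\S+\s+(?P<cell_ip>\S+)\s+tcp dpt:(?P<host_port>\d+)\s+"
    r"to:(?P<container_ip>[\d.]+):(?P<container_port>\d+)"
)


def parse_dnat(output):
    """Turn `iptables -t nat -nL` DNAT lines into a list of dicts."""
    rules = []
    for line in output.splitlines():
        match = DNAT_RULE.match(line.strip())
        if match:
            rules.append({
                "cell_ip": match["cell_ip"],
                "host_port": int(match["host_port"]),
                "container_ip": match["container_ip"],
                "container_port": int(match["container_port"]),
            })
    return rules


def host_port_for(rules, container_ip, container_port):
    """Return the cell port that forwards to the given container address."""
    for rule in rules:
        if (rule["container_ip"], rule["container_port"]) == (container_ip, container_port):
            return rule["host_port"]
    return None
```

With the sample above, looking up 10.255.247.28 and port 61001 yields the host port from the first rule, 61079.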

Envoy

What is Envoy?

Created at Lyft, Envoy is a high performance open source service mesh proxy that makes the network transparent to apps.

  • written in C++
  • minimal cpu and mem footprint
  • api driven config (dynamic configuration via Discovery Services xDS APIs)
  • L4 (TCP) proxy
  • bidirectional transparent proxy
  • sidecar and ingress models (CF uses the ingress model) (see reference here for details)

Reference : https://www.cncf.io/wp-content/uploads/2018/05/projectFAQ_envoy.pdf

Cloud Foundry has adopted Envoy as an in-container proxy to implement the stronger routing security described above.

Envoy Components

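Listeners - define which port Envoy listens on and which protocol it expects. They can be delivered dynamically via the Listener Discovery Service (LDS).

Routes - listeners are mapped to routes. Routes tell Envoy where traffic should be sent, for example by matching on a host header. They can be delivered via the Route Discovery Service (RDS).

Clusters - routes point to clusters. A cluster tells Envoy how to send traffic, for example whether to use TLS and which load-balancing strategy to apply. They can be delivered via the Cluster Discovery Service (CDS).

Endpoints - the hosts that are able to receive the traffic. They can be delivered via the Endpoint Discovery Service (EDS).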

envoy configuration yaml mapping

listeners.yaml file

root@b1d4075d-c7e2-4ea9-572c-db9c:/etc/cf-assets/envoy_config# cat listeners.yaml
version_info: "0"
resources:
- '@type': type.googleapis.com/envoy.api.v2.Listener
  name: listener-8080
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 61001
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: 0-stats
        cluster: 0-service-cluster
    tls_context:
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            inline_string: |
              -----BEGIN CERTIFICATE-----
             
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              
              -----END CERTIFICATE-----
          private_key:
            inline_string: |
              -----BEGIN RSA PRIVATE KEY-----
              
              -----END RSA PRIVATE KEY-----
        tls_params:
          cipher_suites: '[ECDHE-RSA-AES256-GCM-SHA384|ECDHE-RSA-AES128-GCM-SHA256]'
- '@type': type.googleapis.com/envoy.api.v2.Listener
  name: listener-2222
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 61002
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: 1-stats
        cluster: 1-service-cluster
    tls_context:
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            inline_string: |
              -----BEGIN CERTIFICATE-----
              
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              
              -----END CERTIFICATE-----
          private_key:
            inline_string: |
              -----BEGIN RSA PRIVATE KEY-----
            
              -----END RSA PRIVATE KEY-----
        tls_params:
          cipher_suites: '[ECDHE-RSA-AES256-GCM-SHA384|ECDHE-RSA-AES128-GCM-SHA256]'

Routes match on request header information and point to a cluster. A cluster holds the routing configuration and a group of endpoints, that is, the upstream IPs/hosts that handle the traffic.
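In the listeners.yaml shown here, each tcp_proxy filter names its cluster directly, so no header-based route matching is involved. The listener to cluster to endpoint chain can be traced with a small lookup. The dictionaries below are a hand-trimmed Python mirror of the two YAML files in these notes (TLS settings omitted), and the function name is invented for this sketch.

```python
# Trimmed mirror of listeners.yaml: listener name, listen port, target cluster.
LISTENERS = [
    {"name": "listener-8080", "port": 61001, "cluster": "0-service-cluster"},
    {"name": "listener-2222", "port": 61002, "cluster": "1-service-cluster"},
]

# Trimmed mirror of envoy.yaml static_resources: cluster name -> upstream hosts.
CLUSTERS = {
    "0-service-cluster": [("10.255.247.3", 8080)],
    "1-service-cluster": [("10.255.247.3", 2222)],
}


def upstreams_for_port(port, listeners=LISTENERS, clusters=CLUSTERS):
    """Follow listener -> cluster -> endpoints for a given Envoy listen port."""
    for listener in listeners:
        if listener["port"] == port:
            return clusters[listener["cluster"]]
    raise LookupError(f"no listener on port {port}")
```

For example, upstreams_for_port(61001) resolves to the app on 10.255.247.3 port 8080, while 61002 resolves to the SSH daemon on port 2222. Port 61004 is the admin API rather than a listener, so it raises LookupError.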

How are the envoy configuration and binaries injected?

This is done via bind mounts when the container is created.

PCF 2.3 - "Strict" Route Integrity

Strict route integrity is implemented in 2.3. The UI for ERT now has 3 radio buttons instead of the checkbox from 2.2/2.1

All of the new properties that enable this feature are in the diego release spec here

Reference documentation on how this feature prevents misrouting is here.

Some caveats documented in the PCF 2.3 Breaking Changes Section

Additional tips from Dan Lynch :

  • In PCF 2.3, when this feature is enabled, cf curl /v2/apps/GUID/stats no longer returns a port for the app. This is because Cloud Controller looks for the container port, which is null, and not the new Envoy TLS port.

  • Alternatively, use cfdot instead :

diego_cell/ebadf72b-8d0e-4e25-beb7-ef632a8aecd9:~# cfdot actual-lrp-groups | jq 'select(.instance.process_guid | contains("93d05447-6319-4a54-8e62-6228daef1768"))' | jq '[.instance.address, .instance.ports[0].host_tls_proxy_port] | "https://\(.[0]):\(.[1])"'
"https://10.193.79.34:61000"
  • Curling an app directly from the cell will not work as the envoy tls proxy will only trust requests coming from the gorouter via mutual TLS.

  • Here is a procedure to curl an app container directly when "strict route integrity" is enabled.

    • Extract the router cert and key from /var/vcap/jobs/gorouter/config/gorouter.yml using these parameters, and write each one to a file

      • backends.cert_chain
      • backends.private_key
    • Get the address and TLS proxy port of the app instance using a combination of the cf API and cfdot. Make sure to use the correct app GUID. You may also need to select a ports index other than 0.

diego_cell/ebadf72b-8d0e-4e25-beb7-ef632a8aecd9:~# cfdot actual-lrp-groups | jq 'select(.instance.process_guid | contains("93d05447-6319-4a54-8e62-6228daef1768"))' | jq '[.instance.address, .instance.ports[0].host_tls_proxy_port] | "https://\(.[0]):\(.[1])"'
"https://10.193.79.34:61000"
  • Using the router cert and key, query the app by running the following curl command from the gorouter.
curl -H "Host: jdoe-spring-music.cfapps-14.haas-59.pez.pivotal.io" https://10.193.79.34:61000 -vvv --cert /tmp/cert --key /tmp/key -k
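For scripting the procedure above, the jq filter and the curl call can be approximated in Python. This is a rough sketch under assumptions: it expects cfdot to print one JSON object per line, relies only on the fields the jq filter above uses (instance.process_guid, instance.address, instance.ports[].host_tls_proxy_port), and the function names are invented. The cert and key paths are whatever files you wrote the router credentials to.

```python
import http.client
import json
import ssl


def tls_proxy_urls(cfdot_output, guid_fragment, port_index=0):
    """Mimic the jq filter: build https://address:port for matching instances."""
    urls = []
    for line in cfdot_output.splitlines():
        if not line.strip():
            continue
        instance = json.loads(line).get("instance") or {}
        if guid_fragment not in instance.get("process_guid", ""):
            continue
        port = instance["ports"][port_index]["host_tls_proxy_port"]
        urls.append(f"https://{instance['address']}:{port}")
    return urls


def get_via_tls_proxy(address, port, host_header, cert_file, key_file):
    """Rough equivalent of: curl -H "Host: ..." --cert ... --key ... -k"""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # same effect as curl -k
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    conn = http.client.HTTPSConnection(address, port, context=ctx, timeout=10)
    try:
        conn.request("GET", "/", headers={"Host": host_header})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

As with curl -k, server certificate verification is switched off here, which is acceptable for a one-off debugging session but not for anything long-lived.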

Resources
