(The content below borrows heavily from Eric Malm's blog post on application identity and Aaron Hurley's CFSummit talk on upcoming changes to the routing tier in CF.)
- Application Instance Identity and Intro to Envoy in PCF
Cloud Foundry issues a unique certificate for each running app instance. The certificate encodes the identity of the application instance on the platform in several different ways. It is valid for only 24 hours; shortly before it expires, the platform regenerates it and replaces it in the app instance filesystem automatically. Any service that trusts PCF's certificate authority can therefore authenticate the application instances running on the platform and then authorize them based on the application metadata. Having this strong security primitive available everywhere makes the platform more secure by default and makes it easy for your applications to do the same.
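Because the platform replaces the certificate and key files in place, an app that uses them has to re-read them after each rotation. The following is a minimal sketch of that idea, not the platform's own mechanism. It assumes the file locations are exported in the CF_INSTANCE_CERT and CF_INSTANCE_KEY environment variables; the InstanceCredentials class and from_environment function are hypothetical names used only for illustration.

```python
import os


class InstanceCredentials:
    """Re-reads the instance identity cert and key whenever the files change.

    Hypothetical helper: class and method names are illustrative only.
    """

    def __init__(self, cert_path, key_path):
        self.cert_path = cert_path
        self.key_path = key_path
        self._stamp = None
        self._pair = None

    def _current_stamp(self):
        # Modification time and size of both files; differs after a rotation.
        stats = [os.stat(p) for p in (self.cert_path, self.key_path)]
        return tuple((s.st_mtime_ns, s.st_size) for s in stats)

    def load(self):
        """Return (cert_pem, key_pem), re-reading from disk only after a change."""
        stamp = self._current_stamp()
        if stamp != self._stamp:
            with open(self.cert_path, "rb") as f:
                cert = f.read()
            with open(self.key_path, "rb") as f:
                key = f.read()
            self._pair = (cert, key)
            self._stamp = stamp
        return self._pair


def from_environment(env=os.environ):
    # Assumption: the platform exports the file locations in these variables.
    return InstanceCredentials(env["CF_INSTANCE_CERT"], env["CF_INSTANCE_KEY"])
```

Calling load() before each outbound TLS connection is one simple way to pick up the new credentials; a production client would also need to guard against reading the certificate and key from two different rotations.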
- PCF docs - https://docs.pivotal.io/pivotalcf/1-12/installing/highlights.html#app-identity
- PCF docs - https://docs.pivotal.io/pivotalcf/1-12/security/networking/tls-info.html#container-creds
- original OSS CF proposal : https://docs.google.com/document/d/1OWrqaNEQkl8VXd8r3W6GgDEXxd3sXw5C-20dAu76HOk/
Service brokers can deliver service-instance credentials to applications through the CredHub component instead of passing them back to Cloud Controller in the service-binding response. This is an advantage, as it helps your applications comply with regulations or internal audits.
- PCF docs - https://docs.pivotal.io/pivotalcf/2-1/opsguide/secure-si-creds.html
- PCF docs - https://docs.pivotal.io/spring-cloud-services/1-5/service-broker-and-instances.html#using-credhub-for-service-instance-credentials
Having the routers verify the app instance identity over TLS ensures that they always connect to the app instance they intend to, and that the traffic is encrypted with TLS all the way to the app container itself.
References : https://docs.pivotal.io/pivotalcf/2-2/devguide/deploy-apps/instance-identity.html
PAS UI setting :
Corresponding property in the cf yaml :
- name: rep
  release: diego
  consumes: {}
  provides: {}
  properties:
    containers:
      proxy:
        enabled: true <---
        additional_memory_allocation_mb: 32
      trusted_ca_certificates:
      - "((/cf/diego-instance-identity-root-ca.certificate))"
      - |
        -----BEGIN CERTIFICATE-----
        MIIDUTCCAjmgAwIBAgIVAIGoikVSbjpQwLYyjgjpo9OB0FTGMA0GCSqGSIb3DQEB
        CwUAMB8xCzAJBgNVBAYTAlVTMRAwDgYDVQQKDAdQaXZvdGFsMB4XDTE4MDgwNjE4
        NDc0N1oXDTIyMDgwNzE4NDc0N1owHzELMAkGA1UEBhMCVVMxEDAOBgNVBAoMB1Bp
        dm90YWwwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwA
rep config on diego cell :
compute/33877ada-710a-41fb-a215-f035d762ac4a:/var/vcap/jobs/rep/config# cat rep.json | jq .|grep proxy
"enable_container_proxy": true,
"proxy_memory_allocation_mb": 32,
"container_proxy_path": "/var/vcap/packages/proxy",
"container_proxy_config_path": "/var/vcap/data/rep/proxy_config",
compute/33877ada-710a-41fb-a215-f035d762ac4a:/var/vcap/jobs/rep/config# ls -l /var/vcap/data/rep/proxy_config/b1d4075d-c7e2-4ea9-572c-db9c/
total 16
-rw-r--r-- 1 vcap vcap 606 Aug 24 14:25 envoy.yaml
-rw-r--r-- 1 vcap vcap 11493 Aug 28 12:25 listeners.yaml
OSS docs on how to enable instance identity : https://docs.cloudfoundry.org/adminguide/instance-identity.html
With the introduction of Envoy, the data path (traffic to the app) is wired differently inside Cloud Foundry. Without Envoy, a request reaches the app as follows:
- The app process listens on port 8080 inside its container.
- The Diego cell forwards traffic from port 61080 on the host to container port 8080.
- The Diego cell registers its 10.0.0.5 IP and the 61080 host port with the router as a backend for the example.com domain.
- The router receives an HTTP request for example.com.
- The router connects to the 10.0.0.5:61080 address and forwards the request.
- The Diego cell forwards the request packets to the app in its container, which handles the request.
This routing process requires the router's registrations to be up to date, though. If the system fails to update them, the router can misroute a request to an app instance that no longer exists, or, even worse, to a completely different app instance. To defend against this possibility, the routers expect the cells to broadcast the route registrations for their apps frequently. The routers then intentionally discard registrations that haven't been updated in the last 120 seconds. Cloud Foundry rightly prioritizes security over availability.
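The pruning behaviour described above can be sketched as a small route table with a time-to-live. This is a toy model under stated assumptions, not the gorouter's implementation: the RouteTable class is hypothetical, the 120-second value comes from the text, and the second backend address (10.0.0.6) is made up for the example.

```python
import time


class RouteTable:
    """Toy route table that forgets backends whose registration is stale."""

    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._routes = {}  # hostname -> {(ip, port): time of last registration}

    def register(self, hostname, ip, port):
        # Called every time a cell re-broadcasts a registration.
        self._routes.setdefault(hostname, {})[(ip, port)] = self.clock()

    def prune(self):
        # Drop every backend that has not been refreshed within the TTL.
        cutoff = self.clock() - self.ttl
        for hostname in list(self._routes):
            backends = self._routes[hostname]
            for backend, seen in list(backends.items()):
                if seen < cutoff:
                    del backends[backend]
            if not backends:
                del self._routes[hostname]

    def backends(self, hostname):
        self.prune()
        return sorted(self._routes.get(hostname, {}))
```

With a TTL like this, a cell that stops broadcasting loses its routes after two minutes even if the app behind it is still healthy, which is the availability cost the paragraph above refers to.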
With Envoy and TLS to the app container enabled, the flow becomes:
- The app process listens on port 8080 inside its container.
- Envoy listens on port 8443 inside the container, terminates TLS with the instance credentials that contain the instance ID a7c, and forwards that traffic to port 8080.
- The Diego cell forwards traffic from port 61443 on the host to container port 8443.
- The Diego cell registers its 10.0.0.5 IP and the 61443 host port with the router as a TLS backend for the example.com domain, along with the instance ID a7c.
- The router receives an HTTP request for example.com.
- The router connects via TLS to the 10.0.0.5:61443 address, verifies the a7c instance ID, and only then forwards the request.
- The Diego cell forwards the request payload to Envoy, which in turn forwards it to the app itself for processing.
Now if the router connects to the wrong app instance because of a route registration that is out of date, its TLS handshake fails, and it backs out and tries a different instance. As a result, the routers also no longer need to drop out-of-date TLS registrations so aggressively. The routers can maintain app availability during extended failures of the route-registration system.
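The verify-then-fall-back behaviour can be illustrated with a small simulation. It performs no real TLS: connect_tls is a hypothetical stand-in that compares the instance ID the router expects with the one actually served at an address, and the second backend (10.0.0.6 with ID b42) is invented for the example.

```python
class HandshakeError(Exception):
    """Raised when the backend does not present the expected instance ID."""


def connect_tls(address, expected_instance_id, cells):
    """Stand-in for a TLS handshake.

    `cells` maps an address to the instance ID actually running there.
    """
    actual = cells.get(address)
    if actual != expected_instance_id:
        raise HandshakeError(
            f"{address}: expected {expected_instance_id}, got {actual}"
        )
    return address


def route_request(registrations, cells):
    """Try each registered backend in order and return the first verified one."""
    for address, instance_id in registrations:
        try:
            return connect_tls(address, instance_id, cells)
        except HandshakeError:
            continue  # stale registration: back out and try another instance
    return None  # no registered backend could be verified
```

Since a stale entry now fails closed at the handshake, keeping it in the table for longer costs at most a retry, which is why the aggressive pruning is no longer required for TLS registrations.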
What about when the instance certificates expire? In that case, because the Diego cell already knows it has issued new credentials, it also uses Envoy's dynamic configuration capabilities to update the credentials there as well. On subsequent connections, Envoy then uses the new set of credentials for TLS termination without skipping a beat.
Using the instructions here, connect to the app-instance/container
- Check for listening ports :
root@7cdd109d-2feb-43eb-6c00-720d:/# netstat -anp | grep LISTEN
tcp 0 0 0.0.0.0:61001 0.0.0.0:* LISTEN 86/envoy
tcp 0 0 0.0.0.0:61002 0.0.0.0:* LISTEN 86/envoy
tcp 0 0 127.0.0.1:61004 0.0.0.0:* LISTEN 86/envoy
tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN -
- The app listens on port 8080; the corresponding port configured on the envoy side is 61001. Port 2222 is used for ssh, and the corresponding port on the envoy side is 61002.
- Port 61004 is the port for the envoy admin API.
- envoy is configured using the following configuration file, /etc/cf-assets/envoy_config/envoy.yaml :
admin:
  access_log_path: /dev/null
  address:
    socket_address:
      address: 127.0.0.1
      port_value: 61004
static_resources:
  clusters:
  - name: 0-service-cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: 10.255.247.3
        port_value: 8080 <---
  - name: 1-service-cluster
    connect_timeout: 0.25s
    type: STATIC
    lb_policy: ROUND_ROBIN
    hosts:
    - socket_address:
        address: 10.255.247.3
        port_value: 2222 <---
dynamic_resources:
  lds_config:
    path: /etc/cf-assets/envoy_config/listeners.yaml
- Run echo $CF_INSTANCE_PORTS to list the corresponding envoy ports.
- Check for envoy processes inside of the app container :
root@7cdd109d-2feb-43eb-6c00-720d:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 23:25 ? 00:00:00 /tmp/garden-init
vcap 13 0 0 23:25 ? 00:00:00 web
vcap 16 0 0 23:25 ? 00:00:00 /tmp/lifecycle/diego-sshd --allowedKeyExchanges= --address=0.0.0.0:2222 --allowUnauthenticatedClients=fa
root 55 0 0 23:25 ? 00:00:00 sh -c trap 'kill -9 0' TERM; /etc/cf-assets/envoy/envoy -c /etc/cf-assets/envoy_config/envoy.yaml --serv
root 86 55 0 23:25 ? 00:00:01 /etc/cf-assets/envoy/envoy -c /etc/cf-assets/envoy_config/envoy.yaml --service-cluster proxy-cluster --s
root 93 0 0 23:25 ? 00:00:00 /etc/cf-assets/healthcheck/healthcheck -port=8080 -timeout=1000ms -liveness-interval=30s
root 144 0 1 23:50 pts/0 00:00:00 /bin/bash
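The CF_INSTANCE_PORTS value mentioned above is a JSON array, so the app-port-to-envoy-port mapping can be read programmatically. The sketch below assumes each entry carries the keys internal and internal_tls_proxy (plus external and external_tls_proxy); treat those key names as an assumption to check against your own container. The external port values in the sample are invented, and tls_proxy_ports is a hypothetical helper name.

```python
import json

# Illustrative value only: the internal ports match the netstat output above,
# the external ports are made up.
SAMPLE_CF_INSTANCE_PORTS = (
    '[{"external":61078,"internal":8080,'
    '"external_tls_proxy":61079,"internal_tls_proxy":61001},'
    '{"external":61080,"internal":2222,'
    '"external_tls_proxy":61081,"internal_tls_proxy":61002}]'
)


def tls_proxy_ports(cf_instance_ports):
    """Map each app port to the envoy port that fronts it inside the container."""
    mappings = json.loads(cf_instance_ports)
    return {
        m["internal"]: m["internal_tls_proxy"]
        for m in mappings
        if "internal_tls_proxy" in m  # entry is skipped when no proxy port is set
    }
```

Inside a container, the same function can be fed os.environ["CF_INSTANCE_PORTS"] instead of the sample string.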
iptables configuration for DNAT to the container IP on the diego cell :
iptables -t nat -nL | grep 61001
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61079 to:10.255.247.28:61001 <---
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61065 to:10.255.247.25:61001
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61012 to:10.255.247.8:61001
Note the IP address of the app container to find the corresponding DNAT rule for incoming traffic to the app via the diego cell
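Matching a container IP to its DNAT rule by eye gets tedious on a busy cell. The lookup can be sketched as a small parser over the iptables output; the regular expression only assumes the dpt:PORT to:IP:PORT layout visible in the sample above, and host_ports_for is a hypothetical helper name.

```python
import re

# Sample lines copied from the iptables output above.
SAMPLE_RULES = """\
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61079 to:10.255.247.28:61001 <---
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61065 to:10.255.247.25:61001
DNAT tcp -- 0.0.0.0/0 10.193.68.33 tcp dpt:61012 to:10.255.247.8:61001
"""

DNAT_RULE = re.compile(r"dpt:(?P<host_port>\d+)\s+to:(?P<ip>[\d.]+):(?P<port>\d+)")


def host_ports_for(container_ip, iptables_output, container_port=61001):
    """Return the host ports that are DNAT-ed to container_ip:container_port."""
    ports = []
    for line in iptables_output.splitlines():
        m = DNAT_RULE.search(line)
        if m and m["ip"] == container_ip and int(m["port"]) == container_port:
            ports.append(int(m["host_port"]))
    return ports
```

For the sample rules, looking up container 10.255.247.28 yields host port 61079, the rule marked with the arrow.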
What is Envoy?
Created at Lyft, Envoy is a high performance open source service mesh proxy that makes the network transparent to apps.
- written in C++
- minimal cpu and mem footprint
- api driven config (dynamic configuration via Discovery Services xDS APIs)
- L4 (TCP) proxy
- bidirectional transparent proxy
- sidecar and ingress models (CF uses the ingress model) (see reference here for details)
Reference : https://www.cncf.io/wp-content/uploads/2018/05/projectFAQ_envoy.pdf
Cloud Foundry has adopted Envoy as a proxy to implement the security improvements described above.
Envoy's configuration is built from four concepts:
- Listeners - which port Envoy listens on and which protocol it expects (Listener Discovery Service, LDS)
- Routes - listeners are mapped to routes; a route matches on request attributes such as the host header and tells Envoy where the traffic should be sent (Route Discovery Service, RDS)
- Clusters - routes point to a cluster; a cluster tells Envoy how to send the traffic, for example whether to use TLS and which load-balancing strategy to apply (Cluster Discovery Service, CDS)
- Endpoints - the hosts that are able to receive the traffic (Endpoint Discovery Service, EDS)
listeners.yaml file :
root@b1d4075d-c7e2-4ea9-572c-db9c:/etc/cf-assets/envoy_config# cat listeners.yaml
version_info: "0"
resources:
- '@type': type.googleapis.com/envoy.api.v2.Listener
  name: listener-8080
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 61001
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: 0-stats
        cluster: 0-service-cluster
    tls_context:
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            inline_string: |
              -----BEGIN CERTIFICATE-----
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              -----END CERTIFICATE-----
          private_key:
            inline_string: |
              -----BEGIN RSA PRIVATE KEY-----
              -----END RSA PRIVATE KEY-----
        tls_params:
          cipher_suites: '[ECDHE-RSA-AES256-GCM-SHA384|ECDHE-RSA-AES128-GCM-SHA256]'
- '@type': type.googleapis.com/envoy.api.v2.Listener
  name: listener-2222
  address:
    socket_address:
      address: 0.0.0.0
      port_value: 61002
  filter_chains:
  - filters:
    - name: envoy.tcp_proxy
      config:
        stat_prefix: 1-stats
        cluster: 1-service-cluster
    tls_context:
      common_tls_context:
        tls_certificates:
        - certificate_chain:
            inline_string: |
              -----BEGIN CERTIFICATE-----
              -----END CERTIFICATE-----
              -----BEGIN CERTIFICATE-----
              -----END CERTIFICATE-----
          private_key:
            inline_string: |
              -----BEGIN RSA PRIVATE KEY-----
              -----END RSA PRIVATE KEY-----
        tls_params:
          cipher_suites: '[ECDHE-RSA-AES256-GCM-SHA384|ECDHE-RSA-AES128-GCM-SHA256]'
Routes match on request header information and point to a cluster. A cluster holds the routing configuration and a group of endpoints: the upstream IPs/hosts that will handle the traffic.
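To make the listener-to-cluster-to-endpoint chain concrete, here is a small sketch that resolves where traffic arriving on an envoy port ends up. The values are copied from the sample listeners.yaml and envoy.yaml on this page; the dictionaries and the upstream_for function are a hypothetical model, not Envoy's own data structures. In these samples the tcp_proxy filter names its cluster directly, so there is no separate route-matching step.

```python
# Listener port -> cluster name, as in the sample listeners.yaml.
LISTENERS = {
    61001: "0-service-cluster",
    61002: "1-service-cluster",
}

# Cluster name -> upstream endpoints, as in the sample envoy.yaml.
CLUSTERS = {
    "0-service-cluster": [("10.255.247.3", 8080)],
    "1-service-cluster": [("10.255.247.3", 2222)],
}


def upstream_for(listener_port):
    """Return the endpoints that receive traffic arriving on listener_port."""
    cluster = LISTENERS.get(listener_port)
    # An unknown listener port has no cluster and therefore no endpoints.
    return CLUSTERS.get(cluster, [])
```

For example, upstream_for(61001) resolves to the app on port 8080 and upstream_for(61002) to the ssh daemon on port 2222.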
How are the envoy configuration and binaries injected?
This is done via a bind mount when the container is created.
Strict route integrity is implemented in 2.3. The UI for ERT now has 3 radio buttons instead of the checkbox from 2.2/2.1
All of the new properties that enable this feature are in the diego release spec here
Reference documentation on how this feature prevents misrouting is here.
Some caveats documented in the PCF 2.3 Breaking Changes Section
Additional tips from Dan Lynch :
- In PCF 2.3, when this feature is enabled, cf curl /v2/apps/GUID/stats will no longer return a port for the app. This is because Cloud Controller looks for the container port, which is null, and not the new envoy TLS port.
- Alternatively, use cfdot instead :
# diego_cell/ebadf72b-8d0e-4e25-beb7-ef632a8aecd9:~# cfdot actual-lrp-groups | jq 'select(.instance.process_guid | contains("93d05447-6319-4a54-8e62-6228daef1768"))' | jq '[.instance.address, .instance.ports[0].host_tls_proxy_port] | "https://\(.[0]):\(.[1])"'
"https://10.193.79.34:61000"
- Curling an app directly from the cell will not work, as the envoy TLS proxy will only trust requests coming from the gorouter via mutual TLS.
- Here is a procedure to curl an app container directly when "strict route integrity" is enabled.
- Extract the router cert and key from /var/vcap/jobs/gorouter/config/gorouter.yml and write each to a file. They are stored under these parameters:
  - backends.cert_chain
  - backends.private_key
- Get the port of the app using a combination of the cf api and cfdot. Make sure to use the correct app GUID. You might also need to select a ports entry other than index 0 (.instance.ports[0]).
diego_cell/ebadf72b-8d0e-4e25-beb7-ef632a8aecd9:~# cfdot actual-lrp-groups | jq 'select(.instance.process_guid | contains("93d05447-6319-4a54-8e62-6228daef1768"))' | jq '[.instance.address, .instance.ports[0].host_tls_proxy_port] | "https://\(.[0]):\(.[1])"'
"https://10.193.79.34:61000"
- Using the router cert and key, query the app by running the following curl command from the gorouter :
curl -H "Host: jdoe-spring-music.cfapps-14.haas-59.pez.pivotal.io" https://10.193.79.34:61000 -vvv --cert /tmp/cert --key /tmp/key -k
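The last two steps of the procedure can also be scripted. The sketch below mirrors the jq filter and assembles the curl arguments; it assumes cfdot prints one JSON object per line and uses only the fields the jq filter already references (.instance.process_guid, .instance.address, .instance.ports[N].host_tls_proxy_port). Both function names are hypothetical.

```python
import json


def tls_backend_urls(cfdot_output, app_guid, port_index=0):
    """Python equivalent of the jq filter: one https URL per matching instance."""
    urls = []
    for line in cfdot_output.splitlines():
        if not line.strip():
            continue
        instance = json.loads(line).get("instance")
        if not instance:
            continue  # skip groups that have no running instance entry
        if app_guid in instance["process_guid"]:
            port = instance["ports"][port_index]["host_tls_proxy_port"]
            urls.append(f"https://{instance['address']}:{port}")
    return urls


def curl_command(url, host_header, cert="/tmp/cert", key="/tmp/key"):
    """Build the argument list for the curl call shown above."""
    return [
        "curl", "-H", f"Host: {host_header}", url, "-vvv",
        "--cert", cert, "--key", key, "-k",
    ]
```

The returned list can be passed to subprocess.run on the gorouter VM; keeping it as a list avoids shell quoting problems with the Host header.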
- Envoy configuration : https://github.com/cloudfoundry/diego-release/blob/master/docs/envoy-proxy-configuration.md
- Sample apps : https://github.com/emalm/tls-example-apps
- Sample apps : https://github.com/nebhale/mtls-sample
- How to log in to an app container from a diego cell (backdoor method): https://gist.github.com/nikhilsuvarna/aa2593af5e45721049e3f1c774e238c9