*note: working document, may not apply to all installations/architectures
Manage Certificates with Cert-Manager
Manual deployment of certificates is error-prone.
Vault can also be leveraged as a certificate source.
Install Istio CNI
Installing the Istio CNI plugin allows users to deploy Istio workloads without granting the NET_ADMIN and NET_RAW capabilities.
The istio-cni may cause issues with init containers. Follow these steps to address these issues.
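As a minimal sketch, assuming the mesh is installed from an IstioOperator resource, the CNI component can be switched on as follows. Some platforms need additional CNI settings that are not shown here.

```yaml
# Sketch only: enables the Istio CNI component in an IstioOperator install.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    cni:
      enabled: true
```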
Scale Istiod
An Istiod outage can impact the data plane through validation. By default Istiod is configured with an HPA, but only one replica.
At minimum Istiod should have 2 replicas.
Increase Maximum HPA
The default maximum of 5 replicas suits small environments. Larger environments need a higher cap.
Change the value from 5 to 10.
Reduce HPA Target Utilization
Reducing the average utilization target for the Istiod pods triggers scaling events sooner, which better suits production-level usage patterns.
Reducing the utilization from 80% to 60% will suffice in many situations.
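The three HPA changes above (a minimum of 2 replicas, a maximum of 10, and a 60% CPU target) might be expressed together as in the sketch below. It assumes an IstioOperator-based install and the `values.pilot` Helm keys `autoscaleMin`, `autoscaleMax`, and `cpu.targetAverageUtilization`; verify these key names against the chart shipped with your Istio version.

```yaml
# Sketch only: istiod autoscaling settings, assuming Helm-style pilot values.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    pilot:
      autoscaleEnabled: true
      autoscaleMin: 2      # at least two istiod replicas
      autoscaleMax: 10     # raised from the default of 5
      cpu:
        targetAverageUtilization: 60   # lowered from 80
```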
Set Istiod Requests Appropriately
The default installation requests of 500m CPU and 2048Mi memory are intended for small pilot installations.
[ref] At scale Istiod uses 1vCPU and 1.5G of memory.
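A sketch of raising the Istiod requests, assuming an IstioOperator-based install. The numbers are illustrative placeholders, not measured recommendations; size them from observed usage in your own environment.

```yaml
# Sketch only: istiod resource requests; values are illustrative assumptions.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        resources:
          requests:
            cpu: 1000m
            memory: 2048Mi
```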
Disable or Diminish Trace Sampling
The default 1% sampling gathers more traces than necessary above 10,000 requests per second.
Depending on the environment, 0.1% or 0.01% may be appropriate.
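As a sketch, assuming an IstioOperator-based install, the mesh-wide sampling rate can be lowered through `meshConfig`. The value is a percentage, so `0.1` means 0.1% of requests.

```yaml
# Sketch only: mesh-wide trace sampling set to 0.1%.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      tracing:
        sampling: 0.1
```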
Leverage Revisions and Revision Tags
Revisions and revision tags will enable effective upgrades, and less resource maintenance.
Be aware that tags and revisions have separate format requirements.
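A rough sketch of how the pieces fit together: the control plane is installed under a revision, and namespaces point at a tag rather than at the revision itself. The revision `1-10-0`, the tag `prod-stable`, and the namespace `example-app` are hypothetical names; the tag is assumed to have been created separately with istioctl.

```yaml
# Sketch only: a revisioned control plane (revision name is hypothetical).
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  revision: 1-10-0
---
# Sketch only: a namespace that follows a revision tag (tag name is hypothetical
# and must already exist). Moving the tag to a new revision avoids relabeling.
apiVersion: v1
kind: Namespace
metadata:
  name: example-app
  labels:
    istio.io/rev: prod-stable
```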
Leverage Pilot Pod Anti-Affinity/Affinity
Running multiple pilot replicas on the same node still leaves a single point of failure.
Newer feature that is not fully documented in istio/istio. ref
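A minimal sketch of a pod anti-affinity rule for Istiod, assuming an IstioOperator-based install and that the Istiod pods carry the label `app: istiod`. The required (hard) rule shown here keeps two replicas off the same node; a preferred (soft) rule may fit better in small clusters.

```yaml
# Sketch only: keep istiod replicas on different nodes.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        affinity:
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: istiod
              topologyKey: kubernetes.io/hostname
```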
Disable envoyMetricsService/envoyAccessLogService
These features are primarily diagnostic and should not be used in performance-sensitive environments.
Tuning of the logs and metrics may eliminate the need to disable these.
Configure Certificate TTL
Default and Max certificate TTL should be configured to suit your TLS requirements. Set using environment variables.
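A sketch of setting both TTLs on Istiod, assuming an IstioOperator-based install. The variable names `DEFAULT_WORKLOAD_CERT_TTL` and `MAX_WORKLOAD_CERT_TTL` are taken from the istiod environment variable reference as I understand it, so confirm them for your version; the durations are illustrative only.

```yaml
# Sketch only: workload certificate TTLs on istiod; durations are assumptions.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        env:
        - name: DEFAULT_WORKLOAD_CERT_TTL   # TTL used when a workload requests none
          value: "12h"
        - name: MAX_WORKLOAD_CERT_TTL       # upper bound on any requested TTL
          value: "48h"
```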
Tune Pilot Debounce Settings
Debouncing batches configuration changes before they are pushed to the proxies, which reduces the burden on highly dynamic systems.
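As a sketch, assuming an IstioOperator-based install, the debounce window is controlled by the `PILOT_DEBOUNCE_AFTER` and `PILOT_DEBOUNCE_MAX` environment variables on Istiod. The values below are illustrative; longer windows mean fewer pushes but slower propagation of changes.

```yaml
# Sketch only: pilot debounce tuning; durations are illustrative assumptions.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    pilot:
      k8s:
        env:
        - name: PILOT_DEBOUNCE_AFTER   # quiet period to wait after the last change
          value: "300ms"
        - name: PILOT_DEBOUNCE_MAX     # longest a push may be delayed
          value: "20s"
```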
Enable Affinity/Anti-Affinity on Gateways
Same as the istiod example.
In heterogeneous environments, the network may be an additional consideration here.
Scale Gateways
The gateways are critical to the data path and should run with minReplicas greater than 1.
2 is the absolute minimum, more may be necessary.
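The gateway anti-affinity and minimum replica settings might be combined as in the sketch below. It assumes an IstioOperator-based install, a gateway named `istio-ingressgateway`, and pods labeled `app: istio-ingressgateway`; adjust the names and labels to your deployment.

```yaml
# Sketch only: ingress gateway with two or more replicas spread across nodes.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  components:
    ingressGateways:
    - name: istio-ingressgateway
      enabled: true
      k8s:
        hpaSpec:
          minReplicas: 2
        affinity:
          podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: istio-ingressgateway
                topologyKey: kubernetes.io/hostname
```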
Add Gateway PreStop Patch to allow for all connections to close
Connections can take time to close, so it may be necessary to delay pod shutdown long enough for in-flight connections to drain.
    overlays:
    - apiVersion: apps/v1
      kind: Deployment
      name: istio-ingressgateway-${ISTIO_REVISION}
      patches:
      # Sleep 25s on pod shutdown to allow connections to drain
      - path: spec.template.spec.containers.[name:istio-proxy].lifecycle
        value:
          preStop:
            exec:
              command:
              - sleep
              - "25"
The value 25 is arbitrary. Tune it for your environment.
Upgrade to http2 wherever possible
http2 is generally more performant than http1 in situations where it can be leveraged.
An explanation of how it can be configured is here
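One way to request the upgrade is per destination, through a DestinationRule. The sketch below assumes a hypothetical service `example.default.svc.cluster.local` that can accept http2; the resource name is also hypothetical.

```yaml
# Sketch only: upgrade http1 connections to http2 for one destination.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: example-h2-upgrade        # hypothetical name
spec:
  host: example.default.svc.cluster.local   # hypothetical host
  trafficPolicy:
    connectionPool:
      http:
        h2UpgradePolicy: UPGRADE
```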
Tune proxy concurrency
By default the sidecar creates 2 worker threads, which may leave capacity underutilized.
Available options are 0 (to leverage all cores based on limits and requests) or a predefined number based on tuning. ref
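A sketch of setting concurrency mesh-wide, assuming an IstioOperator-based install. The value `0` is used here for illustration; a fixed number found through tuning can be substituted.

```yaml
# Sketch only: let each proxy size its worker threads from its CPU allocation.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      concurrency: 0
```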
Leverage an Envoy Filter to Balance Connections over Worker Threads
This filter evenly distributes connections across worker threads.
Whether it is needed depends on the number of worker threads in use.
    ...
    spec:
      configPatches:
      - applyTo: LISTENER
        match:
          context: GATEWAY
          listener:
            portNumber: 8443
        patch:
          operation: MERGE
          value:
            connection_balance_config:
              exact_balance: {}
      workloadSelector:
        labels:
          app: ingress
    ...