There are three major features in Rancher 2.0 monitoring support:
- Enable Prometheus as the Rancher 2.0 monitoring component, with real-time monitoring graphs in the Rancher UI.
- Support workload auto scaling. We will support two kinds of scaler: the Kubernetes HPA and a webhook scaler.
- Support more alert rule types, such as pod CPU/Memory usage.
There is a new page called Monitoring under Tools at the cluster level, where a cluster admin can enable monitoring for the whole cluster. On this page, they can configure the following options (a sketch follows the list):
- Prometheus deploy options
- Enable/Disable PVC - Advanced option, default to disabled. PVC settings are shown when enabled.
- Retention policy for Prometheus - Advanced option, default to 15d (d for days).
- Enable/disable the Prometheus metrics server adapter for Kubernetes. This metrics server fetches metrics from Prometheus and pushes them to the Kubernetes metrics API. Once custom metrics are pushed to Kubernetes, users can use them for HPA (custom metrics for HPA).
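As a rough sketch, these options could map to configuration values like the following; the key names are hypothetical, not actual Rancher answer keys:
```yaml
# Hypothetical monitoring configuration; key names are illustrative only.
prometheus:
  persistence:
    enabled: false        # Enable/Disable PVC (advanced, default disabled)
    size: 50Gi            # PVC settings, shown only when persistence is enabled
    storageClass: default
  retention: 15d          # Retention policy (advanced, default 15d)
metricsAdapter:
  enabled: true           # Prometheus metrics server adapter for custom metrics HPA
```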
After monitoring is enabled, the monitoring deploy controller will deploy the Prometheus server, node exporter and Prometheus metrics server (if enabled) into the cattle-alerting namespace.
After deployment, the following metric types will be fetched from the cluster (a sketch of the corresponding scrape jobs follows the list):
- API server metrics - provided by the API server
- kubelet metrics - provided by the kubelet
- pod runtime metrics like CPU/Memory/Network/Disk - provided by cAdvisor in the kubelet
- node runtime metrics like CPU/Memory/Network/Disk - provided by the node exporter
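A minimal sketch of Prometheus scrape jobs for these sources, assuming standard Kubernetes service discovery; job names and auth/TLS details are illustrative, not the exact configuration Rancher generates:
```yaml
# Sketch of scrape jobs for the metric sources above; illustrative only.
scrape_configs:
- job_name: kubernetes-apiservers      # API server metrics
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
- job_name: kubernetes-kubelet         # kubelet metrics
  kubernetes_sd_configs:
  - role: node
- job_name: kubernetes-cadvisor        # pod runtime metrics from cAdvisor
  kubernetes_sd_configs:
  - role: node
  metrics_path: /metrics/cadvisor
- job_name: node-exporter              # node runtime metrics
  kubernetes_sd_configs:
  - role: endpoints
```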
After monitoring is enabled, the following real-time monitoring graphs will be shown in the Rancher UI:
- The node CPU/Mem/Network/Disk IO on the cluster node detail page.
- The pod CPU/Mem/Network/Disk IO on the pod detail page.
About project level monitoring:
There are some upsides to project level monitoring:
- Project users can expose their app metrics to a project level Prometheus (see the example below). They can build their own fancy graphs with a Grafana pointed at this Prometheus.
- They don't need to know anything about cluster level monitoring.
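For example, assuming the project level Prometheus discovers targets via the conventional prometheus.io annotations (an assumption; the discovery mechanism is not specified here), a project user could expose their app metrics like this:
```yaml
# Hypothetical pod exposing metrics to a project level Prometheus,
# assuming discovery via the conventional prometheus.io annotations.
apiVersion: v1
kind: Pod
metadata:
  name: my-app                     # hypothetical app
  annotations:
    prometheus.io/scrape: "true"   # opt in to scraping
    prometheus.io/port: "8080"     # port serving the metrics endpoint
    prometheus.io/path: /metrics
spec:
  containers:
  - name: my-app
    image: example/my-app:latest   # hypothetical image
    ports:
    - containerPort: 8080
```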
There are also some problems with it:
- We can not do alert manager aggregation at the project level, because the alert manager is deployed into the cattle-alerting namespace, which belongs to no project. There will be isolation rules between user projects and non-project namespaces.
Horizontal Pod Autoscaler
We will have a new page at the project level under the Resource menu to manage HPAs. We can create an HPA with the following options:
- name
- namespace
- scale workload target - Only Deployment, RC and RS are supported
- min replicas
- max replicas
- metrics - if any metrics target has been met, the workload will be scaled up.
For now, we could have the following metrics types:
- Pods - Metrics related to the pod, like CPU, memory or other custom metrics. The custom metrics can be fetched by our Prometheus metrics server.
- Resources - Native k8s supports CPU resource metrics. It has currentAverageUtilization and currentAverageValue metrics.
- Object - This refers to metrics from other resources in your namespaces.
This is an example for metrics:
```yaml
...
metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 50
- type: Pods
  pods:
    metricName: packets-per-second
    targetAverageValue: 1k
- type: Object
  object:
    metricName: requests-per-second
    target:
      apiVersion: extensions/v1beta1
      kind: Ingress
      name: main-route
    targetValue: 10k
...
```
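For context, the metrics stanza above sits inside a full HPA object. A minimal sketch, assuming the autoscaling/v2beta1 API; the name, namespace and target workload are hypothetical:
```yaml
# Sketch of a complete HPA; name, namespace and target are hypothetical.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: default
spec:
  scaleTargetRef:                  # scale workload target (Deployment, RC or RS)
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
```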
Webhook Scaler
We will support webhooks that trigger scaling of workloads and cluster nodes.
We will have a new page Webhook in the project level menu to manage workload scale webhooks. We can set the following options on webhook objects (a sketch follows the list):
- We can select a workload target or a label selector for the webhook. Only Deployment, RC, RS and StatefulSet are supported.
- Both scale-up and scale-down will be supported.
- Scaling resource count - default to 1
- Minimum and maximum replicas - default to 1/10
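A workload scale webhook object might be expressed roughly like this; the entire schema below is hypothetical, since the actual object format is not defined yet:
```yaml
# Hypothetical workload scale webhook; every field name is illustrative,
# not a defined Rancher schema.
name: scale-up-web
workloadTarget:
  kind: Deployment            # Deployment, RC, RS or StatefulSet
  name: web
# Alternatively, a label selector instead of a workload target:
# selector:
#   app: web
action: scale-up              # scale-up or scale-down
scalingResourceCount: 1       # default 1
minReplicas: 1                # default 1
maxReplicas: 10               # default 10
```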
We will have a new page Webhook in the cluster level menu to manage node scale webhooks. We can set the following options on webhook objects (a sketch follows the list):
- Users can select a node pool to scale nodes in.
- Only scale-up will be supported.
- Users can set labels for those nodes.
- Only non-imported and non-custom clusters are supported.
- Scaling resource count - default to 1
- Minimum and maximum replicas - default to 1/10
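Similarly, a node scale webhook might look roughly like this (again, a hypothetical schema):
```yaml
# Hypothetical node scale webhook; field names are illustrative only.
name: scale-up-workers
nodePool: worker-pool         # node pool to scale (hypothetical pool name)
action: scale-up              # only scale-up is supported for nodes
nodeLabels:                   # labels set on the new nodes
  role: worker
scalingResourceCount: 1       # default 1
minReplicas: 1                # default 1
maxReplicas: 10               # default 10
```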
And after we create a webhook, we will create a receiver link for it. The webhook function will be triggered by an HTTP POST request to this link.
With the various metrics collected by Prometheus, we can provide more built-in rules for users to configure, so that users do not have to write the PromQL themselves. The following alert rules will be added after monitoring is enabled:
- Alert for CPU/Mem/Network/Disk IO usage of a workload
- Alert for Network/Disk IO/System Load usage of a node
- Custom rules. Users can add a custom rule by providing the PromQL. In custom rules, we can support all the metrics collected by Prometheus.
Custom Rule Sample
```
sum (rate (container_network_receive_bytes_total{kubernetes_io_hostname=~"^worker1"}[2m]))
```
This sums the per-second network receive rate, over a 2 minute window, across all containers on nodes whose hostname starts with worker1.
TODO