My Helmfile definition is:
context: taurus-stage.kube.swatmobile.io

releases:
  - name: datadog
    namespace: kube-system
    chart: stable/datadog
    version: 0.11.2
    values:
      - datadog/values.yaml
    secrets:
      - datadog/secrets.yaml
datadog/secrets.yaml contains the API keys, so I'm not including it here.
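(Roughly, that file only needs to set the chart's API key value; below is a sketch with a placeholder rather than the real key, assuming the chart's standard datadog.apiKey value. helmfile decrypts the real file through the helm-secrets plugin.)

datadog:
  apiKey: "<redacted>"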
datadog/values.yaml contains:
image:
  repository: datadog/agent  # Agent6
  tag: 6.3.0  # Use 6.3.0-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent

daemonset:
  enabled: true
  updateStrategy: RollingUpdate
  tolerations:
    - key: "node-role.kubernetes.io/master"
      effect: NoSchedule

deployment:
  enabled: true
  # datadog.collectEvents requires datadog.leaderElection enabled (which also ensures proper RBAC)
  replicas: 1

kubeStateMetrics:
  enabled: false  # deployed separately to decouple dependency
                  # also, chart service-name is non-default and cumbersome

datadog:
  name: datadog-agent
  nonLocalTraffic: true
  apmEnabled: true  # APM, also known as the trace agent
  leaderElection: true
  collectEvents: true
  ## All datadog configuration: https://sourcegraph.com/github.com/DataDog/[email protected]/-/blob/pkg/config/config.go#L60:2
  env:
    - name: DD_CHECK_RUNNERS  # Agent6: default 1, increase if collector_queue fails health checks due to a high number of checks
      value: "1"
    - name: DD_KUBERNETES_POD_LABELS_AS_TAGS  # Agent6: needs you to whitelist the relevant labels (see the illustration after this block)
      value: '{"app":"helm_app","release":"helm_release","component":"helm_component","k8s-app":"k8s-app","chart":"helm_chart","heritage":"helm_heritage"}'
    - name: DD_KUBERNETES_NODE_LABELS_AS_TAGS
      value: '{"kubernetes.io/hostname":"node_name","beta.kubernetes.io/os":"node_os","beta.kubernetes.io/instance-type":"node_type","kubernetes.io/role":"node_role","failure-domain.beta.kubernetes.io/region":"node_region","failure-domain.beta.kubernetes.io/zone":"node_zone"}'
    - name: DD_PROCESS_AGENT_ENABLED  # Agent6: process monitoring - https://docs.datadoghq.com/graphing/infrastructure/process/
      value: "true"
    - name: DD_LOGS_ENABLED  # Agent6: enable DataDog logs
      value: "true"
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "true"
  ## required for process monitoring
  volumes:
    - hostPath:
        path: /etc/passwd
      name: passwd
    - hostPath:
        path: /opt/datadog-agent/run  # Logs: stores the last line collected for each container
      name: opt-ddog-run
  volumeMounts:
    - name: passwd
      mountPath: /etc/passwd
      readOnly: true
    - name: opt-ddog-run
      mountPath: /opt/datadog-agent/run
      readOnly: false
  tags: "cluster:taurus-staging, cluster_group:staging"
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
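As an aside, to illustrate the DD_KUBERNETES_POD_LABELS_AS_TAGS whitelist above (my own sketch; the label values are only examples), a pod labelled like this:

  metadata:
    labels:
      app: panama
      release: panama-dev

would report its metrics tagged helm_app:panama and helm_release:panama-dev, while pod labels missing from that mapping are not turned into tags.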
More details about the Helm chart can be found here: https://hub.kubeapps.com/charts/stable/datadog
I'm not using the latest chart version, but I checked the diff and there is nothing new in the chart that I'm not already doing with the current version.
I think that to collect logs for only a limited stack (we refer to this stack as "panama"), I'd just need to add this to values:
datadog:
  ...
  # ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  # ref: https://docs.datadoghq.com/logs/log_collection/docker/#option-1-configuration-file
  # ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
  confd:
    panama-dev.yaml: |-
      init_config:
      instances: [{}]
      logs:
        - type: docker
          label: release:panama-dev  # this is a k8s annotation, not a docker label...
          source: panama
          service: panama
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
and set DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL back to false.
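That is, flip the existing entry in the env list of values.yaml, something like:

datadog:
  env:
    # ...other entries unchanged...
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "false"  # only collect logs for explicitly configured sources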
Based on feedback from DataDog support, the updated config looks like: