My Helmfile definition is:
context: taurus-stage.kube.swatmobile.io

releases:
  - name: datadog
    namespace: kube-system
    chart: stable/datadog
    version: 0.11.2
    values:
      - datadog/values.yaml
    secrets:
      - datadog/secrets.yaml
datadog/secrets.yaml contains the API keys, so I'm not including it here.
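(Roughly, that file only needs to set the chart's API key value; below is a sketch with a placeholder rather than the real key, assuming the chart's standard datadog.apiKey value. helmfile decrypts the real file through the helm-secrets plugin.)

datadog:
  apiKey: "<redacted>"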
datadog/values.yaml contains:
image:
  repository: datadog/agent  # Agent6
  tag: 6.3.0  # Use 6.3.0-jmx to enable jmx fetch collection
  pullPolicy: IfNotPresent

daemonset:
  enabled: true
  updateStrategy: RollingUpdate
  tolerations:
    - key: "node-role.kubernetes.io/master"
      effect: NoSchedule

deployment:
  enabled: true
  # datadog.collectEvents requires datadog.leaderElection enabled (which also ensures proper RBAC)
  replicas: 1

kubeStateMetrics:
  enabled: false  # deployed separately to decouple dependency
                  # also, chart service-name is non-default and cumbersome

datadog:
  name: datadog-agent
  nonLocalTraffic: true
  apmEnabled: true  # APM, also known as the trace agent
  leaderElection: true
  collectEvents: true
  ## All datadog configuration: https://sourcegraph.com/github.com/DataDog/[email protected]/-/blob/pkg/config/config.go#L60:2
  env:
    - name: DD_CHECK_RUNNERS  # Agent6: default 1, increase if collector_queue fails health checks due to a high number of checks
      value: "1"
    - name: DD_KUBERNETES_POD_LABELS_AS_TAGS  # Agent6: needs you to whitelist the relevant labels (see the illustration after this block)
      value: '{"app":"helm_app","release":"helm_release","component":"helm_component","k8s-app":"k8s-app","chart":"helm_chart","heritage":"helm_heritage"}'
    - name: DD_KUBERNETES_NODE_LABELS_AS_TAGS
      value: '{"kubernetes.io/hostname":"node_name","beta.kubernetes.io/os":"node_os","beta.kubernetes.io/instance-type":"node_type","kubernetes.io/role":"node_role","failure-domain.beta.kubernetes.io/region":"node_region","failure-domain.beta.kubernetes.io/zone":"node_zone"}'
    - name: DD_PROCESS_AGENT_ENABLED  # Agent6: process monitoring - https://docs.datadoghq.com/graphing/infrastructure/process/
      value: "true"
    - name: DD_LOGS_ENABLED  # Agent6: enable DataDog logs
      value: "true"
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "true"
  ## required for process monitoring
  volumes:
    - hostPath:
        path: /etc/passwd
      name: passwd
    - hostPath:
        path: /opt/datadog-agent/run  # Logs: stores the last line collected for each container
      name: opt-ddog-run
  volumeMounts:
    - name: passwd
      mountPath: /etc/passwd
      readOnly: true
    - name: opt-ddog-run
      mountPath: /opt/datadog-agent/run
      readOnly: false
  tags: "cluster:taurus-staging, cluster_group:staging"
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
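As an aside, to illustrate the DD_KUBERNETES_POD_LABELS_AS_TAGS whitelist above (my own sketch; the label values are only examples), a pod labelled like this:

  metadata:
    labels:
      app: panama
      release: panama-dev

would report its metrics tagged helm_app:panama and helm_release:panama-dev, while pod labels missing from that mapping are not turned into tags.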
More details about the Helm chart can be found here: https://hub.kubeapps.com/charts/stable/datadog
I'm not using the latest chart version, but I checked the diff and there is nothing new in the chart that I'm not already doing with the current version.
I think that to collect logs for only a limited stack (we refer to this stack as "panama"), I'd just need to add this to values:
datadog:
  ...
  # ref: https://github.com/DataDog/docker-dd-agent#configuration-files
  # ref: https://docs.datadoghq.com/logs/log_collection/docker/#option-1-configuration-file
  # ref: https://docs.datadoghq.com/agent/basic_agent_usage/kubernetes/#log-collection-setup
  confd:
    panama-dev.yaml: |-
      init_config:
      instances: [{}]
      logs:
        - type: docker
          label: release:panama-dev  # this is a k8s annotation, not a docker label...
          source: panama
          service: panama
  resources:
    requests:
      cpu: 200m
      memory: 256Mi
    limits:
      cpu: 200m
      memory: 256Mi
and set DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL back to false.
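That is, flip the existing entry in the env list of values.yaml, something like:

datadog:
  env:
    # ...other entries unchanged...
    - name: DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL
      value: "false"  # only collect logs for explicitly configured sources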
Based on feedback from DataDog support, the updated config looks like: