Application Observability in Kubernetes with Datadog APM and Logging - A simple and actionable example
Last year I shared an example of how to set up application tracing in Kubernetes with Istio and Jaeger. Since then, the industry has made substantial headway on this front, and we are seeing more vendor support as a result. At Buffer, since we primarily use Datadog for Kubernetes and application monitoring, it's only fitting to complete the circle with Datadog APM and Logging. I had a chance to create a small example for the team and would love to share it with the community.
Okay, without further ado, let's dive in!
First things first: in order to collect metrics and logs from Kubernetes, a Datadog agent has to be installed in the cluster. The Datadog team made this quite easy for us; there is not much more to it than following this guide. I'd recommend deploying it as a DaemonSet, because that's all we need. The host-level deployment is for something else outside the scope of this post. If you really want to know, it monitors more cluster-level metrics using kube-state-metrics.
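If it helps, deploying the agent boils down to something like this. A minimal sketch, assuming you've downloaded the DaemonSet manifest from the guide (the filename and label selector here are illustrative, not from the guide):
# Create the agent DaemonSet from the manifest in the Datadog guide
# (the filename is illustrative)
kubectl apply -f datadog-agent.yaml

# Verify that one agent pod is running per node
# (the label selector depends on your manifest)
kubectl get pods -l app=datadog-agent -o wide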
Since we need both APM and Logging enabled, two environment variables have to be set on the DaemonSet. Under the container's environment variables section, add this one:
- name: DD_APM_ENABLED
  value: 'true'
Similar to APM, we need to turn on a flag to tell the Datadog agent to capture logs:
- name: DD_LOGS_ENABLED
  value: 'true'
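For context, here is roughly how those variables sit in the agent container spec. One extra detail worth calling out: for APM, the agent's trace port 8126 also needs to be exposed on the host (via hostPort, as the Datadog Kubernetes APM docs describe), otherwise application pods won't be able to reach it through the node IP. A sketch:
# Illustrative excerpt of the agent container in the DaemonSet spec
env:
  - name: DD_APM_ENABLED
    value: 'true'
  - name: DD_LOGS_ENABLED
    value: 'true'
ports:
  # Expose the trace port on the host so application pods can reach
  # the agent via the node IP (per the Datadog Kubernetes APM docs)
  - containerPort: 8126
    hostPort: 8126
    name: traceport
    protocol: TCP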
Super easy so far, yeah? Let's try to keep it that way! Now we can safely assume the Datadog agent will do its job, so let's move on to the application instrumentation.
I have a dream that one day this major step can be skipped entirely. Imagine if there were a way for a monitoring agent to tap into another runtime without needing to worry about security. Unfortunately that's not quite possible for now, even though things have already improved A LOT. In this example I strive to provide the simplest way to get things started. That's my promise to you.
Now, let's take a look at the code (in node.js):
require('dd-trace').init({
  hostname: process.env.DD_AGENT_HOST,
  port: 8126,
  env: 'development',
  logInjection: true,
  analytics: true,
});

const { createLogger, format, transports } = require('winston');
const addAppNameFormat = format(info => {
  info.ddtags = { 'logging-mvp': 'dd-tracing-logging-examples-nodejs' };
  return info;
});

const logger = createLogger({
  level: 'info',
  exitOnError: false,
  format: format.combine(
    addAppNameFormat(),
    format.json(),
  ),
  transports: [
    new transports.Console(),
  ],
});

const express = require('express');
const app = express();

app.get('/', function (req, res) {
  logger.log('info', 'A simple log that works with Datadog APM tracing and logging!');
  res.send('Hello world, this will demo Datadog tracing and logging!');
});

app.listen(3000, function () {
  console.log('Example app listening on port 3000!');
});
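One thing to notice: the tracer is initialized at the very top, before express is even required. That ordering is what allows dd-trace to patch the modules it instruments. To try this out locally, something along these lines should work (the file name index.js is just my assumption):
# Install the dependencies
npm install dd-trace winston express

# Point the tracer at a locally running agent and start the app
DD_AGENT_HOST=localhost node index.js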
That's all we need! Lines 1-7 tell the dd-trace package to send traces to the Datadog agent installed on the host (process.env.DD_AGENT_HOST) on port 8126. I will show you how to set up that environment variable in a later section. Kudos to you if you spotted it!
Line 5 is quite magical in my opinion. The flag correlates a trace with the logs generated during its execution, and they are represented nicely together on the Datadog interface. With this, we will be able to diagnose issues a lot more effectively.
Line 6 opts traces into analytics so they can be tagged and searched conveniently.
That's it for the APM part. I hope it's simple enough. Now let's move on to the logging.
Guess what? No instrumentation is needed at all. Just use a logger of your choice (I used Winston) and the Datadog agent happily picks things up automatically. In my example I didn't even log to a file. All I had to do was add Lines 10-13 to create a format that attaches a tag to each log, which helps us filter the logs we are interested in much more easily.
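To make the correlation concrete, here is roughly what one of those log lines looks like on stdout once logInjection kicks in. The dd block with the trace and span IDs is injected by dd-trace; the exact values (and field layout, depending on your dd-trace version) will differ:
{
  "message": "A simple log that works with Datadog APM tracing and logging!",
  "level": "info",
  "ddtags": { "logging-mvp": "dd-tracing-logging-examples-nodejs" },
  "dd": { "trace_id": "1234567890123456789", "span_id": "9876543210987654321" }
}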
So far so good? Now let's talk about the actual deployment to Kubernetes.
I love simplicity. I hope we are still on the same page. The Kubernetes deployment file is also very straightforward.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: dd-tracing-logging-examples-nodejs
  name: dd-tracing-logging-examples-nodejs
  namespace: dev
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: dd-tracing-logging-examples-nodejs
    spec:
      containers:
      - env:
        - name: DD_AGENT_HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        name: dd-tracing-logging-examples-nodejs
        image: bufferapp/dd-tracing-logging-examples:nodejs
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
Remember the environment variable that tells dd-trace where to send traces? Here is where the magic happens: Lines 17-21 use the Kubernetes Downward API to expose the node's host IP as an environment variable the application can use.
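If you want to double-check that the Downward API did its job, something like this works (assuming the deployment file is saved as deployment.yaml):
kubectl apply -f deployment.yaml

# Grab a pod from the deployment and confirm DD_AGENT_HOST
# resolves to the node's IP
POD=$(kubectl get pods -n dev -l app=dd-tracing-logging-examples-nodejs \
  -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n dev "$POD" -- printenv DD_AGENT_HOST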
Something I learned is that Datadog APM does not generate a trace for every request; instead, requests are sampled. Precisely because of this, we cannot expect to hit an endpoint a few times and see traces generated. We can work around this with a simple benchmarking tool. Here, let's use Apache Bench:
ab -n 100 -c 2 -l <YOUR ENDPOINT>
This will throw 100 requests at the application to ensure some traces are generated. In my experience, around 6-10 traces show up as a result. That's quite enough for a PoC.
Now, with everything set up correctly, let's see what shows up on the Datadog UI.
http://hi.buffer.com/c50fb5dbee9c
Cool! The logs are pouring in. Let's take a look at the traces!
http://hi.buffer.com/a56aacca0324
Awesome! They are here too. And the best part is that they are already associated with the logs generated at runtime. That's it! Nice and easy!
http://hi.buffer.com/9977756e4ab6
We all love Kubernetes for its application and resource management abilities. However, without good application observability, it can hurt developer velocity and reduce Kubernetes to a buzzword at best. It might even add unnecessary stress on both the DevOps and Product Engineering teams. Fortunately, as the ecosystem matures we are seeing more vendor support, which gave birth to this post. I'm super pumped about the ongoing trajectory.
You may find the complete source code for this post at https://github.com/bufferapp/dd-tracing-logging-examples. Cheers!