#!/bin/bash -e
BACKUP_DIR="/var/tmp/k8sbackup/$(date +%s)"
echo "Backing up cluster to ${BACKUP_DIR}"
NAMESPACES=$(kubectl get ns -o jsonpath='{.items[*].metadata.name}')
RESOURCETYPES="${RESOURCETYPES:-"ingress deployment configmap secret svc rc ds networkpolicy statefulset cronjob pvc"}"
GLOBALRESOURCES="${GLOBALRESOURCES:-"namespace storageclass clusterrole clusterrolebinding customresourcedefinition"}"
mkdir -p "${BACKUP_DIR}"
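The script above only prepares its variables and the target directory. A sketch of how the export loop might continue is shown below; the loop body is an assumption, not part of the original script:

# Assumption: dump each global resource, then each resource type per namespace.
for resource in ${GLOBALRESOURCES}; do
  kubectl get "${resource}" -o yaml > "${BACKUP_DIR}/${resource}.yaml"
done

for ns in ${NAMESPACES}; do
  mkdir -p "${BACKUP_DIR}/${ns}"
  for type in ${RESOURCETYPES}; do
    kubectl --namespace="${ns}" get "${type}" -o yaml > "${BACKUP_DIR}/${ns}/${type}.yaml"
  done
done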
2018-10-22 08:39:01 INFO Starting test
2018-10-22 08:39:01 INFO Creating Resource Group
2018-10-22 08:39:03 INFO Creating the AKS cluster
2018-10-22 08:52:22 INFO Getting cluster credentials
2018-10-22 08:52:23 INFO Get Nodes
2018-10-22 08:52:26 INFO b'NAME STATUS ROLES AGE VERSION\naks-nodepool1-18093422-0 Ready agent 3m v1.9.11\n'
2018-10-22 08:52:26 INFO Applying Deployment
2018-10-22 08:52:32 INFO b'deployment.apps/azure-vote-back created\nservice/azure-vote-back created\ndeployment.apps/azure-vote-front created\nservice/azure-vote-front created\n'
2018-10-22 08:52:32 INFO Getting external IP
2018-10-22 08:56:18 INFO Getting web contents from 104.41.139.254
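This log reads like the output of a small Python harness driving the az and kubectl CLIs. A minimal sketch of such a harness follows; the resource-group and cluster names, region, and manifest filename are assumptions, not taken from the log:

import logging
import subprocess

logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S', level=logging.INFO)

def run(cmd):
    # Capture command output so it can be logged, as in the b'...' lines above.
    return subprocess.check_output(cmd, shell=True)

logging.info('Starting test')
logging.info('Creating Resource Group')
run('az group create --name aks-test --location westus')
logging.info('Creating the AKS cluster')
run('az aks create --resource-group aks-test --name aks-test-cluster '
    '--node-count 1 --generate-ssh-keys')
logging.info('Getting cluster credentials')
run('az aks get-credentials --resource-group aks-test --name aks-test-cluster')
logging.info('Get Nodes')
logging.info(run('kubectl get nodes'))
logging.info('Applying Deployment')
logging.info(run('kubectl apply -f azure-vote.yaml'))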
{
  "dashboard": {
    "title": "Hosts",
    "description": "Basic host stats: CPU, Memory Usage, Disk Utilisation, Filesystem usage and Predicted time to filesystems filling",
    "id": null,
    "rows": [{
      "collapse": false,
      "editable": true,
      "height": "250px",
      "panels": [{
{
  "annotations": {
    "list": []
  },
  "description": "Monitors Kubernetes cluster using Prometheus. Shows overall cluster CPU / Memory / Filesystem usage as well as individual pod, containers, systemd services statistics. Uses cAdvisor metrics only.",
  "editable": true,
  "gnetId": 315,
  "graphTooltip": 0,
  "hideControls": false,
  "id": 2,
# Prometheus configuration to scrape Kubernetes from outside the cluster.
# Change master_ip and api_password to match your master server address and admin password.
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  # Metrics for the Prometheus server itself
  - job_name: 'prometheus'
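    # The excerpt stops mid-job. A minimal continuation is sketched below; it
    # is an assumption, not the original file.
    static_configs:
      - targets: ['localhost:9090']

  # Assumption: scrape the Kubernetes API server from outside the cluster using
  # the master address and admin password mentioned in the header comment.
  - job_name: 'kubernetes-apiserver'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    basic_auth:
      username: admin
      password: api_password
    static_configs:
      - targets: ['master_ip:443']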
Running an online service isn't easy. Every day you make complex decisions about how to solve problems, and often there is no right or wrong answer; there are just different approaches with different results. On the infrastructure side you have to weigh up where everything will be hosted: a cloud service like AWS, your own data centres, any number of other options, or perhaps a mix.
Monitoring choices are equally hard. There are the tools that are familiar and a known quantity, new ones that look interesting from reading blogs, and the option to buy one of any number of SaaS products.
For the sake of brevity, let's imagine that you are looking to move from your traditional data centre into AWS and want to upgrade your Nagios, Graphite and StatsD stack to something a bit newer. This is an incredibly common scenario that we see every day.
The first decision is whether to build or buy, and it needs to be analysed up front. To make that decision properly you'll need to
Welcome to the Dataloop API documentation! To use the API you'll need an API key, which can be created in Dataloop under your user account settings. When integrating services you may want to consider creating an application-specific user in Dataloop with access to accounts at the correct role level. You will also need to know the organisation name and account name that you want to work with. These match the organisation and account names in Dataloop. Use these details where you see <org name> and <account name> in the examples.
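A minimal sketch of an authenticated request is shown below, purely as an illustration. The base URL, endpoint path, and Bearer-token header are assumptions rather than documented Dataloop API details; check the real documentation before relying on them:

import requests

API_KEY = 'your-api-key'          # created under your user account settings
ORG = '<org name>'                # your Dataloop organisation name
ACCOUNT = '<account name>'        # your Dataloop account name
BASE_URL = 'https://app.dataloop.io/api/v1'  # assumed base URL

# Assumption: an agents listing endpoint scoped by org and account.
resp = requests.get(
    '{0}/orgs/{1}/accounts/{2}/agents'.format(BASE_URL, ORG, ACCOUNT),
    headers={'Authorization': 'Bearer ' + API_KEY},
)
resp.raise_for_status()
print(resp.json())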
#!/usr/bin/env python
import sys

from dlcli import api

'''
Returns the number of agents that have reported the base.count metric in the last minute.
You will need to update the TAG, org, account and key variables in the settings below.
'''
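The script is cut off before the settings it mentions. A hypothetical continuation is sketched below; the settings keys, the api.get_agents call, and the tags/lastSeen fields are all assumptions about dlcli's api module, not confirmed signatures:

import time

# Assumption: dlcli's api module accepts url/org/account/key keyword arguments.
settings = {
    'url': 'https://app.dataloop.io/api/v1',  # assumed API endpoint
    'org': 'your-org',
    'account': 'your-account',
    'key': 'your-api-key',
}
TAG = 'all'  # hypothetical tag used to filter agents

def main():
    # Hypothetical call and response shape: list agents carrying the tag, then
    # count those seen within the last 60 seconds.
    agents = api.get_agents(**settings)
    cutoff = time.time() - 60
    recent = [a for a in agents
              if TAG in a.get('tags', []) and a.get('lastSeen', 0) > cutoff]
    print(len(recent))

if __name__ == '__main__':
    sys.exit(main())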