Observability - K8s for Meta

Observability of a system generally consists of these three areas:

Metrics

Mostly time-series data used to measure performance. This applies to both the infrastructure and the application.

Not requested

Event logs

Gathering event logs and analyzing them chronologically.

M0 - Engineers use direct access

Engineers access the K8s cluster directly to fetch the log output from the pods.
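As a rough illustration of what such a guideline could contain, the sketch below assembles a `kubectl logs` invocation. The helper name `kubectl_logs_command`, the pod name, and the namespace are hypothetical; only the `kubectl logs` flags themselves (`--namespace`, `--container`, `--tail`, `--since`) are standard.

```python
import shlex


def kubectl_logs_command(pod, namespace="default", container=None, tail=None, since=None):
    """Build the argument list for a `kubectl logs` call (hypothetical helper)."""
    cmd = ["kubectl", "logs", pod, "--namespace", namespace]
    if container:
        # Needed only when the pod runs more than one container.
        cmd += ["--container", container]
    if tail is not None:
        # Limit output to the last N lines.
        cmd += [f"--tail={tail}"]
    if since:
        # Limit output to a relative duration such as "1h" or "15m".
        cmd += [f"--since={since}"]
    return cmd


if __name__ == "__main__":
    # "my-app-5d9c7" and "meta" are placeholder names, not taken from a real cluster.
    print(shlex.join(kubectl_logs_command("my-app-5d9c7", namespace="meta", tail=100)))
```

The resulting list can be passed to `subprocess.run` or pasted into the guideline as a shell command.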

K8s cluster impact:

no additional HW requirements

M0-1 - Admin access

Just provide the developers with credentials to access the cluster and a guideline on how to fetch the logs.

Estimate:

0.5h - write the guideline

M0-2 - Role-based access control

Provide the developers with a K8s user account that has read-only permissions.

Estimate:

2h - create a Helm chart to simplify the creation of developer accounts
0.5h - write the user guideline (how to fetch credentials, how to work with the chart)
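To make "read-only" concrete, here is a minimal sketch of the RBAC objects such a chart might render. It assumes the permission is scoped to reading pods and their logs within one namespace; the function name, user name, and namespace are hypothetical, and the actual scope is not specified in this document.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def readonly_log_access(user, namespace):
    """Return a Role and RoleBinding granting read-only access to pods and pod logs."""
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": "log-reader", "namespace": namespace},
        "rules": [{
            "apiGroups": [""],                    # "" is the core API group
            "resources": ["pods", "pods/log"],    # pods/log is required for `kubectl logs`
            "verbs": ["get", "list", "watch"],    # read-only verbs only
        }],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"log-reader-{user}", "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": "log-reader", "apiGroup": RBAC_API},
    }
    return [role, binding]


if __name__ == "__main__":
    # "jane" and "meta" are placeholder values.
    for manifest in readonly_log_access("jane", "meta"):
        print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.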

M1 - ELK stack

Use the ELK stack: https://www.elastic.co/what-is/elk-stack

K8s cluster impact:

additional HW requirements needed

M1-0 - Basic installation and configuration (Test environment)

Covers:

simple installation of the ELK chart into the cluster with the free Beats
installation guideline on how to reproduce it
use of the default admin account / user accounts managed manually
basic configuration in the Helm chart repo
no backups
expose under the request path /kibana (set up ingress)

Estimate:

5-8h
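For the `/kibana` exposure listed above, the ingress could look roughly like the manifest built below. This is a sketch under assumptions: the service name `kibana-kibana` and the namespace are hypothetical, 5601 is Kibana's default port, and Kibana usually also has to be told about the sub-path (for example via its `server.basePath` setting), which is not shown here.

```python
import json


def kibana_ingress(service_name, namespace, path="/kibana", port=5601):
    """Build a networking.k8s.io/v1 Ingress routing `path` to the Kibana service."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "kibana", "namespace": namespace},
        "spec": {
            "rules": [{
                "http": {
                    "paths": [{
                        "path": path,
                        "pathType": "Prefix",  # match /kibana and everything below it
                        "backend": {
                            "service": {
                                "name": service_name,
                                "port": {"number": port},
                            }
                        },
                    }]
                }
            }]
        },
    }


if __name__ == "__main__":
    # "kibana-kibana" and "logging" are placeholder names; check the chart's actual service name.
    print(json.dumps(kibana_ingress("kibana-kibana", "logging"), indent=2))
```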

M1-2 - GitOps (Production environment)

Covers:

M1-0
user management from git
alert management from git
CI/CD pipeline to apply the GitOps changes
backups
??? (depends on what resources you need to manage from git, that is derived from the features you want to use)

Estimate:

M1-0 (5-8h)
plus 10h or more

M2 - Using Azure Monitor

Use Azure Monitor: https://azure.microsoft.com/en-us/services/monitor/#overview

Usually fulfills all observability needs.

Not requested, though in the long term it may save costs since:

it's cloud native, so you do not host anything
user management is handled within Azure

Traces

A trace provides an overview of the request flow within the system.

Not requested

pgressa/meta-observability.md