Observability of a system is generally composed of three areas: metrics, logging and tracing.
Metrics: mostly time-series data used to measure performance, of both the infrastructure and the application.
Not requested
Logging: gathering event logs and analyzing them chronologically.
Engineers access the K8s cluster directly to fetch the log output from the pods.
K8s cluster impact:
- no additional HW requirements
Simply provide developers with credentials to access the cluster and a guideline on how to fetch the logs.
Estimate:
- 0.5h - write the guideline
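To illustrate what the guideline could contain, here is a minimal sketch in Python that assembles a `kubectl logs` invocation from the usual options. It assumes developers already have `kubectl` configured with their credentials; the helper `kubectl_logs_cmd` and the pod and namespace names are hypothetical placeholders. The flags used (`--namespace`, `--container`, `--tail`, `--since`, `--follow`, `--previous`) are standard `kubectl logs` options.

```python
import shlex
from typing import Optional


def kubectl_logs_cmd(
    pod: str,
    namespace: str,
    container: Optional[str] = None,
    tail: Optional[int] = None,
    since: Optional[str] = None,
    follow: bool = False,
    previous: bool = False,
) -> list[str]:
    """Build an argv list for `kubectl logs` (hypothetical helper for the guideline)."""
    cmd = ["kubectl", "logs", pod, "--namespace", namespace]
    if container:
        # Required when the pod runs more than one container.
        cmd += ["--container", container]
    if tail is not None:
        # Only show the last N lines.
        cmd.append(f"--tail={tail}")
    if since:
        # Relative duration such as "1h" or "15m".
        cmd.append(f"--since={since}")
    if follow:
        # Keep streaming new log lines.
        cmd.append("--follow")
    if previous:
        # Logs of the previous (e.g. crashed) container instance.
        cmd.append("--previous")
    return cmd


if __name__ == "__main__":
    # Pod and namespace names are placeholders.
    cmd = kubectl_logs_cmd("my-app-7d9c", "dev", tail=100, since="1h")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

In practice the guideline would simply list the resulting shell commands, e.g. `kubectl logs my-app-7d9c --namespace dev --tail=100 --since=1h`.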
Provide developers with a K8s user account with read-only permissions.
Estimate:
- 2h - create a Helm chart to ease the creation of developer accounts
- 0.5h - write a user guideline (how to fetch credentials, how to work with the chart)
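The read-only account boils down to a Role plus a RoleBinding, which the Helm chart would template per developer. A minimal sketch, assuming a namespace-scoped grant that covers only pods and their logs: the Python below emits the two manifests as a JSON `List`, which `kubectl apply -f` accepts. The user name `jane`, the namespace `dev` and the `<user>-log-reader` naming scheme are hypothetical, and how the user authenticates (client certificate, OIDC, etc.) is left out.

```python
import json

RBAC_API = "rbac.authorization.k8s.io"


def read_only_log_access(user: str, namespace: str) -> list[dict]:
    """Return a Role and RoleBinding granting `user` read-only access to pods and their logs."""
    role_name = f"{user}-log-reader"  # hypothetical naming scheme
    role = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods", "pods/log"],  # pods/log is needed for `kubectl logs`
                "verbs": ["get", "list", "watch"],  # read-only verbs only
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_API},
    }
    return [role, binding]


if __name__ == "__main__":
    # Wrap both objects in a v1 List so a single `kubectl apply -f -` can read them.
    manifest = {"apiVersion": "v1", "kind": "List", "items": read_only_log_access("jane", "dev")}
    print(json.dumps(manifest, indent=2))
```

For read access across all namespaces, the same rule would go into a ClusterRole bound with a ClusterRoleBinding instead.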
Use the ELK stack https://www.elastic.co/what-is/elk-stack
K8s cluster impact:
- additional HW requirements needed
Covers:
- a simple installation of the ELK chart into the cluster with the free Beats
- an installation guideline describing how to reproduce the setup
- use of the default admin account or manually managed user accounts
- basic configuration in the Helm chart repo
- no backups
- Kibana exposed under the request path /kibana (ingress setup)
Estimate:
- 5-8h
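Exposing Kibana under `/kibana` needs an Ingress rule plus telling Kibana about the sub-path. A minimal sketch, assuming the `networking.k8s.io/v1` Ingress API and Kibana's default port 5601: the host `logs.example.com`, the namespace `logging` and the service name `kibana-kibana` (which assumes a Helm release named `kibana`) are hypothetical and should be taken from the actual chart values. `server.basePath` and `server.rewriteBasePath` are Kibana settings for serving it behind a path prefix.

```python
import json


def kibana_ingress(
    host: str,
    service: str = "kibana-kibana",  # assumption: depends on the Helm release name
    port: int = 5601,  # Kibana's default HTTP port
    namespace: str = "logging",  # hypothetical namespace
) -> dict:
    """Ingress routing <host>/kibana to the Kibana service, without path rewriting."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": "kibana", "namespace": namespace},
        "spec": {
            "rules": [
                {
                    "host": host,
                    "http": {
                        "paths": [
                            {
                                "path": "/kibana",
                                "pathType": "Prefix",
                                "backend": {
                                    "service": {"name": service, "port": {"number": port}}
                                },
                            }
                        ]
                    },
                }
            ]
        },
    }


# kibana.yml settings so Kibana expects and serves the /kibana prefix itself.
KIBANA_BASE_PATH_SETTINGS = {"server.basePath": "/kibana", "server.rewriteBasePath": True}

if __name__ == "__main__":
    print(json.dumps(kibana_ingress("logs.example.com"), indent=2))
```

With `server.rewriteBasePath: true`, Kibana handles the `/kibana` prefix on its own, so the Ingress can forward paths unchanged and no controller-specific rewrite annotations are needed.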
Covers:
- everything in M1-0
- user management from git
- alert management from git
- a CI/CD pipeline that applies the configuration stored in git (GitOps)
- backups
- ??? (depends on which resources need to be managed from git, which in turn derives from the features you want to use)
Estimate:
- the M1-0 estimate
- 10h+ on top
Use Azure Monitor in the Azure cloud https://azure.microsoft.com/en-us/services/monitor/#overview
It usually fulfills all observability needs.
Not requested, though in the long term it may save costs since:
- it is cloud native, so you do not host anything yourself
- user management is handled within Azure
Tracing: a trace provides an overview of the request flow within the system.
Not requested
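To make the notion of request flow concrete, here is a minimal sketch of the trace data model: a trace is a set of spans, each pointing to its parent span, and the parent links form the call tree of a single request. The `Span` fields, the helper `request_flow` and the service names are hypothetical; real tracing systems (e.g. OpenTelemetry) record more per span, such as the trace ID, attributes and status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span (entry point of the request)
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def request_flow(spans: list[Span]) -> list[str]:
    """Render the spans of one trace as an indented call tree, ordered by start time."""
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        # Visit child spans in the order they started, indenting by nesting depth.
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    # Hypothetical trace of one HTTP request fanning out to two services and a database.
    trace = [
        Span("a", None, "GET /orders", 0, 120),
        Span("b", "a", "auth-service: verify token", 5, 25),
        Span("c", "a", "order-service: load orders", 30, 110),
        Span("d", "c", "postgres: SELECT orders", 40, 100),
    ]
    print("\n".join(request_flow(trace)))
```

The indentation shows which call spent its time inside which, which is exactly the overview a trace is meant to provide.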