Monitoring and Alerting Minimum Viable Product

A checklist for those who want to get out of bed only when it's important, and to be able to debug both critical and non-critical issues.

Those emphasised should probably get you out of bed when they're too high/low/gone.

This is super opinionated, but I welcome feedback. It's biased towards retrofitting/cleaning up/brownfield type work because that's what I know best.

HTTP(S) Services

What you're serving, how much of it and how fast

Inbound traffic volume ("requests") (too low)
- Group by function (subdomain/high level URL)
- Group by source
- Ideally both of the above
Outbound traffic volume ("responses")
- Group by function (subdomain/high level URL) if possible
- Group by HTTP status code groups
- - Good traffic: 2xx (too low)
- - Specific traffic: 3xx, 4xx
- - Bad traffic: 5xx (too high)
Response times
- Group by function (subdomain/high level URL) if possible
HTTP/HTTPS split ratio
External synthetic user journey test, i.e. from outside your infrastructure, test your service the way a user will use it (too slow, too broken)
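
As a rough illustration of the status code grouping above, here is a minimal Python sketch that buckets one window of responses into 2xx/3xx/4xx/5xx and decides whether the "too low" and "too high" conditions fire. The function names and both thresholds (`min_2xx`, `max_5xx_ratio`) are made up for the example; pick numbers that match your own traffic.

```python
from collections import Counter


def status_class(code: int) -> str:
    """Map an HTTP status code to its group, e.g. 503 -> '5xx'."""
    return f"{code // 100}xx"


def summarise(status_codes):
    """Count responses per status code group for one time window."""
    return Counter(status_class(code) for code in status_codes)


def alerts(counts, min_2xx=100, max_5xx_ratio=0.05):
    """Return the wake-you-up conditions that fire for one window.

    min_2xx and max_5xx_ratio are hypothetical example thresholds.
    """
    total = sum(counts.values())
    fired = []
    if counts["2xx"] < min_2xx:  # good traffic too low
        fired.append("2xx too low")
    if total and counts["5xx"] / total > max_5xx_ratio:  # bad traffic too high
        fired.append("5xx too high")
    return fired


# A made-up window of response codes, as you might pull from access logs.
window = [200] * 90 + [301] * 3 + [404] * 2 + [500] * 10
counts = summarise(window)
print(dict(counts))
print(alerts(counts))
```

The 3xx and 4xx groups are counted but deliberately not alerted on here, matching the list above where only 2xx and 5xx carry a condition.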

Web Servers

Apache

Total connections
Worker statuses
- Idle
- Reading
- Sending
- Waiting
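
One way to get at the worker statuses is Apache's mod_status, whose machine-readable output includes a `Scoreboard:` line with one character per worker slot. The sketch below counts those characters. It assumes mod_status is enabled and that you have already fetched the text; the sample text is invented, and how the state names map onto the list above is my own reading, not something this checklist defines.

```python
from collections import Counter

# Scoreboard characters as described in the Apache mod_status key.
SCOREBOARD_KEY = {
    "_": "waiting",       # waiting for connection
    "R": "reading",       # reading request
    "W": "sending",       # sending reply
    "K": "keepalive",     # keepalive (read)
    "I": "idle_cleanup",  # idle cleanup of worker
    ".": "open_slot",     # open slot with no current process
}


def worker_statuses(status_text: str) -> Counter:
    """Count workers per state from mod_status machine-readable text."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            board = line.split(":", 1)[1].strip()
            # Any character not in the key above is lumped into "other".
            return Counter(SCOREBOARD_KEY.get(ch, "other") for ch in board)
    raise ValueError("no Scoreboard line found")


# Invented sample, shaped like the machine-readable status output.
sample = "BusyWorkers: 3\nIdleWorkers: 2\nScoreboard: _W_RK..."
print(worker_statuses(sample))
```

Graphing these counts over time shows whether you are running out of free workers before users notice.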

Nginx

Specific metrics
That should be monitored
When using
Nginx

Load Balancers

What's coming in and how well we're handing it off. You might also monitor your overall HTTP(S) Services metrics from your load balancer, but in addition to those:

Status of backend pool members
Response rates from pool members
Traffic levels to each pool member
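
To make the three pool member checks concrete, here is a small Python sketch that takes per-member counters for one window and reports each member's traffic share, response rate and any problems. The `Member` record, the sample numbers and the `min_response_rate` threshold are all hypothetical; in practice the counters would come from whatever stats your load balancer exposes.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool   # status as reported by the load balancer's health check
    requests: int   # requests handed to this member in the window
    responses: int  # responses received back from it in the window


def pool_report(members, min_response_rate=0.99):
    """Summarise status, response rate and traffic share per pool member."""
    total = sum(m.requests for m in members)
    report = {}
    for m in members:
        rate = m.responses / m.requests if m.requests else 0.0
        share = m.requests / total if total else 0.0
        problems = []
        if not m.healthy:
            problems.append("down")
        if m.requests and rate < min_response_rate:
            problems.append("low response rate")
        report[m.name] = {"share": share, "response_rate": rate, "problems": problems}
    return report


# Made-up counters for a three-member pool.
pool = [
    Member("web1", True, 500, 500),
    Member("web2", True, 480, 400),
    Member("web3", False, 20, 0),
]
for name, row in pool_report(pool).items():
    print(name, row)
```

A member with a very small or very large share is worth a look even when it is healthy, since it often points at a weighting or stickiness problem.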

HAProxy

Specific metrics
That should be monitored
When using
HAProxy

Nginx

Specific metrics
That should be monitored
When using
Nginx as a load balancer

Databases

Query volume
Query response times
- Grouped by query pattern
Top queries
- By frequency
- By returned volume
- By execution length (slow queries)
Replication statistics
Table statistics
- Read volumes
- Write volumes
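
Grouping by query pattern and ranking top queries can be sketched as below: literals are collapsed so that similar queries share one pattern, then each pattern is ranked by frequency, returned rows and total execution time. This is a naive illustration, not a real SQL parser; the log format of `(sql, seconds, rows)` tuples and the function names are assumptions for the example.

```python
import re
from collections import defaultdict


def query_pattern(sql: str) -> str:
    """Collapse literals so similar queries share one pattern (naive)."""
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted strings
    sql = re.sub(r"\b\d+\b", "?", sql)   # bare numbers
    return re.sub(r"\s+", " ", sql).strip().lower()


def aggregate(log):
    """Sum count, rows and seconds per query pattern."""
    stats = defaultdict(lambda: {"count": 0, "rows": 0, "seconds": 0.0})
    for sql, seconds, rows in log:
        entry = stats[query_pattern(sql)]
        entry["count"] += 1
        entry["rows"] += rows
        entry["seconds"] += seconds
    return stats


def top(stats, key, n=5):
    """Return the n patterns with the highest value for key."""
    return sorted(stats, key=lambda p: stats[p][key], reverse=True)[:n]


# Made-up query log entries: (sql, execution seconds, rows returned).
log = [
    ("SELECT * FROM users WHERE id = 1", 0.002, 1),
    ("SELECT * FROM users WHERE id = 42", 0.003, 1),
    ("SELECT * FROM orders WHERE status = 'open'", 1.5, 5000),
]
stats = aggregate(log)
print("by frequency:", top(stats, "count", 1))
print("by returned volume:", top(stats, "rows", 1))
print("by execution length:", top(stats, "seconds", 1))
```

The three rankings often disagree, which is the point: the most frequent query is rarely the slowest one.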

MySQL

Queues

RabbitMQ