I would recommend the following operational dashboards and metrics for a product built with a monolithic or microservice architecture:
- Service Health Dashboard: This dashboard would display the health status of each service in the system. Key metrics might include:
- Availability: The percentage of time each service is up and running.
- Response Time: The average, median, 95th percentile, and 99th percentile response times for each service.
- 2xx Responses: The number or percentage of requests that result in 2xx (success) HTTP status codes.
- 4xx Responses: The number or percentage of requests that result in 4xx (client error) HTTP status codes. These indicate issues like bad requests or unauthorized access.
- 5xx Responses: The number or percentage of requests that result in 5xx (server error) HTTP status codes. These indicate issues with your services.
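- Error Rate: The number or percentage of requests that result in errors. This could be calculated as the sum of 4xx and 5xx responses, divided by the total number of requests when expressed as a percentage. Because 4xx responses usually reflect client mistakes, it can also help to track a 5xx-only rate for service-side failures.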
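To make the metrics above concrete, here is a minimal Python sketch of how they could be derived for one service over one time window. It assumes the raw data is available as `(status_code, latency_ms)` pairs and as a list of boolean health-check results; the function names (`percentile`, `availability_pct`, `summarize_requests`) and the input shapes are hypothetical, not taken from any particular monitoring tool. Percentiles use the simple nearest-rank method, and availability is approximated as the share of passed health checks.

```python
import math
from collections import Counter
from statistics import mean, median


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending, non-empty list (p in 1..100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def availability_pct(health_checks):
    """Share of passed health checks, used here as a proxy for uptime."""
    if not health_checks:
        raise ValueError("no health checks in this window")
    return 100 * sum(health_checks) / len(health_checks)


def summarize_requests(requests):
    """Summarize (status_code, latency_ms) pairs for one service and window."""
    if not requests:
        raise ValueError("no requests in this window")
    total = len(requests)
    # Group status codes by class: 2 -> 2xx, 4 -> 4xx, 5 -> 5xx.
    by_class = Counter(status // 100 for status, _ in requests)
    latencies = sorted(latency for _, latency in requests)

    def pct(count):
        return 100 * count / total

    return {
        "latency_avg_ms": mean(latencies),
        "latency_median_ms": median(latencies),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
        "pct_2xx": pct(by_class[2]),
        "pct_4xx": pct(by_class[4]),
        "pct_5xx": pct(by_class[5]),
        # Error rate as described above: 4xx and 5xx combined.
        "error_rate_pct": pct(by_class[4] + by_class[5]),
    }


# Hypothetical sample data for illustration only.
sample = [(200, 12.0), (200, 15.0), (404, 9.0), (500, 230.0)]
print(summarize_requests(sample))
print(availability_pct([True, True, True, False]))
```

In practice a metrics system would usually compute these values from counters and histograms rather than raw request lists, but the arithmetic behind each dashboard panel is the same.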