Cheat-sheet / troubleshooting guide for tracking down issues between Prometheus and Grafana when metrics are not showing up. This guide assumes Prometheus is collecting metrics but Grafana is not displaying them.
Ensure that the Prometheus service is up and running and accessible within the Kubernetes cluster.
kubectl get pods -n <prometheus-namespace> -l app=prometheus
kubectl get svc -n <prometheus-namespace> -l app=prometheus
Forward Prometheus to your local machine to verify that metrics are being collected.
kubectl port-forward svc/<prometheus-service-name> -n <prometheus-namespace> 9090:9090
Open http://localhost:9090 and check the Targets page (Status > Targets) to verify that all intended targets are listed as UP. Any target showing DOWN means Prometheus is failing to scrape that endpoint, and its metrics will be missing from Grafana's dashboards.
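The same information is available from Prometheus's HTTP API if you prefer the command line; a rough check, assuming the port-forward from the previous step is still running and grep is available locally:
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c
Any count next to "health":"down" points at a scrape problem to fix on the Prometheus side first.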
Run a query to verify that Prometheus has the expected metrics.
http_requests_total
Adjust the query to match the metric you're troubleshooting. If no data is returned, the problem is upstream of Grafana: the exporter or scrape configuration for that metric.
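The same query can be issued against Prometheus's HTTP API, which is handy for scripting the check (this assumes the port-forward from earlier is still active):
curl -s 'http://localhost:9090/api/v1/query?query=http_requests_total'
An empty "result" array in the JSON response means Prometheus holds no samples for that metric.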
In the Grafana web interface, go to Configuration > Data Sources to confirm that the Prometheus data source is set up correctly. Key points to verify:
- URL: it should match the Prometheus service's in-cluster URL (e.g., http://prometheus-service.<namespace>.svc.cluster.local:9090).
- Authentication: if you've set up any authentication for Prometheus, ensure it is configured to match in Grafana.
Forward Grafana to your local machine to access it directly.
kubectl port-forward svc/<grafana-service-name> -n <grafana-namespace> 3000:3000
Access Grafana at http://localhost:3000. Check that dashboards are loading and are configured to use the correct Prometheus data source.
Once connected, go to Configuration > Data Sources > Prometheus > Save & Test. If the connection fails, review the URL and port settings.
If the data source tests successfully, check Grafana’s logs for any errors related to data retrieval:
kubectl logs -l app=grafana -n <grafana-namespace>
Look for errors such as authentication issues, network connectivity errors, or any failed requests to Prometheus.
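To narrow noisy output to the lines that usually matter here, you can filter the same logs (the app=grafana label is carried over from the command above; adjust it if your deployment uses different labels):
kubectl logs -l app=grafana -n <grafana-namespace> --tail=200 | grep -iE 'error|prometheus'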
If issues persist, check connectivity between the Grafana pod and the Prometheus service. You can do this by opening an interactive shell in the Grafana pod and using curl to test the Prometheus endpoint.
kubectl exec -it <grafana-pod-name> -n <grafana-namespace> -- /bin/sh
curl http://prometheus-service.<prometheus-namespace>.svc.cluster.local:9090
If the connection fails, there may be a network policy or DNS resolution issue blocking access.
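Two quick follow-up checks from the same shell can help tell those cases apart. Prometheus exposes a standard /-/healthy endpoint, and most Grafana images ship BusyBox's nslookup (if yours does not, skip that line):
nslookup prometheus-service.<prometheus-namespace>.svc.cluster.local
curl -s http://prometheus-service.<prometheus-namespace>.svc.cluster.local:9090/-/healthy
If nslookup fails, the service name or namespace is wrong or cluster DNS is misbehaving; if DNS resolves but curl is refused or times out, suspect a NetworkPolicy.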
If you’re using Helm to deploy, verify that both Prometheus and Grafana values are correctly configured for internal connectivity:
- In prometheus.values.yaml: ensure the service block is configured correctly.
- In grafana.values.yaml: verify the datasources configuration so that it aligns with Prometheus's settings (see the sketch after this list).
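As a rough sketch of what the Grafana chart's datasources value can look like (the key layout follows the official Grafana Helm chart and Grafana's datasource provisioning format; names, URL, and exact structure may differ with your chart version and release):
# sketch only; adjust keys and URL to your chart and cluster
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://prometheus-service.<prometheus-namespace>.svc.cluster.local:9090
        isDefault: true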
Sometimes restarting the Grafana pod can resolve temporary connection issues:
kubectl rollout restart deployment <grafana-deployment-name> -n <grafana-namespace>
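You can watch the restart complete before re-testing the data source:
kubectl rollout status deployment <grafana-deployment-name> -n <grafana-namespace>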
After these steps, if the metrics are still not appearing, consider checking for any relevant firewall or network policies in place that might restrict traffic between namespaces or services.
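NetworkPolicies are not created by default, so listing them in both namespaces is a quick first pass (some CNIs also support cluster-wide policies, which this won't show):
kubectl get networkpolicy -n <prometheus-namespace>
kubectl get networkpolicy -n <grafana-namespace>
kubectl describe networkpolicy <policy-name> -n <grafana-namespace>
Pay particular attention to ingress rules in the Prometheus namespace and egress rules in the Grafana namespace.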