Troubleshooting and debugging distributed systems is hard, and it becomes much harder when the systems are not thoroughly monitored. Since the beginning of our move to microservices, I have been researching how best to build our monitoring infrastructure and instrument our code. This document describes the solution I propose we adopt going forward.
### Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Prometheus works well for recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures. In a world of microservices, its support for multi-dimensional data collection and querying is a particular strength.
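To make "multi-dimensional" concrete: a Prometheus time series is identified by a metric name plus a set of key/value labels, and every distinct label combination is a separate series. The Go sketch below uses only the standard library to render samples in the Prometheus text exposition format. The metric name `http_requests_total` and its `method`/`code` labels are hypothetical examples, not metrics any of our services expose today.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// series identifies one time series: a metric name plus a set of label pairs.
// Each distinct label combination is its own series in Prometheus.
type series struct {
	name   string
	labels map[string]string
}

// format renders one sample line in the text exposition format, e.g.
// http_requests_total{code="200",method="GET"} 3
// Labels are sorted so the output is deterministic. Using %q for quoting is a
// simplification that is only exact for plain, printable label values.
func format(s series, value float64) string {
	keys := make([]string, 0, len(s.labels))
	for k := range s.labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, s.labels[k]))
	}
	if len(pairs) == 0 {
		return fmt.Sprintf("%s %g", s.name, value)
	}
	return fmt.Sprintf("%s{%s} %g", s.name, strings.Join(pairs, ","), value)
}

func main() {
	// Two hypothetical series that share a metric name but differ in labels.
	samples := []struct {
		s series
		v float64
	}{
		{series{"http_requests_total", map[string]string{"method": "GET", "code": "200"}}, 3},
		{series{"http_requests_total", map[string]string{"method": "POST", "code": "500"}}, 1},
	}
	for _, x := range samples {
		fmt.Println(format(x.s, x.v))
	}
}
```

Run as-is, it prints two lines that share the name `http_requests_total` but differ in labels. That separation is what lets a query slice one metric by any label, for example `sum by (code) (http_requests_total)` in PromQL.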
We will use Prometheus for all our monitoring needs, except where it cannot currently handle the job (e.g. Hystrix metrics). The main reason for choosing Prometheus over other monitoring solutions is that it integrates well with Kubernetes (our cluster manager), and Kubernetes already exposes its metrics in the Prometheus format.
### Layers of Metrics

We will look at metrics from two angles:
- Infrastructure metrics
- Application metrics
#### Infrastructure metrics
Metrics about our infrastructure (the Kubernetes cluster) are already exposed by Kubernetes on the `/metrics` endpoint of each node as well as the master. Prometheus can scrape these endpoints directly. Additional cluster-wide metrics, such as the state of deployments and pods, will be exposed using kube-state-metrics.
#### Application metrics

Each deployed application (Postgres, Redis, Kafka, Zookeeper, and our microservices) must expose metrics for Prometheus to scrape. We will set this up as follows.
##### Kafka

We will use the Prometheus JMX exporter as a Java agent to export Kafka metrics. This means we may need to maintain our own custom Kafka Docker image. Check out docker kafka prometheus for inspiration.
##### Postgres

We will use the Postgres Prometheus exporter to export Postgres metrics. The exporter will run as a pod in our cluster. Check out the repo for more information about setting it up.
##### Zookeeper

A Zookeeper Prometheus exporter will be used to export Zookeeper metrics in the Prometheus format.
##### Redis

A Redis Prometheus exporter will be used to export Redis metrics in the Prometheus format.
##### Go Microservices

Go gRPC Prometheus will be used to export gRPC metrics for our Go services and the API gateway. Additional metrics can be added using the official Prometheus Go client.
##### Node Microservices

As far as I know, there is currently no plug-and-play gRPC Prometheus exporter for Node.js, so we will need to fully instrument our Node gRPC services ourselves. For that, we will use [nodejs prometheus wrapper](https://github.com/iadvize/nodejs-prometheus-wrapper) and the Node.js Prometheus client library.