@pracucci
Last active February 19, 2018 14:35
Monitoring Web Workers with Prometheus

Design

Web workers can use the Prometheus PHP client to track metrics. These metrics are stored in memory and also copied to a shared memory segment, from which another process in the same SysV IPC namespace can read them. This design lets us run a micro webserver in a sidecar container that exports the Prometheus metrics collected by the worker's CLI app.
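As a sketch of the worker side, a counter can be tracked with the Prometheus PHP client like this. The `SysvShmAdapter` storage adapter is hypothetical (the stock client ships in-memory, APCu and Redis adapters) and stands in for whatever SysV shared-memory adapter backs this design:

```php
<?php
use Prometheus\CollectorRegistry;

// Hypothetical SysV shared-memory adapter: the stock client ships
// InMemory, APC and Redis adapters, so this design assumes a custom one
// that mirrors each metric update into a SysV shared memory segment.
$registry = new CollectorRegistry(new SysvShmAdapter());

$counter = $registry->getOrRegisterCounter(
    'worker', 'jobs_processed_total', 'Number of jobs processed', ['queue']
);
$counter->inc(['email']);
```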

IPC namespace: when deployed on K8S, two containers of the same Pod run in the same IPC namespace, while two different Pods on the same host run in different IPC namespaces, unless they are explicitly configured to share the host's IPC namespace by running the Pods with hostIPC: true.
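The Pod layout this design relies on could look as follows (image names and port are placeholders); no hostIPC is needed, since containers of the same Pod already share the IPC namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: worker
spec:
  containers:
    - name: worker
      image: example/worker          # hypothetical worker image
    - name: metrics-exporter
      image: example/php-exporter    # hypothetical sidecar image
      ports:
        - containerPort: 9100        # scraped by Prometheus
```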

Why this design

There are three main options for exporting metrics to Prometheus from a CLI app:

  1. Run a sidecar webserver exposing the worker's metrics (current solution)
  2. Run a single webserver exposing all workers' metrics
  3. Push the metrics to the Push Gateway
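For option 1, the sidecar's metrics endpoint reduces to rendering whatever the worker wrote to shared memory. A minimal sketch, again assuming the hypothetical `SysvShmAdapter` (while `RenderTextFormat` and `getMetricFamilySamples()` are part of the stock Prometheus PHP client):

```php
<?php
use Prometheus\CollectorRegistry;
use Prometheus\RenderTextFormat;

// The registry reads from the same (assumed) SysV shared-memory
// adapter the worker writes to.
$registry = new CollectorRegistry(new SysvShmAdapter());

// Render all collected metrics in the Prometheus text exposition format.
header('Content-Type: ' . RenderTextFormat::MIME_TYPE);
$renderer = new RenderTextFormat();
echo $renderer->render($registry->getMetricFamilySamples());
```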

Why not the Push Gateway?

The main con of this approach is that the Push Gateway has been designed to proxy metrics at the service level, not at the single instance/process level. This means that once you push an instance-level metric (i.e. the same metric pushed from multiple instances of the same worker running in parallel), the metric stays in the Push Gateway for its entire lifecycle, even after the worker instance has terminated. The Push Gateway does expose an API to remove a group of metrics by label (i.e. by instance), but calling it reliably whenever a worker terminates is non-trivial (think of the process crashing, or the request failing during a worker's shutdown procedure).
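For reference, the group-deletion call that would have to run reliably on every worker shutdown is a single DELETE against the Push Gateway's grouping-key path (hostname, job and instance values here are placeholders):

```
# Delete all metrics pushed under job "worker" for instance "worker-1".
curl -X DELETE http://pushgateway:9091/metrics/job/worker/instance/worker-1
```

Guaranteeing this call happens on a crash or a failed request is exactly the hard part.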

Another con of the Push Gateway is that it's yet another SPoF (single point of failure) in the monitoring pipeline, which should be avoided unless strictly required.

Why not run a single webserver exposing all workers' metrics?

This solution is possible thanks to the Redis storage support of the Prometheus PHP client. When configured to store metrics on Redis, all workers can track their own metrics on Redis, and then a single webserver instance can export all workers' metrics by reading them from Redis.
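The Redis-backed setup is directly supported by the stock client. A minimal sketch (the Redis hostname is a placeholder for your Redis service):

```php
<?php
use Prometheus\CollectorRegistry;
use Prometheus\Storage\Redis;

// Point the client at a shared Redis instance (hostname is a placeholder).
Redis::setDefaultOptions(['host' => 'redis', 'port' => 6379]);

// Every worker tracks its metrics in Redis...
$registry = new CollectorRegistry(new Redis());
$registry
    ->getOrRegisterCounter('worker', 'jobs_processed_total', 'Number of jobs processed')
    ->inc();

// ...and a single webserver reads the same Redis storage to export them all.
```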

Pros:

  • No webserver overhead for each single worker

Cons:

  • Adds an external dependency to monitoring. When Redis is down we lose web worker monitoring too; the two shouldn't be coupled together.