prometheus-config.md

That's a good question. It mostly comes down to how many individual metrics you expose and how many samples per second you plan to ingest. The number of actual targets isn't as big an issue, since each scrape is just a cheap HTTP GET, but sample ingestion takes some work.
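
Since the bottleneck is ingestion rather than target count, a useful first estimate is simply active series divided by scrape interval. A minimal sketch, where the target count, series per target, and 15s scrape interval are illustrative assumptions rather than figures from this discussion:

```python
# Back-of-the-envelope ingestion estimate. The target count, series-per-target
# figure, and 15s scrape interval are illustrative assumptions.
targets = 1000                 # endpoints Prometheus scrapes
series_per_target = 1500       # distinct time series each target exposes
scrape_interval_s = 15         # seconds between scrapes of each target

active_series = targets * series_per_target
samples_per_second = active_series / scrape_interval_s

print(f"active series:      {active_series:,}")
print(f"samples per second: {samples_per_second:,.0f}")
```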

RAM is a big factor (a rough sizing sketch follows this list):

  • It limits how much data you can crunch with queries
  • It limits how much data can be buffered before writing to the disk storage
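
A crude way to translate series count into a RAM budget is to multiply by an assumed per-series memory cost. The ~20KB/series figure below is only back-calculated from the example server later in this note (45GB over ~2.3M series), not a documented Prometheus constant, and real usage varies with query load and series churn:

```python
# Very rough RAM sizing sketch. The per-series cost is an assumption inferred
# from the example server below, not an official figure.
active_series = 2_300_000
bytes_per_series = 20 * 1024   # assumed per-series overhead incl. query/write buffers

ram_bytes = active_series * bytes_per_series
print(f"estimated RAM: ~{ram_bytes / 2**30:.0f} GiB")
```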

Network throughput is not a huge issue. A single server with millions of timeseries and 100k samples/second only needs a few megabits/second.
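
To see why the bandwidth stays small, multiply the sample rate by an assumed on-the-wire cost per sample. The 10 bytes/sample figure below is a guess for compressed text exposition payloads, not a measured value:

```python
# Rough network estimate for scrape traffic. The bytes-per-sample figure is an
# assumption; actual payload size depends on metric/label names and compression.
samples_per_second = 100_000
wire_bytes_per_sample = 10     # assumed compressed on-the-wire cost per sample

bits_per_second = samples_per_second * wire_bytes_per_sample * 8
print(f"scrape bandwidth: ~{bits_per_second / 1e6:.0f} Mbit/s")
```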

CPU is important: a large server can easily use many cores.

For example, a Prometheus server configured to monitor just node_exporter metrics:

  • ~1700 nodes
  • ~1400 metrics/node
  • ~2.3M in-memory series
  • ~78k samples/second

This server uses about 45GB of RAM and typically uses about 5 CPUs.

It also needs about 5GB/day of storage space (SSD in this case) with varbit encoding.
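
Those figures hang together arithmetically. A quick sanity check, assuming the storage number means decimal gigabytes:

```python
# Sanity check of the example server's numbers. Only the first four values come
# from the figures above; the derived quantities are rough arithmetic.
nodes = 1700
series_per_node = 1400
samples_per_second = 78_000
disk_gb_per_day = 5

active_series = nodes * series_per_node                       # ~2.4M series
implied_scrape_interval = active_series / samples_per_second  # ~30s
bytes_per_sample_on_disk = disk_gb_per_day * 1e9 / (samples_per_second * 86_400)

print(f"active series:           {active_series:,}")
print(f"implied scrape interval: ~{implied_scrape_interval:.0f}s")
print(f"on-disk cost:            ~{bytes_per_sample_on_disk:.1f} bytes/sample")
```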

We could probably get away with a lot less RAM, but the extra headroom allows for very large historical queries.
