When multiple Ray clusters have processes running on the same node, each cluster's dashboard should show only metrics from the processes in that cluster. Currently, the reporter process that collects these metrics (there is one per unique (cluster, node) pair) fetches metrics for all Ray workers on the node, regardless of which cluster they belong to.
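The fix amounts to having each reporter filter the node's workers down to its own cluster before reporting. The following is a minimal sketch of that filtering step, not Ray's actual reporter code. It assumes each worker can be tagged with some cluster identifier; `WorkerProcess`, `cluster_id`, and `workers_for_cluster` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class WorkerProcess:
    """A Ray worker observed on this node (hypothetical record type)."""
    pid: int
    # Hypothetical: an identifier for the cluster the worker belongs to,
    # e.g. something derived from how the worker was started.
    cluster_id: str


def workers_for_cluster(workers: Iterable[WorkerProcess], cluster_id: str) -> List[WorkerProcess]:
    """Keep only the workers that belong to the reporter's own cluster."""
    return [w for w in workers if w.cluster_id == cluster_id]


if __name__ == "__main__":
    node_workers = [
        WorkerProcess(pid=101, cluster_id="cluster-a"),
        WorkerProcess(pid=102, cluster_id="cluster-b"),
        WorkerProcess(pid=103, cluster_id="cluster-a"),
    ]
    # The reporter for cluster-a should see only its own workers.
    print([w.pid for w in workers_for_cluster(node_workers, "cluster-a")])  # [101, 103]
```

The hard part in practice is obtaining a reliable cluster identifier for each worker process. The filter itself is trivial once that identifier exists.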
I assume we still want a separate reporter process for each (cluster, node) pair, rather than switching to a single reporter process per node. Keeping one reporter per pair is easier to implement given our current process handling, and it allows per-cluster configuration of reporting, which we do not use now but which I think we should aim to support. The downside is that some of the metrics we collect are node-wide and identical for every cluster on the node, such as CPU utilization, so with N clusters on a node they would be monitored by N reporter processes rather than 1.
Of the two solutions, I th