It is generally desirable to group all the hosts for a specific service into a single dashboard view. For example, all the web servers are in one view while all the database servers are in another.
This is usually not an issue when you send custom metrics with a Riemann client. However, there are cases where you do not control how the metrics are sent, for example when you use Riemann-tools.
Because the Riemann-tools scripts are application agnostic, we must inject some application-specific information into the tags field before the dashboard can group hosts. Tags are a collection of arbitrary strings, and the Riemann-tools scripts let you pass them in on the command line:
riemann-health --host 127.0.0.1 --tag "prod" --tag "webserver"
In this case, the health check above will include two extra tags, "prod" and "webserver". On the dashboard side, you can create a grid view containing the following query:
tagged "prod" and tagged "webserver"
The tagged keyword in the query matches against all the tags of an event. The query above returns only events from hosts tagged with both "prod" and "webserver".
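To make the matching rule concrete, here is a minimal Clojure sketch of what the query checks for each event. It is an illustration only, not Riemann's query engine; the helper name tagged-all? and the sample events are hypothetical.

```clojure
; Hypothetical helper: true when the event carries every wanted tag.
(defn tagged-all? [event wanted]
  (every? (set (:tags event)) wanted))

; Sample events shaped like those a health check might send (made up for illustration).
(def events
  [{:host "web1" :service "cpu" :tags ["prod" "webserver"]}
   {:host "db1"  :service "cpu" :tags ["prod" "database"]}
   {:host "web2" :service "cpu" :tags ["staging" "webserver"]}])

; Keep only the hosts tagged with both "prod" and "webserver".
(def matching-hosts
  (map :host (filter #(tagged-all? % ["prod" "webserver"]) events)))
```

Only web1 carries both tags, so it is the only host left in matching-hosts.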
Sometimes it is useful to know how many hosts are sending data to Riemann. This is especially useful in cloud environments, where nodes are constantly scaling up and down.
Since the Riemann server sees every sending host, we will create a new stream that keeps track of unique hosts and indexes the count as a new service, so the Riemann dashboard can query it. In riemann.config you should have something that looks like this:
(let [index (default :ttl 300 (update-index (index)))]
  (streams
    prn
    index))
Since a stream is just a function that takes an event as an argument, we can add an anonymous function to the existing streams that does all the work:
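(let [hosts (atom #{})]
  (fn [event]
    ; Remember every host seen so far; a set keeps each host only once.
    (swap! hosts conj (:host event))
    (prn :hosts @hosts)
    ; Index the current host count as its own service.
    (index {:service "unique hosts"
            :time (unix-time)
            :metric (count @hosts)})))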
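The counting logic can also be tried outside Riemann. The following is a minimal sketch in plain Clojure: fake-index is a hypothetical stand-in for Riemann's index that simply records what it receives, and the :time field is left out because unix-time is only available inside Riemann.

```clojure
; Hypothetical stand-in for Riemann's index: remembers every event it is given.
(def indexed (atom []))
(defn fake-index [event] (swap! indexed conj event))

; Same shape as the stream function above, minus prn and :time.
(def count-hosts
  (let [hosts (atom #{})]
    (fn [event]
      ; A set ignores duplicates, so repeated hosts are counted once.
      (swap! hosts conj (:host event))
      (fake-index {:service "unique hosts"
                   :metric (count @hosts)}))))

; Three events from two distinct hosts.
(doseq [h ["web1" "web2" "web1"]]
  (count-hosts {:host h :service "cpu"}))

; The metric recorded after each event.
(def metrics (map :metric @indexed))
```

Note that the set only ever grows in this sketch, as it does in the stream function, so a host that stops reporting is still counted until the process restarts.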
On the dashboard you can create a gauge view with the following query:
service = "unique hosts"
If you created the query correctly, the gauge should show the number of unique hosts seen so far.
Assuming you already add special tags to group your application hosts by, you can use the following code in riemann.config to index a service that counts the number of unique hosts for a given group of tags.
(let [hosts (atom {})]
  (fn [event]
    ; Build a key such as :webserver-prod from the event's tags, in order.
    (let [tag-str (keyword (clojure.string/join "-" (:tags event)))]
      ; Add this host to the set of hosts seen for this tag group.
      (swap! hosts update-in [tag-str] (fnil conj #{}) (:host event))
      (index {:service (str (name tag-str) "-count")
              :time (unix-time)
              :metric (count (tag-str @hosts))}))))
The code above creates a service named after all of the event's tags, joined with "-" in the order they appear, with "-count" appended. For example, if an event carries the tags "webserver" and "prod" in that order, the service that counts unique hosts will be named "webserver-prod-count". In your dashboard you can query it like this:
service = "webserver-prod-count"
If you create a new gauge view with that query, you will get the current count of all your production web servers.
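Because the service name is built by joining the tags, the order in which they are passed matters. The sketch below, in plain Clojure, shows this; count-service-name is a hypothetical helper that mirrors the naming rule of the stream function and is not part of Riemann.

```clojure
(require '[clojure.string :as string])

; Hypothetical helper mirroring the naming rule used by the stream function.
(defn count-service-name [event]
  (str (string/join "-" (:tags event)) "-count"))

; The same two tags in different orders give two different service names.
(def name-a (count-service-name {:host "web1" :tags ["webserver" "prod"]}))
(def name-b (count-service-name {:host "web1" :tags ["prod" "webserver"]}))
```

Here name-a is "webserver-prod-count" while name-b is "prod-webserver-count". If your hosts may pass tags in different orders, sorting them before joining, for example with (sort (:tags event)), is one possible way to get a single name per group.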