- We can implement something like https://github.com/etsy/statsd to collect and aggregate metrics.
- At a predefined interval (say, every 10 seconds), the aggregated metrics are flushed to a backend store
- We could use SQL Server as that backend store:
# SQL Schema
- Each series is a separate table with two columns: an epoch timestamp (bigint) and a value (float)
- Rows are insert-only; the timestamp is the primary key
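A minimal sketch of one such table and the per-flush write, using the Series1 / Timestamp / Value names from the queries below (the interval aggregation itself happens in the statsd-style collector; the example values are made up):

-- One table per series; the epoch timestamp doubles as the primary key.
CREATE TABLE Series1 (
    [Timestamp] BIGINT NOT NULL PRIMARY KEY,  -- epoch seconds
    [Value]     FLOAT  NOT NULL
);

-- Each flush interval inserts one aggregated row per series.
INSERT INTO Series1 ([Timestamp], [Value]) VALUES (100000, 42.0);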
# Example Queries
This query reads data from two series between two times, aggregated to the hour:
WITH data AS (
    -- UNION ALL, not UNION: plain UNION would silently drop duplicate
    -- (Timestamp, Value) pairs that happen to appear in both series.
    SELECT * FROM Series1
    UNION ALL
    SELECT * FROM Series2
)
SELECT
     (Timestamp - (Timestamp % 3600)) AS [Hour]  -- truncate epoch seconds to the hour
    ,COUNT(*)                         AS [Samples]
    ,AVG(Value)                       AS [AvgValue]
FROM data
WHERE Timestamp >= 100000 AND Timestamp <= 200000
GROUP BY (Timestamp - (Timestamp % 3600))
ORDER BY (Timestamp - (Timestamp % 3600))
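If a readable timestamp is preferred over the raw epoch bucket, the same grouping expression can be fed through DATEADD; a sketch, assuming Timestamp holds epoch seconds:

-- Same hourly aggregation, with the bucket rendered as a DATETIME.
-- (DATEADD's number argument is INT, which holds epoch seconds until 2038.)
SELECT
     DATEADD(second, Timestamp - (Timestamp % 3600), '1970-01-01') AS [HourStart]
    ,COUNT(*)   AS [Samples]
    ,AVG(Value) AS [AvgValue]
FROM Series1
WHERE Timestamp >= 100000 AND Timestamp <= 200000
GROUP BY Timestamp - (Timestamp % 3600)
ORDER BY Timestamp - (Timestamp % 3600)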
Here is a neat query that joins two series and aggregates on calculations between them: filter and hour-bucket each series, FULL OUTER JOIN them on the timestamp column (so hours where only one series has data are kept), then aggregate per hour.
; WITH
 _series1 AS (SELECT (Timestamp - (Timestamp % 3600)) AS [Hour], * FROM Series1 WHERE Timestamp >= 100000 AND Timestamp <= 200000)
,_series2 AS (SELECT (Timestamp - (Timestamp % 3600)) AS [Hour], * FROM Series2 WHERE Timestamp >= 100000 AND Timestamp <= 200000)
SELECT
     COALESCE(_series1.[Hour], _series2.[Hour]) AS [Hour]    -- either side can be NULL after the outer join
    ,COUNT(*)                                   AS [Samples]
    ,MIN(_series1.Value - _series2.Value)       AS [MinDiff] -- NULL for hours present in only one series
FROM _series1
FULL OUTER JOIN _series2 ON _series1.Timestamp = _series2.Timestamp
GROUP BY COALESCE(_series1.[Hour], _series2.[Hour])
ORDER BY COALESCE(_series1.[Hour], _series2.[Hour])