@menduz
Created March 27, 2025 22:58
recording rules.yaml
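groups:
  - name: users-prediction
    rules:
      # Online users: Archipelago peers plus WebSocket room connections. Each
      # term is averaged over 5m and falls back to 0 when its metric is absent.
      # The parentheses are required: in PromQL `+` binds tighter than `or`,
      # so without them the second term is dropped whenever the first has data.
      - record: dcl:online_users:5m
        expr: >
          (sum(avg_over_time(dcl_archipelago_peers_count[5m])) OR on() vector(0))
          + (sum(avg_over_time(dcl_ws_rooms_connections{service="ws-room-service"}[5m])) OR on() vector(0))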
      # Long-term average value for the series
      - record: dcl:online_users:5m:avg_over_time_1w
        expr: >
          avg_over_time(dcl:online_users:5m[1w])
      # Long-term standard deviation for the series
      - record: dcl:online_users:5m:stddev_over_time_1w
        expr: >
          stddev_over_time(dcl:online_users:5m[1w])
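      # Seasonal prediction, adapted from the GitLab post linked below.
      # Instead of predicting from the same five-minute window one week
      # earlier, we average a four-hour window centered on the same time of
      # day one week earlier. So, to predict the value of the metric at 8am
      # on a Monday, we use its average from 6am to 10am on the previous Monday.
      # We use `offset 166h` rather than a full week (168h): the range
      # `[4h] offset 166h` covers 170h to 166h ago, which is a four-hour
      # window centered on exactly one week ago.
      # The same estimate is taken two and three weeks back (334h, 502h). Each
      # estimate is shifted by how much the 1w average has changed since then,
      # so a general rise or fall in users carries into the prediction. The
      # median (quantile 0.5) of the available estimates is the prediction.
      # label_replace adds a temporary "offset" label so the three estimates
      # survive the `or` chain as separate series; `without (offset)` then
      # aggregates them back into one.
      # https://about.gitlab.com/blog/2019/07/23/anomaly-detection-using-prometheus/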
      - record: dcl:online_users:5m_prediction
        expr: >
          quantile(0.5,
            label_replace(
              avg_over_time(dcl:online_users:5m[4h] offset 166h)
              + dcl:online_users:5m:avg_over_time_1w - dcl:online_users:5m:avg_over_time_1w offset 1w
              , "offset", "1w", "", "")
            or
            label_replace(
              avg_over_time(dcl:online_users:5m[4h] offset 334h)
              + dcl:online_users:5m:avg_over_time_1w - dcl:online_users:5m:avg_over_time_1w offset 2w
              , "offset", "2w", "", "")
            or
            label_replace(
              avg_over_time(dcl:online_users:5m[4h] offset 502h)
              + dcl:online_users:5m:avg_over_time_1w - dcl:online_users:5m:avg_over_time_1w offset 3w
              , "offset", "3w", "", "")
          )
          without (offset)
      # - alert: RequestRateOutsideNormalRange
      #   expr: >
      #     abs(
      #       (
      #         sum by (service,team) (dcl:online_users:5m) - sum by (service,team) (dcl:online_users:5m_prediction)
      #       ) / sum by (service,team) (dcl:online_users:5m:stddev_over_time_1w)
      #     ) > 2
      #   for: 10m
      #   labels:
      #     severity: warning
      #   annotations:
      #     summary: Requests for service {{ $labels.service }} are outside of expected operating parameters for 10m
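
The `dcl:online_users:5m` rule relies on two PromQL details. First, `sum(...)` over no series returns an empty vector. Second, `X or on() vector(0)` substitutes 0 when `X` is empty. PromQL gives `+` higher precedence than `or`, so the unparenthesized form `A OR on() vector(0) + B OR on() vector(0)` parses as `(A or on() (vector(0) + B)) or on() vector(0)`. That expression returns only A whenever A has data. The Python sketch below models this under a simplifying assumption: each vector holds at most one label-less sample, represented as a float, or `None` when empty. The helper names are hypothetical and only illustrate the operator semantics; they are not a Prometheus API.

```python
from typing import Optional

# Hypothetical model of a PromQL instant vector with at most one label-less
# sample: a float, or None when the vector is empty (no data).
Sample = Optional[float]


def or_on(left: Sample, right: Sample) -> Sample:
    """`left or on() right`: with on() every sample matches, so `right` is used only if `left` is empty."""
    return left if left is not None else right


def plus(a: Sample, b: Sample) -> Sample:
    """`a + b`: produces a sample only when both sides have one."""
    if a is None or b is None:
        return None
    return a + b


def unparenthesized(peers: Sample, ws: Sample) -> Sample:
    # `A or on() vector(0) + B or on() vector(0)` parses as
    # `(A or on() (vector(0) + B)) or on() vector(0)`.
    return or_on(or_on(peers, plus(0.0, ws)), 0.0)


def parenthesized(peers: Sample, ws: Sample) -> Sample:
    # `(A or on() vector(0)) + (B or on() vector(0))`
    return plus(or_on(peers, 0.0), or_on(ws, 0.0))


if __name__ == "__main__":
    print(unparenthesized(120.0, 80.0))  # 120.0: the WebSocket term is lost
    print(parenthesized(120.0, 80.0))    # 200.0
    print(parenthesized(None, 80.0))     # 80.0: missing peers count as 0
```

In the parenthesized form, each source degrades to 0 independently, so the total stays defined when either exporter is down.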
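
The prediction rule combines three trend-adjusted seasonal estimates. The Python sketch below shows the arithmetic. It assumes the inputs are plain numbers already read from Prometheus, and the function names and parameters are hypothetical. `offset_hours` reproduces the 166h/334h/502h offsets. `predict` takes the median of the estimates and drops any week with missing data, the way the `or` chain does.

```python
from statistics import median
from typing import Optional, Sequence

WEEK_H = 168   # hours in a week
WINDOW_H = 4   # width of the seasonal window used by the recording rule


def offset_hours(weeks_back: int, window_h: int = WINDOW_H) -> int:
    """`[window_h] offset N` covers N + window_h to N hours ago; this N
    centers that window on the same time of day `weeks_back` weeks ago."""
    return weeks_back * WEEK_H - window_h // 2


def predict(
    seasonal_avgs: Sequence[Optional[float]],
    weekly_avg_now: Optional[float],
    weekly_avgs_then: Sequence[Optional[float]],
) -> Optional[float]:
    """Median of trend-adjusted seasonal estimates, one per week back.

    seasonal_avgs[i]    ~ avg_over_time(dcl:online_users:5m[4h] offset offset_hours(i + 1))
    weekly_avg_now      ~ dcl:online_users:5m:avg_over_time_1w
    weekly_avgs_then[i] ~ dcl:online_users:5m:avg_over_time_1w offset (i + 1)w
    """
    if weekly_avg_now is None:
        return None
    # A week with missing data yields no estimate, as in the PromQL `or` chain.
    estimates = [
        s + weekly_avg_now - then
        for s, then in zip(seasonal_avgs, weekly_avgs_then)
        if s is not None and then is not None
    ]
    # quantile(0.5, ...) over the surviving estimates equals their median.
    return median(estimates) if estimates else None


if __name__ == "__main__":
    print([offset_hours(k) for k in (1, 2, 3)])                    # [166, 334, 502]
    print(predict([100.0, 120.0, 90.0], 110.0, [100.0, 105.0, 95.0]))  # 110.0
```

In the example, the three estimates are 110, 125 and 105, and their median gives the prediction of 110. Using the median rather than the mean keeps one unusual week, such as a holiday or an event, from skewing the prediction.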
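
The commented-out alert is a z-score test. It fires when the current value stays more than two one-week standard deviations away from the prediction for 10 minutes. The Python sketch below covers a single evaluation; the names are hypothetical and the `for: 10m` hold period is not modeled. Division by zero is made to follow IEEE 754 as PromQL does: +Inf or -Inf for a non-zero numerator, and NaN for 0/0, which never compares greater than 2.

```python
import math


def zscore(current: float, predicted: float, stddev: float) -> float:
    """How many standard deviations `current` is from `predicted`."""
    diff = current - predicted
    if stddev == 0:
        # Python raises on float division by zero; PromQL follows IEEE 754.
        return math.nan if diff == 0 else math.copysign(math.inf, diff)
    return diff / stddev


def outside_normal_range(
    current: float, predicted: float, stddev: float, threshold: float = 2.0
) -> bool:
    """Mirrors `abs((current - prediction) / stddev) > 2` for one evaluation."""
    # abs(NaN) is NaN and NaN > threshold is False, so 0/0 never fires.
    return abs(zscore(current, predicted, stddev)) > threshold


if __name__ == "__main__":
    print(outside_normal_range(150.0, 110.0, 15.0))  # True: z is about 2.67
    print(outside_normal_range(150.0, 110.0, 20.0))  # False: z == 2.0 is not > 2
```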