@welch
Last active April 27, 2016 20:44
exponential smoothing with de-seasonalization

Forecast

Holt-Winters smoother/forecaster for anomaly detection

Forecast threads a smooth curve through a noisy timeseries in a way that lets you visualize trends, cycles, and anomalies. It can be used as part of an automatic anomaly detection system for metric timeseries (that is, sequences of timestamped numerical values).

It accomplishes this using a variation of Holt-Winters forecasting -- more generally known as exponential smoothing. Forecast decomposes a noisy signal into level, trend, repetitive "seasonal" effects, and unexplained variation or noise. In this example, a "season" is a day long, and we model repeated day-over-day variation. The result is a smoothed version of the signal which may be used to forecast future values or detect unexpected variation. This approach has been successfully used for network anomaly detection with other monitoring tools. Here we implement the method in pure Juttle, and our ability to combine historic and realtime data in a single query lets us initialize the forecaster from recent data at startup.
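
For orientation, here is the standard error-correction form of additive Holt-Winters (see https://www.otexts.org/fpp, which the code below also references). With smoothing factors $\alpha$ (slevel), $\beta$ (strend), $\gamma$ (sdaily), a season of $m$ samples, and one-step forecast error $e_t$:

$$\hat{y}_t = \ell_{t-1} + b_{t-1} + s_{t-m}, \qquad e_t = y_t - \hat{y}_t$$

$$\ell_t = \ell_{t-1} + b_{t-1} + \alpha\,e_t, \qquad b_t = b_{t-1} + \alpha\beta\,e_t, \qquad s_t = s_{t-m} + \gamma\,e_t$$

The implementation below follows the level and trend updates (scaled by the time between points), but replaces the textbook seasonal update with a cubic-spline fit to the daily cycle.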

Ten Days

In the figures below, we apply forecast to a 10-day timeseries published in Twitter's AnomalyDetection package. The series is 10 days of counts, reported every minute. The Juttle analysis focuses on the three days around Oct 1.

The first timechart shows the data set over ten days. There is regular daily variation, consistent noise about this variation, and an occasional larger departure from the expected daily range.

The second timechart displays a daily "seasonal" curve (blue) fit to the raw series (orange). This is an incremental fit, beginning with the earliest point and adjusting as each new point is considered (which allows it to be used in a realtime setting as well as in this historical analysis). The estimate continually adapts to changes in the daily pattern, and you can see that it takes a few days of data to settle on a consistent daily shape. If you were running this smoother as part of a realtime monitoring query, you would issue the query with a timerange that includes a few days of history prior to realtime. The cyclic component would be estimated immediately from this historic data before rolling into realtime.
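
In Juttle terms, such a monitoring query might look like this minimal sketch. It reuses the daily_cpu demo source and forecastPI sub defined below, and assumes the source can roll from history into realtime (:end: means "run forever"):

daily_cpu -from :3 days ago: -to :end: -every :10m:
| forecastPI -Y "cpu" -z 1.96 -alpha .5 -slevel .1 -strend null -sdaily .2
| @timechart -display.duration :3d: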

The third timechart in the Juttle output displays the forecast curve (coral) along with prediction intervals computed from the prediction error. The prediction error itself is displayed along the bottom, while the time derivative of this error is superimposed along the top of the chart.
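
Concretely, forecastPI (in the code below) smooths the squared one-step error with factor $\alpha$ and draws the interval at $z$ standard errors around the forecast:

$$\hat{\sigma}_t = \sqrt{\text{smoothed}\ e_t^2}, \qquad \left[\,\hat{y}_t - z\,\hat{\sigma}_t,\ \ \hat{y}_t + z\,\hat{\sigma}_t\,\right]$$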

Detecting Anomalies

When an observation falls outside the prediction interval, as it does on September 29 and 30, it may indicate an anomaly.

Quick blips like the one on September 29 show up well using confidence intervals.

The big September 30 anomaly shows a shortcoming of confidence intervals computed this way -- if an anomaly persists for more than a blip, it can inflate the estimate of the signal's inherent variation, expanding the confidence interval and possibly masking other anomalies until things have quieted down. In a subsequent post we'll show how to automatically adjust the forecast level and confidence intervals to give less weight to "surprising" observations (using Kalman filtering) and thus recover more quickly when such anomalies are encountered.

Kinks in the error curve (bottom of third chart) also indicate sudden changes in the forecasted timeseries. These are captured as the time derivative of the error curve and displayed as "icicles" along the top of the chart.

Depending on the kind of behavior you consider anomalous (rapid rise? rapid fall? sustained departure from the usual daily value?) you may choose error thresholds, error-derivative thresholds, or confidence intervals to detect it. Forecast computes all of these for you, to use as best suits your needs.
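
As a sketch, an interval-based alert on the Twitter series could be as simple as the following (the anomalous field is a name introduced here for illustration; any sink could stand in for @table):

... | FC.forecastPI -Y "count" -z 2 -alpha .5 -slevel .1 -strend null -sdaily .2
| put anomalous = count > count_hi || count < count_low  // outside the prediction interval
| filter anomalous = true
| keep time, count, count_low, count_hi
| @table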

Tuning Forecast

The parameters for this forecaster (slevel, strend, sdaily) are the classic Holt-Winters alpha, beta, and gamma smoothing parameters for the level, trend, and seasonal components. They control how quickly each component adjusts to newly arriving points. Their values in this example were selected by hand (though without much fuss and with no optimization). In a subsequent post we will show how to use likelihood methods to choose parameter values automatically from historic data.
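
A low-tech way to tune by eye is to run two settings side by side and chart the forecasts together. Here is a sketch using the bumpy_cpu demo source and forecastPI sub defined below (the run field is a hypothetical series label; the keyField chart option follows the convention in the final example):

bumpy_cpu -from :2014-01-01: -to :2014-01-04:
|( forecastPI -Y "cpu" -z 1.96 -alpha .5 -slevel .1 -strend null -sdaily .2
   | put run = "slevel=.1"
 ; forecastPI -Y "cpu" -z 1.96 -alpha .5 -slevel .5 -strend null -sdaily .2
   | put run = "slevel=.5"
)
| put value = cpu_pred
| keep time, run, value
| @timechart -o { keyField: 'run' }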

// Exponential smoother/forecaster with de-seasonalization, based on Holt-Winters.
// This models an unobserved level and trend component in a noisy signal, along with
// optional "seasonal" effects at the level of day-over-day variation.
// The result is a smoothed version of the signal which may be used to forecast
// future values or detect unexpected variation.
//
// There are two useful pieces here:
// 1. a reducer named forecast, the holt-winters forecaster, called like this:
// ... | put state = forecast(Y, slevel, strend, sdaily) | ...
//
// 2. a proc named forecastPI (which calls forecast for you and emits some useful
// points based on its result), called like this:
// ... | forecastPI(Y, z, alpha, slevel, strend, sdaily) | ...
//
// See comments above each for detail on parameters and output
//
import "interpolate.juttle" as interp;
// forecast: compute holt-winters forecast of a field as it varies over time.
//
// parameters:
//
// Y is the point field to smooth.
// slevel, strend, sdaily are smoothing factors, numbers ranging [0..1].
// They determine how quickly the smoother adjusts to new values as they arrive.
// Setting a factor to 0 freezes the feature at its initial estimate,
// while setting a factor to 1 causes that feature to depend solely
// on the most recent point. Setting strend or sdaily to null
// removes that feature from the model.
//
// returns:
//
// forecast returns the forecast of the current value (using state prior to the
// current value) and its internal state estimates, as an array:
// [value, level, trend, daily]
//
// forecast assumes arriving points are equally spaced in time.
// level is the estimate of the deseasonalized series value, trend the estimate of the
// deseasonalized series slope. daily is the estimated seasonal factor
// to be used at the next occurrence of the current position in the cycle.
//
// This implementation differs from additive Holt-Winters in its handling of
// seasonality. It independently fits a C1 piecewise cubic curve to
// day-over-day values, and maintains this via gradient descent on prediction error
// rather than as part of a recursive Winters update (the small changes to make this
// a Winters-style update are commented in the code below).
//
export reducer forecast(Y, slevel, strend, sdaily) {
    var level, trend, day, daily, time, yhat;
    var initial = true, n_daily = 6;
    function predict() {
        return level + trend + day;
    }
    function update() {
        var err, Wday, i_daily, W, du;
        if (initial) {
            yhat = *Y;
            if (sdaily == null) {
                day = 0;
            } else {
                day = *Y * sdaily / (slevel + sdaily);
                daily = [day, day, day, day, day, day];
            }
            level = *Y - day;
            trend = 0;
            time = *"time";
            initial = false;
        } else {
            // error-correction form of Holt-Winters, eg, https://www.otexts.org/fpp
            // du = 1; // Winters-style: don't scale by interval
            du = (*"time" - time) / :hour:;
            time = *"time";
            if (sdaily != null) {
                // get the previous daily factors for this time
                Wday = interp.project_cubic(*"time", daily, n_daily, :day:);
                day = Wday[0];
                i_daily = Wday[1];
                W = Wday[2];
            }
            yhat = predict(); // 1-step forecast from previous state as reducer return value
            err = *Y - yhat;
            if (sdaily != null) {
                // update day and project the change back onto the daily array
                // var derr = err; // Winters-style: recursive update sees level and trend error
                var derr = (*Y - day);
                var dday = sdaily * derr * du;
                day = day + dday;
                daily[i_daily] = daily[i_daily] + dday * W[0];
                daily[(i_daily+1) % n_daily] = daily[(i_daily+1) % n_daily] + dday * W[1];
                daily[(i_daily+2) % n_daily] = daily[(i_daily+2) % n_daily] + dday * W[2];
                daily[(i_daily+3) % n_daily] = daily[(i_daily+3) % n_daily] + dday * W[3];
            }
            if (strend != null) {
                trend = trend + slevel * strend * err * du;
            }
            level = level + trend + slevel * err * du;
        }
    }
    function result() {
        return [yhat, level, trend, day];
    }
    function reset() {
        initial = true;
    }
}
// derivative (d/dt) of a timeseries, approximated
// as a backwards difference of Y vs the time field.
reducer ddt(Y, units) {
    var y, time, dy, dt;
    var initial = true;
    function update() {
        if (initial) {
            dy = 0;
            y = *Y;
            dt = :1s:;
            time = *"time";
            initial = false;
        } else {
            dy = *Y - y;
            y = *Y;
            dt = *"time" - time;
            time = *"time";
        }
    }
    function result() {
        return dy * (units / dt);
    }
    function reset() {
        initial = true;
    }
}
// Compute a forecast along with prediction intervals based
// on a smoothed stderr estimate.
//
// parameters:
//
// Y: field to forecast.
// z: number of standard deviations from the mean to include in the prediction
// interval (z = 1.96 is the 97.5th percentile and thus a 95% confidence bound,
// if errors were independent and normally distributed (often untrue)).
// alpha: stderr smoothing factor (0..1); how quickly the error envelope recovers from an anomaly.
// slevel, strend, sdaily: Holt-Winters smoothing factors, passed to the forecast() reducer.
//
// result:
//
// For a field name Y, this populates the current point with:
// Y_pred: predicted next y value
// Y_level: de-seasonalized prediction
// Y_trend: predicted next slope
// Y_day: predicted next daily seasonal factor
// Y_err: estimated stderr (RMS prediction error over time)
// Y_derr: time derivative of stderr
// Y_hi, Y_low: prediction interval bounds
//
// For simple anomaly detection, test whether Y_low <= Y <= Y_hi and alert if not,
// or test Y_derr against a fixed threshold (derr adapts to current variation, so
// a fixed derr threshold is really an adaptive alert threshold). The alpha parameter
// adjusts the sensitivity of Y_derr.
//
export sub forecastPI(Y, z, alpha, slevel, strend, sdaily) {
    put state = forecast(Y, slevel, strend, sdaily)
    | put err2 = (*Y - state[0]) * (*Y - state[0])
    | put stderr = Math.sqrt(forecast(err2, alpha, null, null)[0])
    | put *(Y+"_pred") = state[0]
    | put *(Y+"_level") = state[1]
    | put *(Y+"_trend") = state[2]
    | put *(Y+"_day") = state[3]
    | put *(Y+"_err") = stderr
    | put *(Y+"_derr") = ddt(stderr, :hour:)
    | put *(Y+"_hi") = state[0] + z * stderr
    | put *(Y+"_low") = state[0] - z * stderr
    | remove state, err2, stderr
}
// Data generation utilities for demos:
sub hourly_cpu(from, to) {
    demo cdn metrics 'cpu' -from from -to to -every :m:
        -nhosts 4 -dos .5 -dos_dur :5m: -ripple .3 -cpu_alpha 0.8
    | filter host ~ 'sjc*'
    | put cpu = value
    | remove type, host, pop, value
}
sub daily_cpu(from, to, every) {
    demo cdn metrics 'cpu' -from from -to to -every every
        -nhosts 4 -dos 0.3 -dos_dur :30m: -cpu_alpha 0.8 -daily 4
    | filter host ~ 'sjc*'
    | put cpu = value
    | remove type, host, pop, value
}
sub bumpy_cpu(from, to) {
    demo cdn metrics 'cpu' -from from -to to -every :5m:
        -nhosts 4 -dos .3 -dos_dur :15s: -daily 3
    | filter host ~ 'sjc*'
    | put cpu = value
    | remove type, host, pop, value
}
// Demo: forecasting for simulated CPU
//
// This demo performs a smoothing forecast including a continuous
// model of repeated day-to-day variation. It displays timeseries of
// the original input, the predicted/smoothed input, a 95% confidence
// interval around this prediction, and the stderr. It also displays
// the prediction components: level, trend and daily variation.
// The seasonal component needs to see a few days' data to settle in.
//
daily_cpu -from :2014-01-01: -to :2014-01-08: -every :10m:
| forecastPI -Y "cpu" -z 1.96 -alpha .5 -slevel .1 -strend null -sdaily .2
|( keep cpu, cpu_pred, cpu_level, cpu_trend, cpu_day, cpu_err, cpu_derr, time
| split cpu, cpu_pred, cpu_level, cpu_trend, cpu_day, cpu_err, cpu_derr
| @timechart -display.duration :3d:
; keep cpu, cpu_pred, cpu_low, cpu_hi, time
| split cpu, cpu_pred, cpu_low, cpu_hi
| @timechart -display.duration :3d:
)
// interpolation functions.
//
// project_time: project time onto a cycle of intervals of duration period/n.
// Return [idx, u] where 0 <= idx < n is the periodic interval this
// time falls into, and 0 <= u < 1 is the coordinate within the
// unit interval
//
export function project_time(time, n, period) {
    var interval = period / n;
    var t0 = Date.quantize(time, interval);
    var u = (time - t0) / interval;
    var idx = Math.floor((t0 - Date.new(0)) / interval) % n;
    return [idx, u];
}
//
// linear: piecewise-linear interpolation (periodic)
// interpolate values to get a continuous periodic function of time.
// A is an array of samples representing an overall cycle of duration period
// with samples uniformly spaced and s.t. A[0] should be the value at t==0.
//
export function linear(time, A, len, period) {
    var IDX = project_time(time, len, period);
    var idx = IDX[0], u = IDX[1];
    return (1 - u) * A[idx] + u * A[(idx+1) % len];
}
//
// catrom_weights: weighting for successive points for a given 0 <= u < 1
//
function catrom_weights(u) {
    var u2 = u * u;
    var u3 = u2 * u;
    var W = [-u/2 + u2 - u3/2,
             1 - 2.5 * u2 + 1.5 * u3,
             u/2 + 2 * u2 - 1.5 * u3,
             -u2/2 + u3/2];
    return W;
}
//
// project_cubic: cubic spline interpolation along with segment index and weights
// Return an array [val, idx, W], where
// val == W[0]*A[idx] + W[1]*A[idx+1] + W[2]*A[idx+2] + W[3]*A[idx+3], indexed modulo len
//
export function project_cubic(time, A, len, period) {
    var IDX = project_time(time, len, period);
    var idx = (IDX[0] + len - 1) % len, u = IDX[1];
    var W = catrom_weights(u);
    var val = W[0]*A[idx] + W[1]*A[(idx+1) % len] + W[2]*A[(idx+2) % len] + W[3]*A[(idx+3) % len];
    return [val, idx, W];
}
//
// cubic: cubic spline interpolation (periodic)
// interpolate values to get a smooth periodic function of time.
// A is an array of samples representing an overall cycle of duration period
// with samples uniformly spaced and s.t. A[0] should be the value at t==0.
//
export function cubic(time, A, len, period) {
    return project_cubic(time, A, len, period)[0];
}
//
// demo: draw two periods of the cycle, interpolating linearly and cubically
//
const values = [10, 5, 20, 15, 30];
const N = 5;
emitter -start 0 -limit 60
| put pwl = linear(time, values, N, :30s:), curve = cubic(time, values, N, :30s:)
| split pwl, curve
| @timechart
// Forecasting and anomaly example using the Twitter anomaly timeseries from
// https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series
// (this is a different computational approach, applied to the test data supplied with their R package)
//
import "forecast.juttle" as FC;
source "https://gist.githubusercontent.com/welch/a34b41e57db2075b5c4a/raw/f215975aa432cc5ea254769f3ebef77806d2aa44/twitter-anomaly.json"
| FC.forecastPI -Y "count" -z 2 -alpha .5 -slevel .1 -strend null -sdaily .2
| split count, count_pred, count_hi, count_low, count_level, count_trend, count_day, count_err, count_derr
// | ( merge ; filter name ~~ /low|hi/ | put name = "ci" ) // gradient fill
// there are samples every minute. thin out for display.
| ( reduce -every :30m: value = max(value) by name
; reduce -every :30m: -on :15m: value = min(value) by name )
| put value = (name == "count_derr") ? 300 - value/2 : value // icicle display
|( filter name = "count" | @timechart -display.dataDensity 0
; filter name = "count" || name = "count_day" | @timechart -display.dataDensity 0
; filter name = "count" || name ~~ /pred|ci|low|hi|err|derr/
  | filter time < :1980-10-02:
  | @timechart -o {
        display: {
            duration: :3d:,
            dataDensity: 0
        },
        title: "Prediction intervals and error variation",
        keyField: 'name',
        series: [
            //{ label: "95% confidence interval", name: "ci", color: "#6A6A6A" }, // gradient fill
            { label: "lower 95% PI", name: "count_low", color: "#6A6A6A" },
            { label: "upper 95% PI", name: "count_hi", color: "#6A6A6A" },
            { label: "RMS forecast error", name: "count_err", color: "Gainsboro" },
            { label: "forecast error rate", name: "count_derr", color: "gray" },
            { name: "count", color: "#BBC0C3" },
            { label: "forecast", name: "count_pred", color: "coral" }
        ]
    }
)