Holt-Winters smoother/forecaster for anomaly detection
Forecast threads a smooth curve through a noisy timeseries in a way that lets you visualize trends, cycles, and anomalies. It can be used as part of an automatic anomaly detection system for metric timeseries (that is, sequences of timestamped numerical values).
It accomplishes this using a variation of Holt-Winters forecasting -- more generally known as exponential smoothing. Forecast decomposes a noisy signal into level, trend, repetitive "seasonal" effects, and unexplained variation or noise. In this example, a "season" is a day long, and we model repeated day-over-day variation. The result is a smoothed version of the signal which may be used to forecast future values or detect unexpected variation. This approach has been successfully used for network anomaly detection with other monitoring tools. Here we implement the method in pure Juttle, and our ability to combine historic and realtime data in a single query lets us initialize the forecaster from recent data at startup.
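The Juttle source itself is not reproduced in this post, but the recurrences behind it are the standard additive Holt-Winters updates. The Python sketch below is illustrative only (the class and helper names are ours; `slevel`, `strend`, and `sdaily` mirror the parameters discussed under "Tuning Forecast") and shows the shape of the computation rather than the actual Forecast implementation:

```python
# Minimal additive Holt-Winters update -- an illustrative sketch, not the Juttle code.
# slevel, strend, sdaily play the role of the classic alpha, beta, gamma.
class HoltWinters:
    def __init__(self, period, slevel=0.1, strend=0.05, sdaily=0.2):
        self.period = period              # points per "season" (e.g. 1440 for minutely data, daily season)
        self.slevel, self.strend, self.sdaily = slevel, strend, sdaily
        self.level = None
        self.trend = 0.0
        self.season = [0.0] * period      # one additive offset per position in the day
        self.t = 0

    def update(self, y):
        """Consume one observation; return the forecast that was made before seeing it."""
        i = self.t % self.period
        if self.level is None:
            self.level = y                # crude initialization from the first observation
        forecast = self.level + self.trend + self.season[i]
        prev_level = self.level
        self.level = self.slevel * (y - self.season[i]) + (1 - self.slevel) * (self.level + self.trend)
        self.trend = self.strend * (self.level - prev_level) + (1 - self.strend) * self.trend
        self.season[i] = self.sdaily * (y - self.level) + (1 - self.sdaily) * self.season[i]
        self.t += 1
        return forecast
```

Each observation nudges the level, the trend, and the seasonal offset for its position in the day by an amount controlled by the corresponding smoothing parameter; what remains unexplained is the noise.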
Ten Days
In the figures below, we apply Forecast to a timeseries published in Twitter's AnomalyDetection package: 10 days of counts, reported every minute. The Juttle analysis focuses on the three days around Oct 1.
The first timechart shows the data set over ten days. There is regular daily variation, consistent noise about this variation, and an occasional larger departure from the expected daily range.
The second timechart displays a daily "seasonal" curve (blue) fit to the raw series (orange). This is an incremental fit that begins with the earliest point and adjusts as each new point arrives (which allows it to be used in a realtime setting as well as in this historical analysis). The estimate continually adapts to changes in the daily pattern, and you can see that it takes a few days' data to settle on a consistent daily shape. If you were running this smoother as part of a realtime monitoring query, you would issue the query with a timerange that includes a few days' history prior to realtime. The cyclic component would be estimated immediately from this historic data before rolling into realtime.
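To make the warm-up idea concrete, here is a hedged sketch of that pattern, reusing the `HoltWinters` class from the sketch above, with synthetic data standing in for the real series:

```python
import math, random

# Synthetic stand-in for the minutely series: a daily (1440-point) cycle plus noise.
def synthetic_point(t):
    return 100 + 50 * math.sin(2 * math.pi * (t % 1440) / 1440) + random.gauss(0, 5)

hw = HoltWinters(period=1440)            # assumes the HoltWinters sketch above

# "Historic" warm-up: a few days of past data so the daily shape settles in...
for t in range(3 * 1440):
    hw.update(synthetic_point(t))

# ...then roll into "realtime": forecasts now come from a settled daily estimate.
for t in range(3 * 1440, 3 * 1440 + 5):
    y = synthetic_point(t)
    print(t, "forecast", round(hw.update(y), 1), "observed", round(y, 1))
```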
The third timechart in the Juttle output displays the forecast curve (coral) along with prediction intervals computed from the prediction error. The prediction error itself is displayed along the bottom of the chart, while the time derivative of this error is superimposed along the top.
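One common way to build such a band, roughly what is sketched here (the exact construction Forecast uses may differ), is to keep an exponentially smoothed estimate of the absolute one-step prediction error and draw the interval a fixed multiple of that estimate on either side of the forecast:

```python
# Illustrative band construction around the Holt-Winters forecast.
class PredictionBand:
    def __init__(self, hw, serror=0.05, width=3.0):
        self.hw = hw                     # a HoltWinters instance from the sketch above
        self.serror = serror             # smoothing factor for the error estimate (invented name)
        self.width = width               # band half-width, in smoothed-absolute-error units
        self.abs_err = 0.0

    def update(self, y):
        forecast = self.hw.update(y)
        err = y - forecast
        self.abs_err = self.serror * abs(err) + (1 - self.serror) * self.abs_err
        lo = forecast - self.width * self.abs_err
        hi = forecast + self.width * self.abs_err
        return forecast, lo, hi, err
```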
Detecting Anomalies
When an observation falls outside the prediction interval, as it does on September 29 and 30, it may indicate an anomaly.
Quick blips like the one on September 29 show up well against the prediction intervals.
The big September 30 anomaly shows a shortcoming of prediction intervals computed this way -- if an anomaly persists for more than a blip, it can inflate the estimate of the signal's inherent variation, widening the interval and possibly masking other anomalies until things have quieted down. In a subsequent post we'll show how to automatically adjust the forecast level and prediction intervals to give less weight to "surprising" observations (using Kalman filtering) and thus recover more quickly when such anomalies are encountered.
Kinks in the error curve (bottom of third chart) also indicate sudden changes in the forecasted timeseries. These are captured as the time derivative of the error curve and displayed as "icicles" along the top of the chart.
Depending on the kind of behavior you consider anomalous (a rapid rise? a rapid fall? a sustained change from the usual daily value?) you may choose error thresholds, error-derivative thresholds, or prediction intervals to detect it. Forecast computes all of these for you to use as best suits your needs.
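As a rough illustration of those three choices (the function name and thresholds here are invented for the example, not Forecast's defaults):

```python
# Three ways to flag a point, matching the choices above.
def flags(y, lo, hi, err, prev_err, err_threshold=50.0, derr_threshold=20.0):
    out_of_band = not (lo <= y <= hi)                      # outside the prediction interval
    big_error = abs(err) > err_threshold                   # sustained departure from the forecast
    sudden_change = abs(err - prev_err) > derr_threshold   # a kink: large error derivative
    return out_of_band, big_error, sudden_change
```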
Tuning Forecast
The parameters for this forecaster (slevel, strend, sdaily) are the classic Holt-Winters level, trend, and seasonal smoothing parameters (alpha, beta, and gamma). They control how quickly the daily curve and the other components adjust to newly arriving points. Their values in this example were selected by hand (though without much fuss and no optimization). In a subsequent post we will show how to use likelihood methods to choose parameter values automatically from historic data.
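For a sense of what tuning involves, a crude alternative to likelihood-based selection is to grid-search the smoothing parameters against one-step-ahead prediction error on historic data. This sketch is a stand-in for that idea, not Forecast's method, and assumes the `HoltWinters` class and `synthetic_point` helper from the earlier sketches:

```python
# Score candidate (slevel, strend, sdaily) triples by mean absolute
# one-step-ahead error on held-out history.
def score(params, points, period=1440):
    slevel, strend, sdaily = params
    hw = HoltWinters(period, slevel, strend, sdaily)
    errors = [abs(y - hw.update(y)) for y in points]
    tail = errors[period:]                 # skip the first day while the fit settles
    return sum(tail) / len(tail)

history = [synthetic_point(t) for t in range(4 * 1440)]
candidates = [(a, b, g) for a in (0.05, 0.1, 0.3)
                        for b in (0.01, 0.05)
                        for g in (0.1, 0.3, 0.5)]
best = min(candidates, key=lambda p: score(p, history))
print("best (slevel, strend, sdaily):", best)
```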