| title | tags |
|---|---|
| Time Series Analysis | Time Series |
A normal machine learning dataset is a collection of observations.
For example:
- observation #1
- observation #2
- observation #3
Time does play a role in normal machine learning datasets.
Predictions are made for new data when the actual outcome may not be known until some future date. The future is being predicted, but all prior observations are almost always treated equally.
A time series dataset is different.
Time series adds an explicit order dependence between observations: a time dimension.
This additional dimension is both a constraint and a structure that provides a source of additional information.
Level: When you read about the “level” or the “level index” of time series data, it’s referring to the mean of the series.
Noise/Randomness/Irregular movements: All time series data contain noise: random variation in the data points that is not explained by any trend or seasonality. Noise is unsystematic and short term.
Seasonality: If there are regular and predictable fluctuations in the series that are correlated with the calendar (quarterly, weekly, or even days of the week), then the series includes a seasonality component. It’s important to note that seasonality is domain specific; for example, real estate sales are usually higher in the summer months than in the winter months, while regular retail usually peaks at the end of the year. Also, not all time series have a seasonal component; audio or video data, for example, typically does not.
Trend: The “trend” of time series data refers to its long-term trajectory, which can move in either a positive or negative direction. Examples of a trend include a long-term increase in a company’s sales data or in network usage.
Cycle: Repeating periods that are not tied to the calendar. Examples include business cycles such as economic downturns and expansions, salmon run cycles, and the cycles in audio signals, none of which repeat on a weekly, monthly, or yearly calendar schedule.
We have different goals depending on whether we are interested in understanding a dataset or making predictions.
Understanding a dataset, called time series analysis, can help to make better predictions, but is not required and can result in a large technical investment in time and expertise not directly aligned with the desired outcome, which is forecasting the future.
When using classical statistics, the primary concern is the analysis of time series.
Time series analysis involves developing models that best capture or describe an observed time series in order to understand the underlying causes. This field of study seeks the “why” behind a time series dataset.
This often involves making assumptions about the form of the data and decomposing the time series into its constituent components.
The quality of a descriptive model is determined by how well it describes all available data and the interpretation it provides to better inform the problem domain.
Making predictions about the future is called extrapolation in the classical statistical handling of time series data.
More modern fields focus on the topic and refer to it as time series forecasting.
Forecasting involves taking models fit on historical data and using them to predict future observations.
Descriptive models can borrow from the future (e.g. to smooth or remove noise); they only seek to best describe the data.
An important distinction in forecasting is that the future is completely unavailable and must only be estimated from what has already happened.
The skill of a time series forecasting model is determined by its performance at predicting the future. This often comes at the expense of explaining why a specific prediction was made, providing confidence intervals, or better understanding the underlying causes behind the problem.
When a series contains a trend, seasonality, and noise, then you can define that series by the way those components interact with each other. These interactions can be reduced to what is called either a multiplicative or additive time series.
A multiplicative time series is one in which the fluctuations increase over time and depend on the level of the series.
Time series = t (trend) * s (seasonality) * n (noise)
Therefore, the seasonality of the model would increase with the level over time. In the graph below, you can see that the seasonality of airplane passengers increases as the level increases:
An additive model is when the fluctuations in the time series stay constant over time.
Time series = t (trend) + s (seasonality) + n (noise)
So an additive model’s seasonality should be constant from year to year and not related to the increase or decrease in the level over time.
Knowing whether your series data is multiplicative or additive is important if you want to decompose your data into its parts, such as trend or seasonality. Sometimes it’s enough to graph your data to discover whether it’s an additive or multiplicative series. When it’s not, you can decompose your data under both an additive and a multiplicative model and compare the ACF values to discover which leaves less correlation between data points.
Decomposition is the deconstruction of the series data into its various components: trend, cycle, noise, and seasonality when those exist. Two different types of classic decomposition include multiplicative and additive decomposition.
The purpose of decomposition is to isolate the various components so you can view them each individually and perform analysis or forecasting without the influence of noise or seasonality. For example, if you wanted to only view the trend of a real estate series, you would need to remove the seasonality found in the data, the noise due to randomness, and any cycles such as economic expansion. Below is a figure showing the various components of the mortgage time series including the original data, the seasonality, trend, and noise or “remainder”:
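As an illustration, here is a minimal numpy sketch of classical additive decomposition (libraries such as statsmodels provide `seasonal_decompose` for real use; the plain centered moving average below is a simplification, since a textbook even-period version uses a weighted 2×period window):

```python
import numpy as np

def additive_decompose(x, period):
    """Classical additive decomposition: x = trend + seasonal + remainder.
    Simplified sketch: trend via a plain centered moving average."""
    x = np.asarray(x, dtype=float)
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")   # edge values are biased
    detrended = x - trend
    # Seasonal effect: average detrended value at each position in the cycle,
    # shifted so the seasonal component sums to zero over one period.
    seasonal = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal -= seasonal.mean()
    seasonal = np.resize(seasonal, len(x))        # tile over the whole series
    remainder = x - trend - seasonal
    return trend, seasonal, remainder

# Toy series: linear trend plus a period-4 seasonal pattern
t = np.arange(48.0)
series = 0.5 * t + np.tile([3.0, -1.0, -3.0, 1.0], 12)
trend, seasonal, remainder = additive_decompose(series, period=4)
```

By construction the three components add back up to the original series, which is exactly the additive model described above.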
Before actually forecasting, let’s understand how to measure the quality of predictions and have a look at the most common and widely used metrics.
- R squared, coefficient of determination (in econometrics it can be interpreted as a percentage of variance explained by the model), (-inf, 1] sklearn.metrics.r2_score
- Mean Absolute Error, it is an interpretable metric because it has the same unit of measurement as the initial series [0, +inf)
sklearn.metrics.mean_absolute_error
- Median Absolute Error, again an interpretable metric, particularly interesting because it is robust to outliers, [0, +inf)
sklearn.metrics.median_absolute_error
- Mean Squared Error, most commonly used, gives higher penalty to big mistakes and vice versa, [0, +inf)
sklearn.metrics.mean_squared_error
- Mean Squared Logarithmic Error, practically the same as MSE, but we take the logarithm of the series first; as a result we pay attention to small mistakes as well. Usually used when the data has exponential trends, [0, +inf)
sklearn.metrics.mean_squared_log_error
- Mean Absolute Percentage Error, same as MAE but expressed as a percentage; very convenient when you want to explain the quality of the model to your management, [0, +inf). Not implemented in older versions of sklearn (newer releases ship `sklearn.metrics.mean_absolute_percentage_error`)
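The sklearn metrics above can be tried out on toy values, and MAPE is easy to hand-roll; a minimal sketch:

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    median_absolute_error,
    mean_squared_error,
    r2_score,
)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error; assumes y_true contains no zeros."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 41.0])

mae = mean_absolute_error(y_true, y_pred)      # 2.0
mse = mean_squared_error(y_true, y_pred)       # 4.5
medae = median_absolute_error(y_true, y_pred)  # 2.0
r2 = r2_score(y_true, y_pred)                  # 0.964
mape_val = mape(y_true, y_pred)                # 10.625
```

Note how MSE punishes the single 3-unit miss more than MAE does, while MAPE reads directly as “the model is off by about 10.6% on average.”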
Forecasting is one of the most relevant tasks when working with time series data, but it’s hard to know where to get started. Although you can forecast with a Simple Moving Average (SMA) or an Exponential Moving Average (EMA), the Autoregressive Integrated Moving Average (ARIMA) model is popular for fairly accurate and quick forecasting of time series.
Simple Moving Average + 1st,2nd,3rd Exponential Smoothing (With mathematical formula): Medium
Simple models
- Moving Average (MA)
- Auto Regression (AR)
Regressive models - Requires Stationary Data
- Autoregressive Moving Average
- Autoregressive Integrated Moving Average
- Seasonal Autoregressive Integrated Moving Average
Smoothing models - Can work with Non-stationary Data
- Simple Exponential Smoothing
- Double Exponential Smoothing
- Triple Exponential Smoothing
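The simplest of the smoothing models above is just a recursive weighted average; a hand-rolled sketch (statsmodels’ `SimpleExpSmoothing` is the library route, and the double/triple variants add trend and seasonal terms on top of this recursion):

```python
import numpy as np

def simple_exp_smoothing(x, alpha):
    """Simple Exponential Smoothing:
    level_t = alpha * x_t + (1 - alpha) * level_{t-1}.
    The one-step-ahead forecast is the final level."""
    level = float(x[0])
    levels = [level]
    for obs in x[1:]:
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return np.array(levels)

data = [3.0, 5.0, 9.0, 20.0]
smoothed = simple_exp_smoothing(data, alpha=0.5)
forecast = smoothed[-1]  # 13.25: half the last observation, half the old level
```

Higher `alpha` weights recent observations more heavily; `alpha=1` reduces to the naive “forecast = last value” model.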
- ACF (autocorrelation function) is the correlation of the time series with a lagged copy of itself.
- PACF (partial autocorrelation function) is the correlation of the time series with a lagged copy of itself after removing the intervening correlation of all shorter lags.
- ACF is used for Moving Average models - choosing the q parameter.
- PACF is used for Auto Regression models - choosing the p parameter.
- Based on the ACF plot, one can determine whether the data has seasonality or not.
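A hand-rolled sample ACF (the usual correlogram estimator, equivalent to statsmodels’ `acf` up to conventions) makes the seasonality point concrete: a strongly periodic series shows a second ACF peak at the seasonal lag.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation: covariance with the lag-k shifted copy,
    normalized by the series variance."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[: n - k] * x[k:]) / denom for k in range(nlags + 1)])

# A period-4 toy series: the ACF spikes again at lag 4
series = np.tile([1.0, 5.0, 2.0, 8.0], 20)
rho = acf(series, nlags=8)   # rho[0] is always 1; rho[4] is close to 1
```

In practice you would eyeball `plot_acf`/`plot_pacf` from `statsmodels.graphics.tsaplots` rather than read raw numbers.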
Time Series Nested Cross Validation
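sklearn’s `TimeSeriesSplit` gives the expanding-window outer splits used in (nested) time series cross validation; for the fully nested version you would run a hyperparameter search inside each training fold. A minimal sketch:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(10).reshape(-1, 1)  # 10 ordered observations

# Expanding window: each training set ends before its test set begins,
# so the model is never evaluated on data older than what it trained on.
tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))
for train_idx, test_idx in splits:
    print(train_idx.tolist(), test_idx.tolist())
```

Unlike ordinary k-fold CV, the folds here are never shuffled: shuffling would leak future information into training.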
Anomaly detection is a technique used to find abnormal behavior or data points in a series. The anomalies could range from spikes or dips that largely deviate from normal patterns, or can be larger and longer term abnormal trends.
For example, data points that largely deviate from the measures of central tendency could represent the occurrence of credit card fraud, because the data point or pattern is abnormal for a particular customer (check out the Fast Anomaly Detection algorithm). Other times, patterns are analyzed, for instance IoT sensor data such as blood pressure or other health measurements where anomalies need to be detected.
Different types of anomaly detection include statistical methods, unsupervised learning, and supervised learning. You can also use an ARIMA-based method described in Chen and Liu’s 1993 paper, which is available in most statistical packages.
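As a baseline statistical method, a simple z-score rule flags points far from the mean (the threshold of 3 standard deviations is a common rule of thumb, not a fixed standard; the ARIMA-residual approach mentioned above is more robust to trend and seasonality):

```python
import numpy as np

def zscore_anomalies(x, threshold=3.0):
    """Flag indices more than `threshold` standard deviations from the mean.
    A crude baseline: it assumes the series has no strong trend/seasonality."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.where(np.abs(z) > threshold)[0]

# 200 well-behaved points followed by one obvious spike at index 200
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0.0, 1.0, 200), [15.0]])
anomalies = zscore_anomalies(data)  # should contain index 200
```

For trending or seasonal data, apply such a rule to the decomposition remainder or to model residuals rather than to the raw series.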
There is almost an endless supply of time series forecasting problems.
Below are 10 examples from a range of industries to make the notions of time series analysis and forecasting more concrete.
- Forecasting the corn yield in tons by state each year.
- Forecasting whether an EEG trace in seconds indicates a patient is having a seizure or not.
- Forecasting the closing price of a stock each day.
- Forecasting the birth rate at all hospitals in a city each year.
- Forecasting product sales in units sold each day for a store.
- Forecasting the number of passengers through a train station each day.
- Forecasting unemployment for a state each quarter.
- Forecasting utilization demand on a server each hour.
- Forecasting the size of the rabbit population in a state each breeding season.
- Forecasting the average price of gasoline in a city each day.
I expect that you will be able to relate one or more of these examples to your own time series forecasting problems that you would like to address.
Why make a non-stationary series stationary before forecasting?
- Forecasting a stationary series is relatively easy and more reliable
- Autoregressive forecasting models are linear regression models that utilize the lags of the series itself as predictors.
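Both points can be made concrete in a few lines: an AR(1) model is ordinary least squares on the lag-1 series, and differencing turns a non-stationary random walk into stationary white noise first (a sketch; a real workflow would verify stationarity with e.g. an ADF test):

```python
import numpy as np

def fit_ar1(x):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi).
    An AR(1) model really is just linear regression on the series' own lag."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    (c, phi), *_ = np.linalg.lstsq(design, x[1:], rcond=None)
    return c, phi

# A random walk is non-stationary; its first difference is white noise.
rng = np.random.default_rng(1)
walk = np.cumsum(rng.normal(0.0, 1.0, 500))
diffed = np.diff(walk)          # differencing removes the stochastic trend
c, phi = fit_ar1(diffed)        # phi near 0: no remaining lag-1 structure
```

Fitting the AR model on the raw `walk` instead would give a `phi` near 1, the textbook symptom of a unit root and the reason differencing (the "I" in ARIMA) comes first.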
- Datacamp - Time Series Analysis
- An end-to-end time series
- Comprehensive guide for Time Series Forecast
- A detailed explanation - Making use of ACF and PACF
- A detailed explanation - Making use of ACF and PACF - part 2
- Introduction to Time Series
- MachineLearningMastery - What is time series forecasting?
- MachineLearningMastery - Gentle Intro to ExpSmoothing in Python
- Time Series - Math+Python Interpretation
- Heavily Recommended: Time Series Forecasting in Python - Very comprehensive
- 11 Classical Forecasting Method Cheatsheet
- PennState College - STAT510
- Interesting: How NOT to use ML for time series forecasting