@ljbelenky
Created June 8, 2021 16:47
Difficulties of Time Series Analysis
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Time Series Analysis\n",
"\n",
"## Part 1 - Why Time Series is Weird and Hard\n",
"\n",
"## Introduction\n",
"\n",
"Time series analysis may look simple, but it is deceptively difficult. It seems like everyone who has written about it has included a warning that it is as much an art as a science: there is no sure path to success, some common techniques require a fair amount of guesswork, and some common tools require interpretation. Before we delve into the techniques of time series analysis, let's spend some time understanding how it differs from traditional ML and why this requires different approaches.\n",
"\n",
"Time-series analysis is the practice of predicting future values based on a history of previous values. A typical time-series question might be, \"Given the history of a stock price, can you tell me what it will be tomorrow, next week, or next month?\"\n",
"\n",
"Time series prediction is part of the larger topic of supervised machine learning. Superficially, it appears to be similar to other supervised learning techniques. We have a set of known exogenous variables ($X$), a set of known targets ($\\vec y$) and we are looking for an algorithm that faithfully maps $X$ to $\\vec y$.\n",
"\n",
"The Data Science student, having learned the \"classical\" ML techniques (linear regression, random forest, k-NN, etc.) and having observed that they all give reasonable results on \"typical\" ML problems, might be tempted to approach time-series prediction using the same techniques. However, they will find that these often give terribly unhelpful results.\n",
"\n",
"So, before we delve into the topic of how to approach time-series prediction, I thought it would be helpful to spend some time discussing how time-series is different from classical ML, and why it requires a variety of different algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Similarities to Classical ML\n",
"\n",
"First, let's define what I call a \"classical\" supervised ML problem and discuss how time-series may appear to be similar.\n",
"\n",
"An example I often use when teaching ML is a small data set of houses for which we know the number of bedrooms, bathrooms and square-footage, along with the sales price of each house.\n",
"\n",
"![](images/houses.png)\n",
"\n",
"In this data set, our $X$ matrix is a 2-D array of the predictors (bedrooms, bathrooms, sqft) and our $\\vec y$ is the 1-D vector of prices. \n",
"\n",
"The astute observer will notice that there is a strong correlation between the predictors (i.e., there are usually about the same number of bedrooms as bathrooms, and square footage generally trends up as these increase). So, there is some reason to think that recasting this data into a new feature space (using PCA) would simplify and improve the model. However, even without this step, reasonable results can be obtained using even very simple (unregularized, un-tuned) models.\n",
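"\n",
"To make this setup concrete, here is a minimal sketch of the classical problem as an ordinary least-squares fit. The houses and prices below are invented for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical houses: [bedrooms, bathrooms, sqft] -> price\n",
"X = np.array([[2, 1, 900],\n",
"              [3, 2, 1400],\n",
"              [4, 3, 2200]], dtype=float)\n",
"y = np.array([150_000, 230_000, 350_000], dtype=float)\n",
"\n",
"# Least squares with an intercept column\n",
"A = np.column_stack([np.ones(len(X)), X])\n",
"coef, *_ = np.linalg.lstsq(A, y, rcond=None)\n",
"predictions = A @ coef  # in-sample predictions\n",
"```\n",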
"\n",
"In contrast, a simple time-series problem often has just a single feature (usually a date-time) and the target (in this case, let's say a stock price). We can cast this data into the familiar $X$-$\\vec y$ shape by engineering new features from the prices of the previous several days.\n",
"\n",
"![](images/ts1.png)\n",
"\n",
"This gets the problem into a shape we are familiar with, but it often will not lead to good results. Let's identify some of the ways these two problems are different.\n",
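"\n",
"As a concrete sketch, here is how that lag-feature engineering might look in pandas. The prices, dates, and column names are made up for illustration:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical daily closing prices\n",
"prices = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0],\n",
"                   index=pd.date_range('2021-01-04', periods=5))\n",
"\n",
"# Engineer lagged features: predict today's price from the prior 3 days\n",
"frame = pd.DataFrame({'y': prices})\n",
"for lag in (1, 2, 3):\n",
"    frame[f'lag_{lag}'] = prices.shift(lag)\n",
"frame = frame.dropna()  # the first 3 rows lack a full history\n",
"\n",
"X = frame[['lag_1', 'lag_2', 'lag_3']]\n",
"y = frame['y']\n",
"```\n",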
"\n",
"## Working at the Edge of the Data\n",
"\n",
"In our house price predictor, we may have a training data set that contains a large number of observations of houses with 2-4 bedrooms and a similar number of bathrooms. Most of the predictions we will be asked to make will fall somewhere in that range. Even if we are occasionally asked to predict on a house with 8 or 9 bedrooms, we can expect that we will see more of those 2-4 bedroom houses quite soon.\n",
"\n",
"In contrast, when doing time series analysis, we are usually interested in predicting values beyond the range of our existing data set, that is to say, in the future. Predictions within the period where we have training data are usually not helpful. A model that tells us to buy stock just before a massive run-up is not helpful if the prediction comes after the opportunity has passed, no matter how accurate the \"prediction\" is.\n",
"\n",
"Some methods, such as k-NN and tree-based models, flatline beyond the range of available data: any prediction of the future would just be the last value repeated. Linear models will tend to drift off monotonically.\n",
"\n",
"To effectively work with time-series data, we need to use algorithms that extrapolate, rather than just interpolate.\n",
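"\n",
"A tiny illustration of this flatlining behavior, using a hand-rolled 1-nearest-neighbor predictor next to a linear fit (the trend and dates are made up):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Hypothetical upward-trending series: y = 2*t on days 0..9\n",
"t_train = np.arange(10)\n",
"y_train = 2.0 * t_train\n",
"\n",
"def nn_predict(t_new):\n",
"    # 1-nearest-neighbor: copy the target of the closest training point\n",
"    return y_train[np.abs(t_train - t_new).argmin()]\n",
"\n",
"# Linear extrapolation via least squares\n",
"slope, intercept = np.polyfit(t_train, y_train, 1)\n",
"\n",
"future = 15  # five days past the end of the data\n",
"nn_predict(future)            # flatlines at the last value: 18.0\n",
"slope * future + intercept    # follows the trend: 30.0\n",
"```\n",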
"\n",
"\n",
"## Response to Black Swan Events\n",
"\n",
"In our housing example, if we are tasked to make a prediction for a 1-bedroom, 12-bathroom house, we are likely to think that it's a very oddly designed house, and we will notice that its combination of values lies well outside the normal population of houses that our model is built on and effective at predicting. So, there's a chance that our prediction won't be very good for that one house, but if the rest of the houses we are asked to predict on revert to our normal pattern of 2-5 bedroom houses, the damage (that is to say, the one bad prediction) is contained. We can say that this unusual house is an outlier, but not a black swan.\n",
"\n",
"In time-series analysis, a black-swan event is an observation that not only lies outside of our normal distribution of incoming data, but that also has a profound effect on the performance of the model thereafter. Perhaps we are modeling a stock that is pretty stable, growing at a steady 5% per year, with a moderate amount of day-to-day fluctuation. If there is some sort of black swan event (perhaps the factory is hit by a meteor), this could cause a very unusual one-day change in stock price. While the damage to the factory might be quickly repaired, the disruption to our model may reverberate for much longer, perhaps even casting our stable, growing stock into an entirely different pattern.\n",
"\n",
"The problem here is that in time-series analysis, our target for one day becomes a feature for following days. And, depending on the structure of our model, it may retain a memory of these events for a long period after.\n",
"\n",
"Another type of disruption to our model might be more planned and more predictable than a meteor strike, but still have a long-term effect on our model. For example, if we have scheduled a big sales event (e.g. Black Friday), we might expect to see a lull in sales for the week before and the week after, with a large spike on the day of the event.\n",
"\n",
"## Correlation Between Predictor Variables\n",
"\n",
"In our housing model, we have identified that there is some strong co-linearity between the features. Houses generally have about the same number of bedrooms and bathrooms, and the square footage can also be expected to increase in proportion. If we have relatively few features, this won't be much of a problem, but as our feature space grows, we may find it advantageous to simplify it by using PCA to combine these features. Rather than using three features (bedrooms, bathrooms, sqft) to describe our houses, we might find it more convenient to use a single vector (small, medium, large), which contains about the same information. Perhaps we can add a second, orthogonal feature (fewer bathrooms than bedrooms / more bathrooms than bedrooms) to add some additional information. In this way, we can remove co-linearity from our model.\n",
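"\n",
"A sketch of that recombination with scikit-learn's PCA, on invented numbers for the three house features:\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.decomposition import PCA\n",
"from sklearn.preprocessing import StandardScaler\n",
"\n",
"# Hypothetical houses: [bedrooms, bathrooms, sqft]\n",
"X = np.array([[2, 1, 900],\n",
"              [3, 2, 1400],\n",
"              [3, 2, 1500],\n",
"              [4, 3, 2200],\n",
"              [5, 4, 2800]])\n",
"\n",
"# Standardize, then project onto two orthogonal components\n",
"X_scaled = StandardScaler().fit_transform(X)\n",
"pca = PCA(n_components=2)\n",
"X_reduced = pca.fit_transform(X_scaled)\n",
"\n",
"# The first component captures most of the variance\n",
"pca.explained_variance_ratio_\n",
"```\n",
"\n",
"The first component plays the role of the small/medium/large axis described above; the second picks up what little variation is left.\n",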
"\n",
"In time-series analysis, the relationship between features is perhaps not something we want to eliminate; rather, it is the most informative portion of our model.\n",
"\n",
"\n",
"## Difficulty of Dealing with Exogenous Variables\n",
"\n",
"Another difference is how we handle exogenous variables. In our housing model, we might start with three simple predictors, but as we learn more about our data set, we would not hesitate to evaluate and include many other additional features (age of house, zip code, size of garage, size of lot, proximity to public transportation, etc.). We can think of our $X$ data set growing horizontally without bound, and without much difficulty.\n",
"\n",
"In its simplest form, time series analysis seeks to make predictions from naught but lagged values of the target. We would certainly like to include any other relevant factors, but these can be harder to identify and get into a form that works with our model. One approach is to include a calendar file that marks significant, scheduled events (Black Friday, Memorial Day Sale, Back-to-School events, quarterly earnings reports, annual unveiling of new products, etc.). But other factors can be harder to model. We could include reported revenue and expenses and other factors (such as the cost of borrowing money) that will have a regular, predictable impact on the stock price. But other factors, such as poor product reviews or drunken tweets by the CEO, can be harder to get into a columnar format. It is harder to grow our data set horizontally because we would need so many *ad hoc* features to describe these events.\n",
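"\n",
"One way to sketch such a calendar file in pandas. The event date and the window widths here are arbitrary choices for illustration:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Hypothetical daily frame around a scheduled sales event\n",
"days = pd.date_range('2021-11-19', '2021-12-03')\n",
"frame = pd.DataFrame(index=days)\n",
"\n",
"black_friday = pd.Timestamp('2021-11-26')\n",
"\n",
"# Indicator features: the event itself, plus lead-up and hangover windows\n",
"frame['is_event'] = (frame.index == black_friday).astype(int)\n",
"frame['pre_event'] = (black_friday - frame.index).days.isin(range(1, 8)).astype(int)\n",
"frame['post_event'] = (frame.index - black_friday).days.isin(range(1, 8)).astype(int)\n",
"```\n",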
"\n",
"\n",
"## Predicting Well into the Future\n",
"\n",
"A final problem is how we define the success of a model. For our housing predictor, we hope to predict a single value: the price of a house. For a time-series model, we often want to make a series of predictions. No sooner do we make a stock price predictor that gives good predictions for tomorrow than we also want to predict for the day after tomorrow and the day after that, and a week, a month and a year into the future. If traditional ML techniques are applied, we may see predictions diverge well beyond the range of reasonable values as we push farther into the future.\n",
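"\n",
"A toy illustration of how errors compound when we forecast recursively, feeding each prediction back in as the next input. The coefficients are invented:\n",
"\n",
"```python\n",
"# One-step model with a small error: true coefficient 0.95, estimated 0.90\n",
"y0 = 100.0\n",
"true_coef, est_coef = 0.95, 0.90\n",
"\n",
"truth, forecast = y0, y0\n",
"errors = []\n",
"for step in range(10):\n",
"    truth = true_coef * truth        # what actually happens\n",
"    forecast = est_coef * forecast   # prediction fed back into the model\n",
"    errors.append(abs(truth - forecast))\n",
"\n",
"errors[0]    # 1-step error: 5.0\n",
"errors[-1]   # 10-step error: roughly 25, five times worse\n",
"```\n",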
"\n",
"Naturally, the farther we push our predictions, the more our errors and uncertainties accumulate, so often we will be pushing our models to their limits and we will need better ways to talk about the errors and confidence intervals around our predictions to better communicate what we have achieved."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"Time series analysis may look like an analogue of classical supervised ML, but the inherent differences in the structure of the problem introduce a number of difficulties that make conventional ML techniques less useful. It is perhaps better to think of time series analysis as more of an exercise in pattern-matching than prediction.\n",
"\n",
"Next, we will introduce the specialized techniques we use for time series prediction. These fall into two main categories:\n",
"* Techniques that try to replicate trends, changes and reversion as statistical processes (MA, AR and ARIMA models)\n",
"* Techniques that try to recognize patterns and events that trigger certain responses (RNN, LSTM, GRU, Convolutional NN)\n",
"\n",
"At the extreme end, we have deep learning models such as WaveNet and transformers, which try to learn from patterns over a range of time scales and include complex interactions.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 4
}