Created July 26, 2018 12:58
Transform multivariate time series forecasting problems into supervised learning problems (Pandas DataFrame transformation)
import pandas as pd


def transform_to_supervised(df,
                            previous_steps=1,
                            forecast_steps=1,
                            dropnan=True):
    """
    Transforms a DataFrame containing time series data into a DataFrame
    containing data suitable for use as a supervised learning problem.

    Derived from code originally found at
    https://machinelearningmastery.com/convert-time-series-supervised-learning-problem-python/

    :param df: pandas DataFrame object containing columns of time series values
    :param previous_steps: the number of previous steps that will be included in the
        output DataFrame corresponding to each input column
    :param forecast_steps: the number of forecast steps that will be included in the
        output DataFrame corresponding to each input column; the current step (t)
        counts as the first forecast step
    :param dropnan: whether to drop rows that contain NaN values, such as the rows
        at the start and end of the series that the shifts leave incomplete
    :return: pandas DataFrame containing original columns, renamed <orig_name>(t), as well as
        columns for previous steps, <orig_name>(t-n) ... <orig_name>(t-1), and columns
        for forecast steps, <orig_name>(t+1) ... <orig_name>(t+forecast_steps-1)
    """

    # original column names
    col_names = df.columns

    # list of columns and corresponding names we'll build from
    # the originals found in the input DataFrame
    cols, names = [], []

    # input sequence (t-n, ... t-1)
    for i in range(previous_steps, 0, -1):
        cols.append(df.shift(i))
        names += [('%s(t-%d)' % (col_name, i)) for col_name in col_names]

    # forecast sequence (t, t+1, ... t+forecast_steps-1)
    for i in range(0, forecast_steps):
        cols.append(df.shift(-i))
        if i == 0:
            names += [('%s(t)' % col_name) for col_name in col_names]
        else:
            names += [('%s(t+%d)' % (col_name, i)) for col_name in col_names]

    # put all the columns together into a single aggregated DataFrame
    agg = pd.concat(cols, axis=1)
    agg.columns = names

    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)

    return agg
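
A minimal sketch of what the transformation produces, assuming a small hypothetical two-column DataFrame (the column names `temp` and `rain` and their values are made up for illustration). To stay self-contained, the snippet repeats the shift-and-concatenate step inline rather than importing the function; the result should match what `transform_to_supervised(df, previous_steps=1, forecast_steps=2)` returns for the same input.

```python
import pandas as pd

# Hypothetical example data: two variables observed over five time steps.
df = pd.DataFrame({"temp": [10, 11, 12, 13, 14],
                   "rain": [0.0, 0.1, 0.2, 0.3, 0.4]})

# Same idea as the function, written inline for one previous step
# and two forecast steps (the current step t plus t+1).
frames = [df.shift(1), df, df.shift(-1)]

# Column names follow the <orig_name>(t-1), <orig_name>(t), <orig_name>(t+1) pattern.
names = ["%s(t%s)" % (col, suffix)
         for suffix in ("-1", "", "+1")
         for col in df.columns]

supervised = pd.concat(frames, axis=1)
supervised.columns = names

# Rows left incomplete by the shifts contain NaN and are dropped.
supervised = supervised.dropna()

print(supervised)
```

The first and last rows are dropped because the shifts leave them with NaN values, so three of the five input rows remain. In a typical setup the `(t-1)` columns would serve as model inputs and the `(t)` and `(t+1)` columns as targets.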
Thank you, Mr. @monocongo.
This will add to my learning, as I am at a beginner level and struggling to find online information to learn from.