Last active
March 11, 2021 22:55
-
-
Save arccoder/200934208b263ed034e1b43ccb79798a to your computer and use it in GitHub Desktop.
Pandas: Group time series data https://arccoder.medium.com/pandas-group-time-series-data-d889e4bc0657
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import yfinance as yf | |
data = yf.download('TSLA', start='2021-02-21', end='2021-02-27', interval='1m') | |
data = data.iloc[:, 0:1] | |
data.to_csv('in/tsla-7day-1min.csv') | |
data = yf.download('TSLA', start='2021-02-01', end='2021-02-27', interval='30m') | |
data = data.iloc[:, 0:1] | |
data.to_csv('in/tsla-1month-30min.csv') | |
data = yf.download('TSLA', start='2020-03-01', end='2021-02-27', interval='60m') | |
data = data.iloc[:, 0:1] | |
data.to_csv('in/tsla-1year-60min.csv') | |
data = yf.download('TSLA', end='2021-02-27', interval='1d') | |
data = data.iloc[:, 0:1] | |
data.to_csv('in/tsla-max-1day.csv') | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import datetime | |
import pandas as pd | |
def subsetLastNDays(df, days: int, column=None): | |
""" | |
Create a subset of the df for the number of days from the date in the last row | |
""" | |
if column is None: | |
end_date = df.index[-1] | |
start_date = end_date - datetime.timedelta(days=days) | |
outdf = df[(df.index > start_date) & (df.index <= end_date)] | |
else: | |
end_date = df[column].iloc[-1] | |
start_date = end_date - datetime.timedelta(days=days) | |
outdf = df[(df[column] > start_date) & (df[column] <= end_date)] | |
return outdf | |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-7day-1min.csv', parse_dates=['Datetime']) | |
# Subset the dataframe for the last day | |
out_data = subsetLastNDays(in_data, 1, 'Datetime') | |
out_data.to_csv('out/tsla-1day-1min.csv', index=False) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-1month-30min.csv', parse_dates=['Datetime']) | |
# Group in 1 day intervals | |
out_data = in_data.groupby(pd.Grouper(key='Datetime', freq="1D")).mean() | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
# Subset the dataframe for the last 30 days | |
out_data = subsetLastNDays(out_data, 28) | |
out_data.to_csv('out/tsla-1month-1day.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-1year-60min.csv', parse_dates=['Datetime']) | |
# Group in 1 day intervals | |
out_data = in_data.groupby(pd.Grouper(key='Datetime', freq="1B")).mean() | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
# Subset the dataframe for the last 1 year | |
out_data = subsetLastNDays(out_data, 364) | |
out_data.to_csv('out/tsla-1year-1day.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-7day-1min.csv', parse_dates=['Datetime']) | |
# Group in 30 min intervals | |
out_data = in_data.groupby(pd.Grouper(key='Datetime', freq="30T")).mean() # 'Datetime' column is now the index | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
# Subset the dataframe for the last 5 days | |
out_data = subsetLastNDays(out_data, 5) | |
out_data.to_csv('out/tsla-5day-30min.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-max-1day.csv', parse_dates=['Date']) | |
# Group in 1 week intervals | |
out_data = in_data.groupby(pd.Grouper(key='Date', freq="1W")).mean() # Average a week and set for next monday | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
# Subset the dataframe for the last 5 year | |
out_data = subsetLastNDays(out_data, 5 * 365) | |
out_data.to_csv('out/tsla-5year-1week.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-1year-60min.csv', parse_dates=['Datetime']) | |
# Group in 1 day intervals | |
out_data = in_data.groupby(pd.Grouper(key='Datetime', freq="1D")).mean() | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
out_data.to_csv('out/tsla-6month-1day.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in_data = pd.read_csv('in/tsla-max-1day.csv', parse_dates=['Date']) | |
# Group in 1 month intervals | |
out_max1M = in_data.groupby(pd.Grouper(key='Date', freq="1M")).mean() # Last date of month | |
# out_data = in_data.groupby(pd.Grouper(key='Date', freq="1MS")).mean() # First date of month | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
out_data.to_csv('out/tsla-max-1month.csv') |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# YTD | |
in_data = pd.read_csv('in/tsla-1year-60min.csv', parse_dates=['Datetime']) | |
# Group in 1 day intervals | |
out_data = in_data.groupby(pd.Grouper(key='Datetime', freq="24H")).mean() | |
out_data = out_data.dropna(axis=0, subset=['Open']) | |
# Subset the dataframe for the last 30 days | |
out_data = subsetLastNDays(out_data, 57) | |
out_data.to_csv('out/tsla-ytd-1day.csv') |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment