
@jeongyoonlee
Created August 15, 2014 19:40
Resampling and timezone conversion using pandas
```python
import pandas as pd


def prep_log(sec_file, min_file, src_tz='US/Pacific', dst_tz='US/Eastern',
             datetime_fmt='%m/%d/%y %H:%M'):
    """Aggregate a second-level log file to minute level, converting its timezone if necessary.

    Args:
        sec_file: path to a second-level CSV log file with timestamps in the first column
        min_file: path of the minute-level CSV file to write, with timestamps in the first column
        src_tz: timezone of the timestamps in sec_file (default: 'US/Pacific', i.e. PST/PDT)
        dst_tz: timezone to convert the timestamps to (default: 'US/Eastern', i.e. EST/EDT)
        datetime_fmt: strftime format for the timestamps written to min_file
            (default: '%m/%d/%y %H:%M', i.e. mm/dd/yy HH:MM)

    Returns:
        None. The minute-level data is saved to min_file.
    """
    # Load the second-level CSV file, parsing the first column as a DatetimeIndex.
    df = pd.read_csv(sec_file, index_col=0, parse_dates=True)

    # Resample into one-minute bins by summing each column.
    # (The `how=` argument of resample() was removed in newer pandas; call .sum() instead.)
    df_min = df.resample('min').sum()
    # .sum() already yields 0 for empty minutes; fillna(0) guards against any remaining NAs.
    df_min.fillna(0, inplace=True)

    # If necessary, attach the source timezone to the naive timestamps, then convert.
    if src_tz != dst_tz:
        df_min.index = df_min.index.tz_localize(src_tz).tz_convert(dst_tz)

    # Save the minute-level data as a CSV file using the given datetime format.
    df_min.to_csv(min_file, date_format=datetime_fmt)
```
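The two steps at the heart of `prep_log` (minute resampling, then localize-and-convert) can be tried on a small in-memory frame. This is a minimal sketch that skips the CSV reading and writing; it assumes a hypothetical log with a single numeric `count` column, and the column name and timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical second-level log: three rows in 19:40, none in 19:41, one in 19:42.
idx = pd.to_datetime([
    "2014-08-15 19:40:05",
    "2014-08-15 19:40:30",
    "2014-08-15 19:40:59",
    "2014-08-15 19:42:10",
])
df = pd.DataFrame({"count": [1, 2, 3, 4]}, index=idx)

# Aggregate into one-minute bins; the empty 19:41 bin sums to 0.
df_min = df.resample("min").sum().fillna(0)

# Treat the naive timestamps as US/Pacific wall-clock time, then convert to US/Eastern.
df_min.index = df_min.index.tz_localize("US/Pacific").tz_convert("US/Eastern")
print(df_min)
```

Because mid-August is daylight saving time in both zones (PDT is UTC-7, EDT is UTC-4), every timestamp moves forward three hours, and the empty minute shows up as a row with 0 rather than being dropped.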
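With its defaults, `tz_localize` raises on wall-clock times that either never happened or happened twice around a daylight saving transition in `src_tz`. Since `resample()` creates every minute between the first and last timestamps, a log that crosses the spring-forward change gets bins such as 02:30 even if no rows were logged then, so the function would raise instead of writing `min_file`. One possible workaround, sketched below under the assumption that discarding those few minutes is acceptable, is to pass `ambiguous="NaT"` and `nonexistent="NaT"` (the latter needs pandas 0.24 or later) and drop the resulting NaT rows. The timestamps and the `count` column are hypothetical.

```python
import pandas as pd

# Hypothetical naive minute-level timestamps around the 2014 US/Pacific DST transitions:
# 02:30 on 2014-03-09 never happened (clocks jumped from 02:00 to 03:00), and
# 01:30 on 2014-11-02 happened twice (clocks fell back from 02:00 to 01:00).
idx = pd.DatetimeIndex([
    "2014-03-09 01:59",
    "2014-03-09 02:30",
    "2014-03-09 03:00",
    "2014-11-02 01:30",
])
df_min = pd.DataFrame({"count": [1, 2, 3, 4]}, index=idx)

# Mark nonexistent and ambiguous wall-clock times as NaT instead of raising.
local = df_min.index.tz_localize("US/Pacific", ambiguous="NaT", nonexistent="NaT")

# Keep only the rows whose timestamps could be localized, then convert.
valid = local.notna()
df_min = df_min[valid]
df_min.index = local[valid].tz_convert("US/Eastern")
print(df_min)
```

Other choices are possible: `nonexistent="shift_forward"` keeps the rows but can map several minutes onto the same timestamp, and `ambiguous` also accepts an array of booleans when the DST state of each repeated time is known.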