@yassineAlouini
Created June 30, 2017 09:05
Load bz2-compressed CSV files from an S3 folder into a single pandas DataFrame.
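import bz2

import pandas as pd
import s3fs


def get_csv_from_s3(folder_path):
    """Read every bz2-compressed CSV file in an S3 folder into one DataFrame."""
    dfs = []
    s3 = s3fs.S3FileSystem()
    # s3.ls lists only the objects directly under folder_path (not recursive).
    for fp in s3.ls(folder_path):
        # Keep any path containing '.csv' (e.g. 'data.csv.bz2'); all are assumed bz2-compressed.
        if '.csv' in fp:
            # Stream the S3 object in binary mode and decompress it on the fly.
            with s3.open(fp, 'rb') as s3f:
                with bz2.open(s3f) as f:
                    tmp_df = pd.read_csv(f, encoding='utf-8', sep=';')
            dfs.append(tmp_df)
    # Return an empty DataFrame instead of failing with UnboundLocalError when nothing matched.
    if not dfs:
        return pd.DataFrame()
    # ignore_index=True gives a clean 0..n-1 index instead of repeating each file's own index.
    return pd.concat(dfs, ignore_index=True)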
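A variant that is easier to test, offered as a minimal sketch rather than the gist's own method: the hypothetical helper `concat_bz2_csvs` receives the list of paths and an `open_binary` callable instead of creating its own `S3FileSystem`, so it can be exercised with in-memory `io.BytesIO` objects. It filters on the `.csv.bz2` suffix, which assumes that naming convention for the compressed files.

```python
import bz2
import io
from typing import BinaryIO, Callable, Iterable

import pandas as pd


def concat_bz2_csvs(
    paths: Iterable[str],
    open_binary: Callable[[str], BinaryIO],
    sep: str = ";",
) -> pd.DataFrame:
    """Decompress and concatenate every '.csv.bz2' path into one DataFrame.

    open_binary is a hypothetical injection point: any callable that returns
    a binary stream for a path (for example s3.open from s3fs, or io.BytesIO).
    """
    frames = []
    for path in paths:
        # Assumed naming convention: compressed CSVs end with '.csv.bz2'.
        if not path.endswith(".csv.bz2"):
            continue
        # Open the raw bytes, then let the stdlib bz2 module decompress them.
        with open_binary(path) as raw, bz2.open(raw) as f:
            frames.append(pd.read_csv(f, encoding="utf-8", sep=sep))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)


# Usage without S3: two fake compressed objects plus one non-CSV object, held in memory.
_objects = {
    "bucket/folder/a.csv.bz2": bz2.compress(b"x;y\n1;2\n"),
    "bucket/folder/b.csv.bz2": bz2.compress(b"x;y\n3;4\n"),
    "bucket/folder/readme.txt": b"not a csv",
}
df = concat_bz2_csvs(_objects, lambda p: io.BytesIO(_objects[p]))
print(df)
```

Against S3, the same helper could be wired to s3fs, e.g. `s3 = s3fs.S3FileSystem()` followed by `concat_bz2_csvs(s3.ls(folder_path), lambda p: s3.open(p, 'rb'))`.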