@VioletVivirand
Last active October 8, 2024 11:49
Get multiple files from S3 at a time into a single Pandas DataFrame with AWS SDK for Pandas (awswrangler)
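import awswrangler as wr

# Read many JSON Lines files (one JSON object per line) under one S3 prefix into a single DataFrame.
# Objects are laid out as: s3://<bucket>/yyyy/mm/dd/filename.json
# read_json lists every object under the given prefix (nested yyyy/mm/dd/ folders included)
# and concatenates them; replace <bucket> and prefix/ with your own location.
# Ref: https://aws-sdk-pandas.readthedocs.io/en/stable/tutorials/003%20-%20Amazon%20S3.html#2.3.2-Reading-JSON-by-prefix
# Row labels can repeat across files (each file is indexed from 0), so rebuild a clean 0..N-1 index.
df = wr.s3.read_json("s3://<bucket>/prefix/", lines=True).reset_index(drop=True)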
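Reading by prefix amounts to reading each object and stacking the per-file DataFrames. The sketch below reproduces that idea locally with plain pandas, so it runs without S3 or awswrangler. The helper `read_json_lines_many` and the two in-memory "objects" are hypothetical stand-ins, not part of awswrangler. The sketch shows why `.reset_index(drop=True)` is useful: each file's rows are numbered from 0, so a plain concatenation repeats index labels.

```python
from io import StringIO

import pandas as pd


def read_json_lines_many(sources):
    """Read several JSON Lines sources and stack them into one DataFrame.

    Hypothetical local stand-in for wr.s3.read_json(prefix, lines=True):
    `sources` are file paths or file-like objects instead of S3 objects.
    """
    frames = [pd.read_json(src, lines=True) for src in sources]
    # Each frame has its own 0..n-1 index, so plain concatenation repeats labels.
    stacked = pd.concat(frames)
    # Same effect as .reset_index(drop=True) in the gist: a fresh 0..N-1 index.
    return stacked.reset_index(drop=True)


# Two hypothetical "objects", e.g. .../2024/10/07/a.json and .../2024/10/08/b.json
day1 = StringIO('{"id": 1, "v": "a"}\n{"id": 2, "v": "b"}\n')
day2 = StringIO('{"id": 3, "v": "c"}\n')

df = read_json_lines_many([day1, day2])
print(df)
```

Without the final `reset_index`, the stacked index here would read 0, 1, 0.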