@samukasmk
Created February 20, 2025 14:42
Pandas (read parquet files from s3)

Examples of reading Parquet files from S3 with pandas read_parquet

Credentials (from environment variables)

import os
import pandas as pd

# set the AWS credentials through OS environment variables
os.environ['AWS_ACCESS_KEY_ID'] = '...'
os.environ['AWS_SECRET_ACCESS_KEY'] = '...'
os.environ['AWS_DEFAULT_REGION'] = '...'

df = pd.read_parquet("s3://your-bucket/my-sample-data/20250220/")
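
Setting the keys inside the script makes them easy to leak, so a common variant is to export them in the shell and only check that they are present before reading. The sketch below assumes that setup; `missing_aws_env_vars` and `read_sample` are hypothetical helper names, not part of pandas.

```python
import os

# Variables the snippet above relies on for authentication.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(environ=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


def read_sample(path="s3://your-bucket/my-sample-data/20250220/"):
    """Read the sample dataset, failing early if credentials are missing."""
    missing = missing_aws_env_vars()
    if missing:
        raise RuntimeError("Missing AWS environment variables: " + ", ".join(missing))
    # Imported here so the check above can run without pandas installed.
    import pandas as pd
    return pd.read_parquet(path)
```

Calling `read_sample()` then behaves like the one-line `pd.read_parquet(...)` call, but raises a readable error instead of an authentication failure when a variable is not set.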

Credentials (from storage_options)

import pandas as pd

aws_access_key = '...'
aws_secret_key = '...'
df = pd.read_parquet(
    "s3://your-bucket/my-sample-data/20250220",
    storage_options={
        "key": aws_access_key,
        "secret": aws_secret_key,
    },
)
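
The two approaches can be combined by building the `storage_options` dict from environment variables instead of literals. This is a minimal sketch, assuming pandas forwards `storage_options` to `s3fs.S3FileSystem` (via fsspec) and that `token` and `client_kwargs` are accepted there, which is worth checking against the installed s3fs version. `s3_storage_options` and `read_sample_with_options` are hypothetical helper names.

```python
import os


def s3_storage_options(environ=os.environ):
    """Build a storage_options dict for s3:// paths from AWS environment variables."""
    options = {
        "key": environ["AWS_ACCESS_KEY_ID"],
        "secret": environ["AWS_SECRET_ACCESS_KEY"],
    }
    # Temporary credentials (for example from STS) also carry a session token.
    token = environ.get("AWS_SESSION_TOKEN")
    if token:
        options["token"] = token
    # Assumption: the region is passed through to the underlying client.
    region = environ.get("AWS_DEFAULT_REGION")
    if region:
        options["client_kwargs"] = {"region_name": region}
    return options


def read_sample_with_options(path="s3://your-bucket/my-sample-data/20250220"):
    """Read the sample dataset using explicit storage_options."""
    # Imported here so the helper above can be used without pandas installed.
    import pandas as pd
    return pd.read_parquet(path, storage_options=s3_storage_options())
```

Passing the options explicitly keeps the credentials scoped to this one call, which can help when a script reads from buckets in more than one account.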