@tomsing1
Created August 21, 2024 19:54
Wrangling files on AWS S3 with the awswrangler python module (in Colab)
# IPython magic, commented out so this file stays valid Python.
# In a Colab cell, uncomment the two lines below to install awswrangler quietly.
# %%capture
# !pip install awswrangler --quiet
import awswrangler as wr
import pandas as pd
import boto3
from google.colab import data_table, userdata
data_table.enable_dataframe_formatter()
# The temporary AWS credentials are stored as Colab secrets; refresh them
# via the key symbol in the left-hand sidebar when they expire.
boto3.setup_default_session(
    region_name="us-east-2",
    aws_access_key_id=userdata.get('AWS_ACCESS_KEY_ID'),
    aws_secret_access_key=userdata.get('AWS_SECRET_ACCESS_KEY'),
    aws_session_token=userdata.get('AWS_SESSION_TOKEN'),
)
# Placeholder: replace with the S3 prefix of your partitioned parquet dataset.
path = 's3://path/to/prefix/with/partitioned/parquet/dataset'
df = wr.s3.read_parquet(path, dataset=True)
df.head()
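
Reading the whole dataset can be slow when the prefix holds many partitions. As a minimal sketch, `wr.s3.read_parquet` accepts a `partition_filter` callback that receives each partition as a dictionary of strings and returns whether to read it. The partition column `year`, the value `"2024"`, and the helper names `keep_partition` and `read_filtered` are assumptions for illustration; they are not part of the original gist.

```python
from typing import Dict, List, Optional


def keep_partition(partition: Dict[str, str]) -> bool:
    """Return True for partitions that should be read.

    The partition column "year" is hypothetical. awswrangler passes
    partition values as strings, so compare against strings.
    """
    return partition.get("year") == "2024"


def read_filtered(path: str, columns: Optional[List[str]] = None):
    """Read only the matching partitions of a partitioned parquet dataset."""
    # Imported lazily so the predicate above can be checked without awswrangler.
    import awswrangler as wr

    return wr.s3.read_parquet(
        path,
        dataset=True,
        columns=columns,  # None reads all columns
        partition_filter=keep_partition,
    )


# The predicate can be checked on plain dictionaries without touching S3.
print(keep_partition({"year": "2024", "month": "8"}))
print(keep_partition({"year": "2023", "month": "12"}))
```

With the default session configured as above, `df = read_filtered(path)` would then load only the partitions for which `keep_partition` returns `True`.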