@dacort
Created February 7, 2023 18:47
Reading Athena views from Spark
  1. Download the Athena JDBC driver from https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html. I used the build that bundles the AWS SDK, AthenaJDBC42-2.0.35.1000.jar.

  2. Start pyspark with the --jars option.

pyspark --jars AthenaJDBC42-2.0.35.1000.jar
  3. Use spark.read.jdbc to connect to Athena. You need to either specify a User/Password in the properties or set the AwsCredentialsProviderClass property.
df = spark.read.jdbc(
    "jdbc:awsathena://AwsRegion=us-east-1;S3OutputLocation=s3://<BUCKET_NAME>/athena_results/",
    "database_name.view_name",
    properties={
        "driver": "com.simba.athena.jdbc.Driver",
        "AwsCredentialsProviderClass": "com.simba.athena.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    },
)
df.head()
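
The two credential options from step 3 can be wrapped in a small helper. This is only a sketch: `athena_jdbc_args` is a hypothetical name, not part of Spark or the driver, and the `User`/`Password` property keys are assumed from the wording of step 3, so check them against the driver documentation for your version.

```python
from typing import Dict, Optional, Tuple

# Driver and credentials provider class names taken from the snippet above.
ATHENA_DRIVER = "com.simba.athena.jdbc.Driver"
DEFAULT_PROVIDER = (
    "com.simba.athena.amazonaws.auth.DefaultAWSCredentialsProviderChain"
)


def athena_jdbc_args(
    region: str,
    s3_output: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Build the (url, properties) pair to pass to spark.read.jdbc.

    Hypothetical helper. With no user/password it falls back to the
    default credentials provider chain, as in the example above.
    """
    url = f"jdbc:awsathena://AwsRegion={region};S3OutputLocation={s3_output}"
    props = {"driver": ATHENA_DRIVER}

    if user is not None and password is not None:
        # Assumed key names; verify against the driver docs.
        props["User"] = user
        props["Password"] = password
    elif user is None and password is None:
        props["AwsCredentialsProviderClass"] = DEFAULT_PROVIDER
    else:
        raise ValueError("Provide both user and password, or neither.")

    return url, props
```

Inside the pyspark shell it would be used like this:

```python
url, props = athena_jdbc_args("us-east-1", "s3://<BUCKET_NAME>/athena_results/")
df = spark.read.jdbc(url, "database_name.view_name", properties=props)
```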