- Download the JDBC driver from https://docs.aws.amazon.com/athena/latest/ug/connect-with-jdbc.html. I used the JDBC driver that bundles the AWS SDK, `AthenaJDBC42-2.0.35.1000.jar`.
- Start `pyspark` with the `--jars` option:

```shell
pyspark --jars AthenaJDBC42-2.0.35.1000.jar
```
- Use `spark.read.jdbc` to connect to Athena. You need to either specify a user/password in the properties or set the `AwsCredentialsProviderClass` property.
```python
df = spark.read.jdbc(
    "jdbc:awsathena://AwsRegion=us-east-1;S3OutputLocation=s3://<BUCKET_NAME>/athena_results/",
    "database_name.view_name",
    properties={
        "driver": "com.simba.athena.jdbc.Driver",
        "AwsCredentialsProviderClass": "com.simba.athena.amazonaws.auth.DefaultAWSCredentialsProviderChain",
    },
)
df.head()
```
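If you would rather pass explicit credentials than rely on a credentials provider class, the same call works with user/password properties. A minimal sketch, assuming the placeholders are replaced with an IAM access key ID and secret access key (the placeholder values below are not real credentials):

```python
# Hypothetical alternative: authenticate with explicit user/password
# properties (an IAM access key ID and secret access key) instead of
# setting AwsCredentialsProviderClass.
props = {
    "driver": "com.simba.athena.jdbc.Driver",
    "user": "<AWS_ACCESS_KEY_ID>",          # placeholder: IAM access key ID
    "password": "<AWS_SECRET_ACCESS_KEY>",  # placeholder: IAM secret access key
}

# Passed the same way as above:
# df = spark.read.jdbc(
#     "jdbc:awsathena://AwsRegion=us-east-1;S3OutputLocation=s3://<BUCKET_NAME>/athena_results/",
#     "database_name.view_name",
#     properties=props,
# )
```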