How to Connect to Dremio Arctic with Spark in a Docker Container

If you need a Docker image for this purpose, this one should do the job.
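As a rough sketch (the image and container names below are placeholders, not part of the original gist), you could start an interactive shell in a Spark container like this:

docker run -it --name spark {your spark docker image} /bin/bash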

From inside the Spark Docker container, create a bash script:

touch arctic.bash

In that file, copy the following, replacing the variables in braces with the right values:

export DREMIO_TENANT={arctic project id}
export NESSIE_TOKEN={personal token from Dremio cloud}
export AWS_SECRET_ACCESS_KEY={aws secret key}
export AWS_ACCESS_KEY_ID={aws access key}
export ARCTIC_URI=https://nessie.dremio.cloud/v1/projects/xxxxxxxxxxxxxxxxxxxxxxxxxxxx
export WAREHOUSE={s3 location where arctic data is stored, using the s3a prefix and no trailing slash, e.g. s3a://my-bucket/arctic}

# add AWS dependency
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
    "bundle"
    "url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
    DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done
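# DEPENDENCIES now holds a comma-prefixed list of the AWS SDK packages above
# and is appended to the --packages list below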

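# launch spark-shell with the Iceberg runtime and Nessie SQL extensions, registering an
# "arctic" catalog that points at the Dremio Arctic (Nessie) endpoint and the S3 warehouse set above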
spark-shell --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2,org.projectnessie:nessie-spark-extensions:0.20.1$DEPENDENCIES \
 --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions  \
 --conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
 --conf spark.sql.catalog.arctic.warehouse=$WAREHOUSE \
 --conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
 --conf spark.sql.catalog.arctic.uri=$ARCTIC_URI \
 --conf spark.sql.catalog.arctic.ref=main \
 --conf spark.sql.catalog.arctic.authentication.type=BEARER \
 --conf spark.sql.catalog.arctic.authentication.token=$NESSIE_TOKEN \
 --conf spark.sql.catalog.arctic.cache-enabled=false

*Make sure the warehouse S3 address has no trailing slash and uses the s3a:// prefix (e.g. s3a://my-bucket/arctic).

Then run the script to launch the spark-shell:

source arctic.bash

The catalog will be called arctic, so creating a table would look like this:

spark.sql("CREATE TABLE arctic.table (name string) USING iceberg")

or creating a branch (Nessie SQL extension commands also run through spark.sql):

spark.sql("CREATE BRANCH etl IN arctic")
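Other Nessie reference commands from the SQL extensions follow the same pattern, for example listing references or pointing the session at the new branch:

spark.sql("LIST REFERENCES IN arctic").show()
spark.sql("USE REFERENCE etl IN arctic")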