If you need a Docker image for this purpose, this one should do the job.
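For example, a minimal sketch assuming the generic apache/spark image (the image name here is an assumption; any image with Spark 3.x on the PATH should work):
docker run -it --entrypoint /bin/bash apache/spark:latest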
From within the Spark Docker container, create a bash script:
touch arctic.bash
In the file, copy the following, filling in the variables with the right values:
export DREMIO_TENANT={arctic project id}
export NESSIE_TOKEN={personal token from Dremio cloud}
export AWS_SECRET_ACCESS_KEY={aws secret key}
export AWS_ACCESS_KEY_ID={aws access key}
export ARCTIC_URI=https://nessie.dremio.cloud/v1/projects/xxxxxxxxxxxxxxxxxxxxxxxxxxxx
export WAREHOUSE={s3 location where arctic data is stored in your s3, with no trailing slash, like this: s3a://my-bucket/arctic}
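Since the project id in ARCTIC_URI is the same value as DREMIO_TENANT, one way to avoid repeating it (a small convenience, assuming your URI follows the pattern above) is:
export ARCTIC_URI=https://nessie.dremio.cloud/v1/projects/$DREMIO_TENANT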
# add AWS dependency (start from an empty list so re-sourcing the script doesn't duplicate entries)
DEPENDENCIES=""
AWS_SDK_VERSION=2.15.40
AWS_MAVEN_GROUP=software.amazon.awssdk
AWS_PACKAGES=(
"bundle"
"url-connection-client"
)
for pkg in "${AWS_PACKAGES[@]}"; do
DEPENDENCIES+=",$AWS_MAVEN_GROUP:$pkg:$AWS_SDK_VERSION"
done
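To sanity-check the loop before launching Spark, echo the variable; with the values above it should print both AWS SDK coordinates, each prefixed with a comma:
echo $DEPENDENCIES
# ,software.amazon.awssdk:bundle:2.15.40,software.amazon.awssdk:url-connection-client:2.15.40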
spark-shell --packages org.apache.iceberg:iceberg-spark3-runtime:0.13.1,com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.2,org.projectnessie:nessie-spark-extensions:0.20.1$DEPENDENCIES \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.projectnessie.spark.extensions.NessieSparkSessionExtensions \
--conf spark.sql.catalog.arctic=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.arctic.warehouse=$WAREHOUSE \
--conf spark.sql.catalog.arctic.catalog-impl=org.apache.iceberg.nessie.NessieCatalog \
--conf spark.sql.catalog.arctic.uri=$ARCTIC_URI \
--conf spark.sql.catalog.arctic.ref=main \
--conf spark.sql.catalog.arctic.authentication.type=BEARER \
--conf spark.sql.catalog.arctic.authentication.token=$NESSIE_TOKEN \
--conf spark.sql.catalog.arctic.cache-enabled=false
*Make sure the warehouse S3 address has no trailing slash and uses the s3a:// prefix, as in the WAREHOUSE example above.
Then run the script:
source arctic.bash
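Once the shell is up, a quick way to confirm the catalog is wired to Nessie is to list its references via the Nessie SQL extensions loaded above:
spark.sql("LIST REFERENCES IN arctic").show()
You should see at least the main branch.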
The catalog will be called arctic, so creating a table would look like this:
spark.sql("CREATE TABLE arctic.table (name string) USING iceberg")
or creating a branch:
spark.sql("CREATE BRANCH etl IN arctic")
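Once the branch exists, you can point the session at it and later merge it back into main; both statements come from the same Nessie SQL extensions:
spark.sql("USE REFERENCE etl IN arctic")
spark.sql("MERGE BRANCH etl INTO main IN arctic")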