@hakanilter
Last active October 14, 2018 23:15
Create Spark DataFrame from Azure Blob Storage
/*
Add the following dependencies:
  com.microsoft.azure:azure-storage:2.0.0
  org.apache.hadoop:hadoop-azure:2.7.3
Exclude:
  com.fasterxml.jackson.core:*:*
*/
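// Register the storage account access key used to authenticate against Blob Storage.
spark.conf.set(
  "fs.azure.account.key.<your-storage-account-name>.blob.core.windows.net",
  "<your-storage-account-access-key>")
// Hadoop resolves a file system class from "fs.<scheme>.impl"; keys are case-sensitive.
// wasbs:// (TLS) maps to the nested Secure class of NativeAzureFileSystem.
sc.hadoopConfiguration.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem$Secure")
sc.hadoopConfiguration.set("fs.AbstractFileSystem.wasb.impl", "org.apache.hadoop.fs.azure.Wasb")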
// Read the Parquet data from the container and print its schema.
val df = spark.read.parquet("wasbs://<your-container-name>@<your-storage-account-name>.blob.core.windows.net/datapyro/items")
df.printSchema()
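
The dependency note at the top of the snippet could be expressed in a build definition roughly as follows. This is a minimal sketch that assumes sbt is the build tool, which the gist does not state. The coordinates and the Jackson exclusion come from the comment; applying the exclusion to both artifacts is an assumption.

```scala
// build.sbt (sketch; sbt as the build tool is an assumption, not stated in the gist)
libraryDependencies ++= Seq(
  "com.microsoft.azure" % "azure-storage" % "2.0.0",
  "org.apache.hadoop"   % "hadoop-azure"  % "2.7.3"
).map(
  // Drop every artifact from the com.fasterxml.jackson.core organization,
  // per the "Exclude" note in the snippet's header comment.
  _.excludeAll(ExclusionRule(organization = "com.fasterxml.jackson.core"))
)
```

In a notebook or spark-shell session, the same coordinates can be passed through whatever package mechanism the environment provides instead of a build file.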