A simple Scala/Spark application that reads Delta tables from Azure Blob Storage using the ABFSS protocol and Service Principal authentication.
For read-only operations from Azure Blob Storage, winutils.exe is typically not required since Spark uses the ABFS driver directly rather than the local Hadoop filesystem.
Try running without it first. If you encounter "Could not locate executable null\bin\winutils.exe" errors, then set up winutils:
- Download winutils.exe for Hadoop 3.3.x from: https://github.com/cdarlint/winutils
- Create a directory: C:\hadoop\bin
- Place winutils.exe in C:\hadoop\bin
- Set environment variable: HADOOP_HOME=C:\hadoop
- Add %HADOOP_HOME%\bin to your PATH
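Before launching Spark, it can help to confirm that the winutils setup is actually visible to the JVM. The following is a minimal sketch, not part of the project: the `WinutilsCheck` object and its `locate` method are hypothetical names introduced here for illustration, and the check only assumes the `HADOOP_HOME\bin\winutils.exe` layout described in the steps above.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper (not part of the project) that verifies the winutils setup.
object WinutilsCheck {

  // Returns the path to winutils.exe if it exists under <hadoopHome>/bin, otherwise None.
  def locate(hadoopHome: Option[String]): Option[Path] =
    hadoopHome
      .filter(_.trim.nonEmpty)
      .map(home => Paths.get(home, "bin", "winutils.exe"))
      .filter(p => Files.isRegularFile(p))

  def main(args: Array[String]): Unit = {
    val home = sys.env.get("HADOOP_HOME")
    val shown = home.getOrElse("<unset>")
    locate(home) match {
      case Some(p) => println(s"Found winutils at $p")
      case None    => println(s"winutils.exe not found (HADOOP_HOME=$shown)")
    }
  }
}
```

Running `WinutilsCheck.main` in the same shell that will start the application shows whether the environment variable was picked up; a newly set variable usually requires opening a new terminal first.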
Edit src/main/scala/com/example/DeltaAzureReader.scala and replace the placeholder values:
// Azure Storage Account details
val storageAccountName = "your_storage_account_name"
val containerName = "your_container_name"
val deltaTablePath = "path/to/delta/table"
// Service Principal credentials
val tenantId = "your_tenant_id"
val clientId = "your_client_id"
val clientSecret = "your_client_secret"
- In Azure Portal, go to Azure Active Directory > App registrations
- Create a new registration or use existing
- Note the Application (client) ID and Directory (tenant) ID
- Create a client secret under Certificates & secrets
- Grant the Service Principal access to your storage account:
  - Go to your Storage Account > Access Control (IAM)
  - Add role assignment: Storage Blob Data Reader (or Storage Blob Data Contributor if write access is also needed)
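To show how the placeholder values and the Service Principal credentials fit together, here is a minimal sketch of the ABFSS path and the OAuth settings that the Hadoop ABFS driver expects. The `AbfssConfig` object and its method names are hypothetical and may not match what `DeltaAzureReader.scala` actually does; the sketch also assumes the public Azure cloud endpoints (`dfs.core.windows.net` and `login.microsoftonline.com`).

```scala
// Hypothetical helper (not taken from the project) that derives ABFS settings
// from the placeholder values. Public Azure cloud endpoints are assumed.
object AbfssConfig {

  // Hostname of the storage account's DFS endpoint.
  def host(account: String): String = s"$account.dfs.core.windows.net"

  // Full ABFSS URI of the Delta table: abfss://<container>@<account host>/<path>
  def tablePath(account: String, container: String, path: String): String = {
    val relative = path.stripPrefix("/")
    val h = host(account)
    s"abfss://$container@$h/$relative"
  }

  // Per-account Hadoop ABFS keys for OAuth client-credentials (Service Principal) auth.
  def oauthSettings(
      account: String,
      tenantId: String,
      clientId: String,
      clientSecret: String
  ): Map[String, String] = {
    val h = host(account)
    Map(
      s"fs.azure.account.auth.type.$h" -> "OAuth",
      s"fs.azure.account.oauth.provider.type.$h" ->
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
      s"fs.azure.account.oauth2.client.id.$h" -> clientId,
      s"fs.azure.account.oauth2.client.secret.$h" -> clientSecret,
      s"fs.azure.account.oauth2.client.endpoint.$h" ->
        s"https://login.microsoftonline.com/$tenantId/oauth2/token"
    )
  }
}
```

In a Spark application, each pair could be applied with `spark.sparkContext.hadoopConfiguration.set(key, value)` before the table is loaded with `spark.read.format("delta").load(path)`. Keeping the derivation in a pure function makes it easy to print the resulting path and endpoint when diagnosing authentication or path errors.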
mvn clean package
This creates an uber-jar in target/delta-azure-reader-1.0.0.jar
Run the application with any one of the following:
mvn exec:java -Dexec.mainClass="com.example.DeltaAzureReader"
java -jar target/delta-azure-reader-1.0.0.jar
spark-submit --class com.example.DeltaAzureReader target/delta-azure-reader-1.0.0.jar
- "Could not locate executable null\bin\winutils.exe"
  - Ensure HADOOP_HOME is set correctly
  - Verify winutils.exe exists in %HADOOP_HOME%\bin
- Authentication errors
  - Verify Service Principal credentials
  - Ensure the Service Principal has the correct RBAC roles on the storage account
  - Check that the OAuth endpoint URL is correct
- Delta table not found
  - Verify the ABFSS path is correct
  - Ensure the container and path exist
  - Check that the Delta table has a valid _delta_log directory
- Memory issues
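  - Increase driver memory: pass --driver-memory 8g to spark-submit, or -Xmx8g to the JVM when launching with java -jar (a -Dspark.driver.memory=8g system property alone does not change the JVM heap size in local mode)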
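For memory issues, it is worth checking that a driver-memory setting actually reached the JVM, since in local mode the driver heap is simply the heap of the JVM that runs the application. A minimal sketch, using a hypothetical `HeapCheck` object that is not part of the project:

```scala
// Hypothetical diagnostic (not part of the project): report the heap limit
// of the JVM that is running the driver.
object HeapCheck {

  // Maximum heap the current JVM may use, in mebibytes.
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit =
    println(s"JVM max heap: $maxHeapMiB MiB")
}
```

Calling `HeapCheck.maxHeapMiB` at the start of the application and logging the result gives a quick way to confirm whether the configured limit is in effect.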
delta-azure-reader/
├── pom.xml
├── README.md
└── src/
    └── main/
        └── scala/
            └── com/
                └── example/
                    └── DeltaAzureReader.scala