@seesharprun
Last active April 15, 2025 15:09
Spark & Managed Identities
# The host can infer the managed identity since it's system-assigned.
# Python dict literals use ":" between key and value and close with "}".
config = {
    "spark.cosmos.accountEndpoint": "<azure-cosmos-db-nosql-account-endpoint>",
    "spark.cosmos.auth.type": "ManagedIdentity",
    "spark.cosmos.account.subscriptionId": "<subscription-id>",
    "spark.cosmos.account.tenantId": "<tenant-id>",
    "spark.cosmos.account.resourceGroupName": "<resource-group-name>",
    "spark.cosmos.database": "<database-name>",
    "spark.cosmos.container": "<container-name>",
}
cosmos_df = spark.read.format("cosmos.oltp") \
    .options(**config) \
    .option("spark.cosmos.read.inferSchema.enabled", "true") \
    .load()
cosmos_df.createOrReplaceTempView("items")
# Multiple user-assigned managed identities can be assigned to a single resource.
# You must give the client a hint as to which principal to use.
# The convention is to store this value in an AZURE_CLIENT_ID environment variable.
# Python dict literals use ":" between key and value and close with "}".
config = {
    "spark.cosmos.accountEndpoint": "<azure-cosmos-db-nosql-account-endpoint>",
    "spark.cosmos.auth.type": "ManagedIdentity",
    "spark.cosmos.auth.aad.clientId": "<managed-identity-client-id>",
    "spark.cosmos.account.subscriptionId": "<subscription-id>",
    "spark.cosmos.account.tenantId": "<tenant-id>",
    "spark.cosmos.account.resourceGroupName": "<resource-group-name>",
    "spark.cosmos.database": "<database-name>",
    "spark.cosmos.container": "<container-name>",
}
cosmos_df = spark.read.format("cosmos.oltp") \
    .options(**config) \
    .option("spark.cosmos.read.inferSchema.enabled", "true") \
    .load()
cosmos_df.createOrReplaceTempView("items")
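# A minimal sketch of the AZURE_CLIENT_ID convention mentioned above, not part of the original gist.
# build_cosmos_config is a hypothetical helper name. It assumes the user-assigned identity's
# client ID is exported as AZURE_CLIENT_ID and falls back to the system-assigned form
# (no clientId key) when the variable is unset or empty.

```python
import os


def build_cosmos_config(endpoint, database, container,
                        subscription_id, tenant_id, resource_group,
                        env=os.environ):
    """Hypothetical helper: build the connector options as a plain dict."""
    config = {
        "spark.cosmos.accountEndpoint": endpoint,
        "spark.cosmos.auth.type": "ManagedIdentity",
        "spark.cosmos.account.subscriptionId": subscription_id,
        "spark.cosmos.account.tenantId": tenant_id,
        "spark.cosmos.account.resourceGroupName": resource_group,
        "spark.cosmos.database": database,
        "spark.cosmos.container": container,
    }
    # Only add the client ID hint when one is configured (user-assigned identity).
    client_id = env.get("AZURE_CLIENT_ID")
    if client_id:
        config["spark.cosmos.auth.aad.clientId"] = client_id
    return config
```

# The returned dict can be passed to the reader in the same way as above, with .options(**config).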