@wallyhall
Last active October 26, 2024 18:18
Apache Airflow Azure AAD SSO howto

The following instructions for enabling Azure SSO for Apache Airflow take you nearly all the way, but fall short on a couple of details around the configuration of Airflow itself:

https://objectpartners.com/2021/12/24/enterprise-auth-for-airflow-azure-ad

All the "Azure" instructions there can be safely followed. The resulting webserver_config.py (which can be injected into a dockerised Airflow at /opt/airflow/webserver_config.py) can be built from the following:

from __future__ import annotations

import os

from airflow.www.fab_security.manager import AUTH_OAUTH
from airflow.www.security import AirflowSecurityManager
from airflow.utils.log.logging_mixin import LoggingMixin

basedir = os.path.abspath(os.path.dirname(__file__))

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
WTF_CSRF_TIME_LIMIT = None

AUTH_TYPE = AUTH_OAUTH

OAUTH_PROVIDERS = [{
    'name':'Microsoft Azure AD',
    'token_key':'access_token',
    'icon':'fa-windows',
    'remote_app': {
        'api_base_url': "https://login.microsoftonline.com/{}".format(os.getenv("AAD_TENANT_ID")),
        'request_token_url': None,
        'request_token_params': {
            'scope': 'openid email profile'
        },
        'access_token_url': "https://login.microsoftonline.com/{}/oauth2/v2.0/token".format(os.getenv("AAD_TENANT_ID")),
        "access_token_params": {
            'scope': 'openid email profile'
        },
        'authorize_url': "https://login.microsoftonline.com/{}/oauth2/v2.0/authorize".format(os.getenv("AAD_TENANT_ID")),
        "authorize_params": {
            'scope': 'openid email profile'
        },
        'client_id': os.getenv("AAD_CLIENT_ID"),
        'client_secret': os.getenv("AAD_CLIENT_SECRET"),
        'jwks_uri': 'https://login.microsoftonline.com/common/discovery/v2.0/keys'
    }
}]

AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_USER_REGISTRATION = True
AUTH_ROLES_SYNC_AT_LOGIN = True
AUTH_ROLES_MAPPING = {
    "airflow_prod_admin": ["Admin"],
    "airflow_prod_user": ["Op"],
    "airflow_prod_viewer": ["Viewer"]
}

class AzureCustomSecurity(AirflowSecurityManager, LoggingMixin):
    def get_oauth_user_info(self, provider, response=None):
        me = self._azure_jwt_token_parse(response["id_token"])
        return {
            "name": me["name"],
            "email": me["email"],
            "first_name": me["given_name"],
            "last_name": me["family_name"],
            "id": me["oid"],
            "username": me["preferred_username"],
            "role_keys": me["roles"]
        }

# The first of these two appears to work with older Airflow versions, the second with newer ones.
FAB_SECURITY_MANAGER_CLASS = 'webserver_config.AzureCustomSecurity'
SECURITY_MANAGER_CLASS = AzureCustomSecurity

The above assumes environment variables are configured for the OAuth client secret etc., and has been tested thoroughly and confirmed working.
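Because os.getenv returns None for an unset variable, a missing AAD_TENANT_ID would silently produce URLs such as https://login.microsoftonline.com/None/oauth2/v2.0/token. A minimal sketch of a fail-fast check that could sit near the top of webserver_config.py follows; the helper name missing_aad_variables is hypothetical and not part of Airflow.

```python
import os

# The three variables read by the webserver_config.py shown above.
REQUIRED_AAD_VARIABLES = ("AAD_TENANT_ID", "AAD_CLIENT_ID", "AAD_CLIENT_SECRET")


def missing_aad_variables(environ=None):
    """Return the names of required variables that are unset or empty.

    `environ` defaults to the process environment; passing a plain dict
    makes the check easy to try out without touching real settings.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_AAD_VARIABLES if not environ.get(name)]


# Possible use inside webserver_config.py (illustrative only):
#
#     missing = missing_aad_variables()
#     if missing:
#         raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Raising at import time is one design choice among several; logging a warning instead would keep the webserver starting while still making the gap visible.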

Note that the roles need to match what you configured in Azure (the example above uses airflow_prod_user etc., deviating from the linked article).

@surawut-jirasaktavee

UPDATE

I successfully implemented the Airflow custom security class using either the ConfigMap solution or the solution of copying webserver_config.py into the container image.

If you have applied one of these methods, do not set config_file in the webserver section. It will be overridden when the Airflow webserver starts and calls the write_webserver_configuration_if_needed function from configurations.py.

Remove it if you have set it. This solved my problem.

@vdozal

vdozal commented Jun 18, 2024

@surawut-jirasaktavee @drivard not sure what I am doing wrong. I can also make it work with Docker, but neither the ConfigMap nor copying webserver_config.py into the image works: the file doesn't get picked up. Here are the versions of Flask-AppBuilder and Authlib:

Authlib 1.3.1
Flask-AppBuilder 4.4.1
-rw-r--r-- 1 root root 3374 Jun 18 01:00 /opt/airflow/webserver_config.py <-- the file is there and contents are as posted, only difference is mapping roles

Kubernetes 1.30 on-prem, using NodePort to expose the UI. It must be something simple but I can't figure it out. I already double-checked the AAD_ env variables and certs, and they all match. I found the Flask file that the errors refer to:

{views.py:627} DEBUG - Provider: azure
{views.py:640} DEBUG - Going to call authorize for: azure
{views.py:670} DEBUG - Authorized init
{views.py:678} ERROR - Error authorizing OAuth access token: Expecting value: line 1 column 1 (char 0)
https://github.com/dpgaspar/Flask-AppBuilder/blob/release/4.4.1/flask_appbuilder/security/views.py

Any pointers appreciated; I suspect it's a networking issue. Thanks.
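For context on the log line above: the text after "Error authorizing OAuth access token:" is the message Python's json module produces when asked to parse an empty or non-JSON string. That suggests some response during the token exchange was not JSON, although the log alone does not say which one. A small sketch reproducing the message; the helper name describe_json_failure is made up for illustration.

```python
import json


def describe_json_failure(body):
    """Return the decoder's error message for `body`, or None if it parses."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# An empty body and an HTML error page both fail at the very first character,
# which is what "line 1 column 1 (char 0)" refers to.
empty_body_message = describe_json_failure("")
html_body_message = describe_json_failure("<html>502 Bad Gateway</html>")
valid_body_message = describe_json_failure('{"access_token": "abc"}')
```

Here both empty_body_message and html_body_message are "Expecting value: line 1 column 1 (char 0)", while valid_body_message is None.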

@drivard

drivard commented Jun 18, 2024

Did you configure the env variables in the webserver section?

In the chart it is:

webserver:
  env:
    - name: "AAD_TENANT_ID"
      value: "your_tenant_id"
    - name: "AAD_CLIENT_ID"
      value: "your_client_id"
    - name: "AAD_CLIENT_SECRET"
      value: "your_secret"

Did you set AIRFLOW__LOGGING__FAB_LOGGING_LEVEL to debug? If you did, you should see the JWT token in your pod logs.

@vdozal

vdozal commented Jun 18, 2024

@drivard I did. I used a secret and loaded them with envFrom; if I log in to the container I see them correctly set. I also have the FAB logging set. I'll try setting them directly as you mentioned.

@vdozal

vdozal commented Jun 18, 2024

@drivard Thank you, I was about to give up. Setting the variables directly worked. I still don't understand why, but I can finally move on. I appreciate your help, I owe you one.

@drivard

drivard commented Jun 18, 2024

Glad it helped.

@ctrongminh

ctrongminh commented Aug 13, 2024

Thank you @drivard,
I just tested the webserver_config.py; your solution works with the Airflow Helm chart version 1.15.0.

  1. I also added "User": ["User"] to AUTH_ROLES_MAPPING, which also works for the User role.
  2. I'm able to use valueFrom to feed values from k8s secrets into the env vars of the webserver pod, and it works:
    - name: AAD_TENANT_ID
      valueFrom:
        secretKeyRef:
          name: aad
          key: aad_tenent_id
    - name: AAD_CLIENT_ID
      valueFrom:
        secretKeyRef:
          name: aad
          key: aad_client_id
    - name: AAD_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: aad
          key: aad_client_secret

A working version as of May 2024, incorporating the latest comments from everyone here, with a comment in the code on how to map between Airflow roles and Microsoft Entra ID App roles. It was tested against apache/airflow:2.9.1-python3.12.

App Roles assignment

  1. Create a new App Registration in the Azure Portal.

    1. Create a new role named Admin with value Admin in the App Roles blade. You can also create the other roles you need, e.g. Op, Viewer, Public, or a custom role. The values are taken from the AUTH_ROLES_MAPPING in the code below.
  2. Go to the Enterprise Application of your App Registration and assign the Admin role to the user or group of your choice using the Users and Groups blade.

  3. Ensure that your App Registration has the necessary redirect URIs configured in the Authentication blade, e.g. http://localhost:8080/oauth-authorized/azure

#
## https://gist.github.com/wallyhall/915fedb4dfc766b61f442a32c95e1c29#file-apache_airflow_sso_howto-md
#
from __future__ import annotations

import os

from airflow.www.fab_security.manager import AUTH_OAUTH
from airflow.auth.managers.fab.security_manager.override import FabAirflowSecurityManagerOverride
from airflow.utils.log.logging_mixin import LoggingMixin

basedir = os.path.abspath(os.path.dirname(__file__))

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = True
WTF_CSRF_TIME_LIMIT = None
AAD_TENANT_ID = os.getenv("AAD_TENANT_ID")
AAD_CLIENT_ID = os.getenv("AAD_CLIENT_ID")
AAD_CLIENT_SECRET = os.getenv("AAD_CLIENT_SECRET")

AUTH_TYPE = AUTH_OAUTH

OAUTH_PROVIDERS = [{
    'name':'azure',
    'token_key':'access_token',
    'icon':'fa-windows',
    'remote_app': {
        'api_base_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}",
        'request_token_url': None,
        'request_token_params': {
            'scope': 'openid email profile'
        },
        'access_token_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}/oauth2/v2.0/token",
        "access_token_params": {
            'scope': 'openid email profile'
        },
        'authorize_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}/oauth2/v2.0/authorize",
        "authorize_params": {
            'scope': 'openid email profile'
        },
        'client_id': f"{AAD_CLIENT_ID}",
        'client_secret': f"{AAD_CLIENT_SECRET}",
        'jwks_uri': 'https://login.microsoftonline.com/common/discovery/v2.0/keys'
    }
}]

AUTH_USER_REGISTRATION_ROLE = "Public"
AUTH_USER_REGISTRATION = True
AUTH_ROLES_SYNC_AT_LOGIN = True
# First, you MUST create an Azure App Registration.
# Second, create a role like "Admin" with value "Admin" in the App Registration "App Roles" section in the Azure Portal under Microsoft Entra ID.
# Then link your groups in the "Users and Groups" section of the Enterprise Application you created (Azure Portal, Microsoft Entra ID, "Enterprise Applications").
# Each group or user MUST be assigned a role, e.g. Admin, Op, Viewer, in "Users and Groups".
AUTH_ROLES_MAPPING = {
    "Admin": ["Admin"],
    "Op": ["Op"],
    "Viewer": ["Viewer"],
    "Public": ["Public"],
}

class AzureCustomSecurity(FabAirflowSecurityManagerOverride, LoggingMixin):
    def get_oauth_user_info(self, provider, response=None):
        self.log.debug(f"Parsing JWT token for provider : {provider}")

        try:   # the try and except are optional - strictly you only need the me= line.
            me = super().get_oauth_user_info(provider, response)
        except Exception:
            # Log the traceback and re-raise; continuing here would fail with a NameError because `me` is unset.
            self.log.exception("Failed to parse OAuth user info")
            raise

        self.log.debug(f"Parsed JWT token : {me}")
        return {
            "name": me["first_name"] + " " + me["last_name"],
            "email": me["email"],
            "first_name": me["first_name"],
            "last_name": me["last_name"],
            "id": me["username"],
            "username": me["email"],
            "role_keys": me.get("role_keys", ["Public"])
        }

# The first of these two appears to work with older Airflow versions, the second with newer ones.
FAB_SECURITY_MANAGER_CLASS = 'webserver_config.AzureCustomSecurity'
SECURITY_MANAGER_CLASS = AzureCustomSecurity

@mpanichella

Hi! I implemented this with the Airflow Helm chart version 1.15.0, but it's not working. I tried to log in and the process completes, but on the redirect back to Airflow it says "Invalid login. Please try again." I validated the user and the role assigned to the user, but it's still not working. Any ideas?

@mpanichella

I see this error

Traceback (most recent call last):
  File "/opt/airflow/webserver_config.py", line 66, in get_oauth_user_info
    me = super().get_oauth_user_info(provider, response)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/auth/managers/fab/security_manager/override.py", line 2155, in get_oauth_user_info
    "email": me["upn"] if "upn" in me else me["email"],
KeyError: 'email'

@mpanichella

mpanichella commented Aug 14, 2024

Forget it, I saw that the problem is the configuration of the App Registration: you need to add the email and upn claims to the token.
image
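To see which claims a token actually carries before changing the App Registration, one option is to decode its payload by hand. With FAB logging at debug level, as suggested earlier in the thread, the token should appear in the pod logs. The sketch below is for debugging only and deliberately skips signature verification; both function names are hypothetical.

```python
import base64
import json


def decode_jwt_payload_unverified(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature.

    Suitable only for inspecting claims while debugging, never for
    making authentication decisions.
    """
    payload_segment = token.split(".")[1]
    # base64url segments in a JWT omit padding; restore it before decoding.
    padding = "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


def present_claims(claims, names=("email", "upn")):
    """Return which of the claim names of interest exist in the payload."""
    return [name for name in names if name in claims]
```

Calling present_claims(decode_jwt_payload_unverified(token)) on a token copied from the logs shows whether email or upn is present, which is what the lookup in the traceback above depends on.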

@seniut

seniut commented Aug 26, 2024

Could someone help with the following:
I see the example uses AAD_CLIENT_SECRET. I have deployed Airflow on AKS via the Helm chart. Is it possible to use Azure Managed Identity instead of AAD_CLIENT_SECRET?
In Airflow, I can use Managed Identity for a specified Pod via Kubernetes ServiceAccount. As I understand, authentication to Airflow will happen on the webserver Pod, and my webserver Pod has access to Azure services via Managed Identity.
@drivard maybe you know something about it?

@drivard

drivard commented Aug 26, 2024

Sorry @seniut I have no experience with Managed Identities.

@nmaster

nmaster commented Sep 13, 2024

Thanks a lot for this solution! It works fine for me in a PoC scenario using Kubernetes 1.28 and the official Airflow Helm chart for version 2.9.3. I'm using this with Azure AD. The only flaw for me is that JWT2 doesn't come with first_name and last_name fields, whereas Airflow seems to expect JWT1. My current Airflow/Flask/Python skills are surely too limited; maybe somebody can help me out? ;)
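One possible workaround for missing name fields, offered as a sketch rather than a tested fix: fall back to splitting the display name. It assumes the token carries a name claim, and optionally given_name and family_name, as read by the original snippet at the top of this gist. The helper name names_from_claims is hypothetical, and splitting on the last space is a naive heuristic that will be wrong for some names.

```python
def names_from_claims(claims):
    """Return (first_name, last_name) from token claims, with a fallback.

    Prefers the given_name / family_name claims; otherwise splits the
    display name in `name` at its last space.
    """
    first = claims.get("given_name")
    last = claims.get("family_name")
    if first and last:
        return first, last

    display_name = (claims.get("name") or "").strip()
    head, _, tail = display_name.rpartition(" ")
    head = head.strip()
    if not head:
        # No space in the display name: treat the whole thing as the first name.
        return first or tail, last or ""
    return first or head, last or tail
```

Inside a custom get_oauth_user_info, the returned pair could populate first_name and last_name instead of indexing the claims directly.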

Sidenote: the described solution also supports multiple OAuth providers. The only thing to make sure of is that the name of the OAuth provider, oapname (in this example 'azure'), matches the redirect URI https://my.airflow.host/oauth-authorized/`oapname` configured on the provider side.
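The relationship between provider name and redirect URI can be written down as a tiny helper, useful for listing what needs to be registered on the provider side. The function name expected_redirect_uri and the second provider name below are made up for illustration; only the /oauth-authorized/ path comes from the thread.

```python
def expected_redirect_uri(base_url, provider_name):
    """Build the redirect URI to register for one OAUTH_PROVIDERS entry."""
    return base_url.rstrip("/") + "/oauth-authorized/" + provider_name


# One URI per provider name; "other_idp" is a placeholder, not a real provider.
redirect_uris = [
    expected_redirect_uri("https://my.airflow.host/", name)
    for name in ("azure", "other_idp")
]
```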

@Saksham-lumiq

@drivard
I have followed all the steps, but when I access my Airflow I still see the default Airflow authentication method. The URI changes but nothing else happens: it's not redirecting or letting me authenticate using SSO. Were there any more steps you followed that were missed in the conversation above?
I would really appreciate it if someone could help me fix this.

@nitinmahawadiwar

nitinmahawadiwar commented Oct 24, 2024

Hello @drivard @nmaster @ctrongminh
I am using Airflow 2.9.3 with Helm 1.15
I have followed your procedure, but I am stuck with the error below from the webserver pod.

Error authorizing OAuth access token: Invalid JSON Web Key Set.

This is what is shown in the UI: The request to sign in was denied
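One way to narrow an error like this down might be to fetch the configured jwks_uri from inside the webserver pod and check that the body is a structurally valid JWK Set. The sketch below checks only the basic shape described in RFC 7517 (an object with a keys array whose entries each have a kty member); it is an assumption that this is relevant to the failure, and it does not reproduce Authlib's own validation. The function name jwks_problems is hypothetical.

```python
import json


def jwks_problems(text):
    """Return a list of structural problems found in a JWK Set document."""
    try:
        document = json.loads(text)
    except json.JSONDecodeError as exc:
        return ["not valid JSON: " + str(exc)]

    if not isinstance(document, dict) or not isinstance(document.get("keys"), list):
        return ['top level must be an object with a "keys" array']

    problems = []
    for index, key in enumerate(document["keys"]):
        if not isinstance(key, dict) or "kty" not in key:
            problems.append('key %d has no "kty" member' % index)
    return problems
```

An empty list means the document has the expected shape; a proxy error page or an empty response would show up as a "not valid JSON" entry.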

This is what I have configured in Azure Enterprise App:
Identifier (Entity ID) :: https://airflow.xyz.com/
Reply URL (Assertion Consumer Service URL) :: https://airflow.xyz.com/oauth-authorized/azure
Sign on URL :: https://airflow.xyz.com/login/
Relay State (Optional) :: https://airflow.xyz.com/home
Logout Url (Optional) :: https://airflow.xyz.com/logout

Below is my webserver_config.py

        from __future__ import annotations
        import os
        from airflow.www.fab_security.manager import AUTH_OAUTH
        # from airflow.www.security import AirflowSecurityManager
        from airflow.auth.managers.fab.security_manager.override import FabAirflowSecurityManagerOverride
        from airflow.utils.log.logging_mixin import LoggingMixin

        basedir = os.path.abspath(os.path.dirname(__file__))

        # Flask-WTF flag for CSRF
        WTF_CSRF_ENABLED = True
        WTF_CSRF_TIME_LIMIT = None
        AAD_TENANT_ID = <tenant id>
        AAD_CLIENT_ID = <APP Registration client id>
        AAD_CLIENT_SECRET = <App Registration client secret>

        AUTH_TYPE = AUTH_OAUTH

        OAUTH_PROVIDERS = [{
            'name':'azure',
            'token_key':'access_token',
            'icon':'fa-windows',
            'remote_app': {
                'api_base_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}",
                'client_kwargs': {
                    "scope": "User.read name preferred_username email profile upn",
                    "resource": f"{AAD_CLIENT_ID}",
                    # Optionally enforce signature JWT verification
                    "verify_signature": False
                },            
                'request_token_url': None,
                'request_token_params': {
                    'scope': 'openid email profile'
                },
                'access_token_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}/oauth2/v2.0/token",
                "access_token_params": {
                    'scope': 'openid email profile'
                },
                'authorize_url': f"https://login.microsoftonline.com/{AAD_TENANT_ID}/oauth2/v2.0/authorize",
                "authorize_params": {
                    'scope': 'openid email profile'
                },
                'client_id': f"{AAD_CLIENT_ID}",
                'client_secret': f"{AAD_CLIENT_SECRET}",
                'jwks_uri': 'https://login.microsoftonline.com/common/discovery/v2.0/keys',
                'redirect_uri': 'https://airflow.xyz.com/oauth-authorized/azure'            
            }
        }]

        AUTH_USER_REGISTRATION_ROLE = "Public"
        AUTH_USER_REGISTRATION = True
        AUTH_ROLES_SYNC_AT_LOGIN = True
        # First you MUST create a role like"Admin with value Admin" in the App Registration "App Roles" section in the Azure Portal under Microsoft Entra ID.
        # Then groups MUST be linked from the Microsoft Entra ID "Enterprise Application" section in the Azure Portal under the "Users and Groups" section.
        # Each groups or users MUST be assigned a role e.g.: Admin, Op, Viewer in the "Users and Groups"
        AUTH_ROLES_MAPPING = {
            "airflow_nonprod_admin": ["Admin"],
            "airflow_nonprod_op": ["Op"],
            "airflow_nonprod_viewer": ["Viewer"],
        }

        class AzureCustomSecurity(FabAirflowSecurityManagerOverride, LoggingMixin):
            def get_oauth_user_info(self, provider, response=None):
                self.log.debug(f"Parsing JWT token for provider : {provider}")

                try:   # the try and except are optional - strictly you only need the me= line.
                    me = super().get_oauth_user_info(provider, response)
                except Exception as e:
                    import traceback
                    traceback.print_exc()
                    self.log.debug(e)

                self.log.debug(f"Parse JWT token : {me}")
                return {
                    "name": me["userprincipalname"],
                    "email": me["mail"],
                    "first_name": me["givenname"],
                    "last_name": me["surname"],
                    "id": me["userprincipalname"],
                    "username": me["givenname"],
                    "role_keys": me["groups"]
                }

        # the first of these two appears to work with older Airflow versions, the latter newer.
        FAB_SECURITY_MANAGER_CLASS = 'webserver_config.AzureCustomSecurity'
        SECURITY_MANAGER_CLASS = AzureCustomSecurity

I would really appreciate your help.

@vdozal

vdozal commented Oct 25, 2024 via email

@nitinmahawadiwar

Thank you @vdozal . This was really helpful.

I am using almost the same configuration except for the environment variables (but I am ignoring that for now).
Can you please help me understand an issue with role_keys, described below?

Let me explain the flow. Unneeded code is removed for clarity.

....
AUTH_USER_REGISTRATION_ROLE = "Public"
....
AUTH_ROLES_MAPPING = {
        "Entra ID admin Role": ["Admin"],
        "Entra ID viewer Role": ["Viewer"],
        "Entra ID Op Role": ["Op"],
 }
...
  class AzureCustomSecurity(FabAirflowSecurityManagerOverride,.....
............
 return {
            ................
                "role_keys": me.get("role_keys", ["airflow_public"])
            }

My observation is that, inside the return block, even though "role_keys" has the Admin/Ops role values from Azure, it always selects the role assigned to AUTH_USER_REGISTRATION_ROLE = "XXX" and the UI renders accordingly.

Log statements for the me object show all the correct values received from Azure.

Is there anything I am missing in this case? My understanding is that whatever role_keys is set to inside the return block, the same role should be applied when rendering the UI.
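A simplified model of the role resolution may help when reasoning about this. It is not Flask-AppBuilder's actual code; it assumes that each entry of role_keys is looked up in AUTH_ROLES_MAPPING by exact key, and that the registration role is added when self-registration is enabled. The function name resolve_airflow_roles is hypothetical.

```python
def resolve_airflow_roles(role_keys, roles_mapping, registration_role=None):
    """Map identity-provider role keys to Airflow role names (simplified model)."""
    roles = set()
    for key in role_keys:
        # Keys absent from the mapping contribute nothing.
        roles.update(roles_mapping.get(key, []))
    if registration_role:
        roles.add(registration_role)
    return sorted(roles)


mapping = {
    "Entra ID admin Role": ["Admin"],
    "Entra ID viewer Role": ["Viewer"],
    "Entra ID Op Role": ["Op"],
}

# The token's role value does not appear as a mapping key: only the default remains.
unmatched = resolve_airflow_roles(["Admin"], mapping, "Public")

# The token's role value equals a mapping key: the mapped role is added.
matched = resolve_airflow_roles(["Entra ID admin Role"], mapping, "Public")
```

Under these assumptions unmatched is ["Public"] and matched is ["Admin", "Public"], so comparing the exact strings in the logged role_keys with the keys of AUTH_ROLES_MAPPING could be a useful first check.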

@yriveiro

@vdozal loading the AAD values from a secret with these values works:

webserver:
  webserverConfigConfigMapName: webserver-config-custom
  env:
    - name: AAD_TENANT_ID
      valueFrom:
        secretKeyRef:
          name: airflow-aad
          key: aad_tenant_id
    - name: AAD_CLIENT_ID
      valueFrom:
        secretKeyRef:
          name: airflow-aad
          key: aad_client_id
    - name: AAD_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          name: airflow-aad
          key: aad_client_secret

@yriveiro

Does anyone know how I can get the REST API working as well?

My users will not have a username and password, so I can't use a basic auth provider. The session backend probably works in the browser because it does some magic with the cookies, but I would like to call the REST API using curl.
