

A solution to a common problem that Airflow users may have run into.

⚠️ This post assumes that you have a basic understanding of the Airflow Web UI! ⚠️

Connecting a Local Database on Apache Airflow

What's the problem? 🤔

Once the Airflow environment is set up, the DAGs are written, and the webserver is launched, the UI looks something like this (screenshot: alt-ui). Here's a sample workflow DAG that I've written. I've backfilled it for testing purposes, so it displays the task status and other metrics.

So, there are quite a few things we can do from the Airflow navigation bar (screenshot: alt-nav).

The Data Profiling tab seemed mysterious to me, for the simple reason that I could not do anything with it.

(screenshot: dag-profiling)

Whenever I tried to run a query from the Ad Hoc Query page, nothing seemed to happen. After quite some time of hassle, I realized that the default database connection created from the config file is 🐟y (fishy).

Whenever we set up an Airflow environment, we edit or generate a config file named airflow.cfg that holds the environment's configuration settings. Where Airflow looks for this file is generally controlled by exporting the AIRFLOW_HOME variable:

$ export AIRFLOW_HOME=~/user/../airflow/
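For reference, Airflow looks for airflow.cfg inside AIRFLOW_HOME, which defaults to ~/airflow when the variable is not exported. The small sketch below mirrors that lookup; the function name airflow_cfg_path is hypothetical, and it deliberately ignores the separate AIRFLOW_CONFIG override that Airflow also honors.

```python
import os
from pathlib import Path
from typing import Mapping

# Default home used by Airflow when AIRFLOW_HOME is not set.
DEFAULT_AIRFLOW_HOME = "~/airflow"


def airflow_cfg_path(env: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper: where airflow.cfg is expected, based on AIRFLOW_HOME."""
    home = env.get("AIRFLOW_HOME", DEFAULT_AIRFLOW_HOME)
    # expanduser() turns a leading "~" into the current user's home directory.
    return Path(home).expanduser() / "airflow.cfg"


if __name__ == "__main__":
    print(airflow_cfg_path())
```

Passing the environment as a mapping keeps the helper easy to test without touching the real shell environment.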

After the Airflow config is set up, we run

$ airflow initdb

which creates airflow.db, a SQLite database (with the default configuration) that acts as the metadata database for our workflows.
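To confirm that initdb really produced a metadata database, you can peek at airflow.db with Python's built-in sqlite3 module. This is a minimal sketch assuming the default SQLite backend; list_tables is a hypothetical helper. Opening the file in read-only URI mode (mode=ro) means a wrong path raises an error instead of quietly creating a new, empty database.

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: Path) -> List[str]:
    """Hypothetical helper: table names in an existing SQLite file, opened read-only."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)  # raises OperationalError if the file is missing
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Running it against the freshly initialized airflow.db should print a non-empty list of metadata tables.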

The connection to airflow.db is managed by the sql_alchemy_conn setting in airflow.cfg. FYI, all the connections can be seen on the Admin > Connections page (screenshot: alt-connections).
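Because sql_alchemy_conn is a plain setting in airflow.cfg, a short configparser sketch can show which database file the metadata database actually lives in. This assumes the default SQLite backend, where the value looks like sqlite:////absolute/path/airflow.db; both helper names are hypothetical. Airflow 1.x keeps the setting under [core], while newer releases moved it to [database], so the sketch checks both. The file path it returns is, presumably, what the airflow_db connection's host should end up pointing to.

```python
import configparser
from typing import Optional


def read_sql_alchemy_conn(cfg_text: str) -> Optional[str]:
    """Hypothetical helper: the sql_alchemy_conn value from airflow.cfg contents."""
    # interpolation=None: airflow.cfg contains literal "%" (e.g. log format strings).
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    # Prefer the newer [database] section, fall back to the 1.x [core] section.
    for section in ("database", "core"):
        if parser.has_option(section, "sql_alchemy_conn"):
            return parser.get(section, "sql_alchemy_conn")
    return None


def sqlite_path_from_uri(uri: str) -> Optional[str]:
    """Hypothetical helper: file path of a sqlite:/// URI, or None for other backends."""
    prefix = "sqlite:///"
    # "sqlite:////abs/x.db" -> "/abs/x.db"; "sqlite:///rel.db" -> "rel.db"
    return uri[len(prefix):] if uri.startswith(prefix) else None
```

Usage: read the real file with Path(...).read_text() and feed the result to read_sql_alchemy_conn, then to sqlite_path_from_uri.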

With this setup, i.e., with the host of the airflow_db row set to SQL, you may end up with something like the following when you try to run an Ad Hoc query (screenshot: queryfail-alt).

⌚ for the solution

The problem is that the host parameter of airflow_db does not correctly point to the database.

The solution is to set up the connection properly, using a relative or absolute path to the database, as follows (screenshot: connection-2-alt).

Here, my database is named airflow.db and sits under the Airflow home, so a relative path works fine.
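Why does a relative path work here? A sketch under the assumption that, for a SQLite-type connection, Airflow hands the host value to sqlite3.connect unchanged, so a relative path is resolved against the working directory of the webserver process (in my case, the Airflow home). The helpers below are hypothetical; they only mirror that resolution so you can check the file exists before saving the connection.

```python
from pathlib import Path


def resolve_sqlite_host(host: str, cwd: Path) -> Path:
    """Hypothetical helper: resolve a SQLite host as a process running in cwd would."""
    path = Path(host)
    # Absolute paths are used as-is; relative ones are joined to the working dir.
    # Note: "~" is NOT expanded, matching what the OS does for a plain file path.
    return path if path.is_absolute() else cwd / path


def check_host(host: str, cwd: Path) -> Path:
    """Hypothetical helper: fail loudly if the host does not point to an existing file."""
    resolved = resolve_sqlite_host(host, cwd)
    if not resolved.is_file():
        raise FileNotFoundError(f"airflow_db host does not point to a file: {resolved}")
    return resolved
```

If the webserver might be started from a different directory, an absolute path is the safer choice, since it does not depend on cwd.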

Save it!

Time for a 🖊️ test

Now go back to Data Profiling > Ad Hoc Query and BOOM 🎆

BAZINGA!!!!
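If you want to double-check an Ad Hoc Query result outside the UI, the same kind of query can be run directly against airflow.db. This is a minimal sketch assuming the SQLite metadata database; adhoc_query is a hypothetical helper, and the task_instance query is just one example of something to try in the Ad Hoc Query box.

```python
import sqlite3
from pathlib import Path
from typing import Any, List, Tuple


def adhoc_query(db_path: Path, sql: str) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: run one query against an existing SQLite file, read-only."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Example: how many task instances ended in each state.
STATE_COUNTS = "SELECT state, COUNT(*) FROM task_instance GROUP BY state ORDER BY state"
```

Usage: adhoc_query(Path("airflow.db"), STATE_COUNTS) returns a list of (state, count) tuples.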
