Once the Airflow environment is set up, the DAGs are written, and the webserver is launched, the UI looks something like this:
So, here's the sample workflow DAG that I've written. I've backfilled it for testing purposes, so it displays the task status and other metrics.
So, there's quite a lot we can do from the Airflow nav bar.
The Data Profiling page/tab was a mysterious one for me, because I literally could not do anything with it.
Whenever I tried to run a query from Ad Hoc Query, nothing seemed to happen. After quite some hustle, I realized that the db connection created by default by the config file is 🐟y.
Whenever we set up an Airflow environment, we edit or generate a config file, airflow.cfg, that holds all the settings for that environment. Airflow looks for it under the Airflow home directory, which is generally set by exporting the AIRFLOW_HOME variable:
$ export AIRFLOW_HOME=~/user/../airflow/
Once the Airflow config is set up, we run
$ airflow initdb
which creates airflow.db, the metadata database for our workflows.
The connection to airflow.db is managed by the sql_alchemy_conn setting in airflow.cfg.
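To see which database a setup actually points at, you can read that value straight out of the config file. Here's a minimal sketch, assuming the setting lives under the [core] section (as in older Airflow releases) and that the value is a sqlite:/// URI. The helper names read_sql_alchemy_conn and sqlite_path_from_conn are mine, not part of Airflow, and the demo writes a throwaway config with a made-up path instead of touching a real one.

```python
import configparser
import os
import tempfile


def read_sql_alchemy_conn(cfg_path):
    """Return sql_alchemy_conn from the [core] section of an airflow.cfg.

    Assumption: the setting lives under [core], as in older Airflow releases.
    """
    # Interpolation is disabled so '%' characters in values are returned as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError("no config file at " + cfg_path)
    return parser.get("core", "sql_alchemy_conn")


def sqlite_path_from_conn(conn_uri):
    """Return the file path inside a sqlite:/// URI, or None for other backends."""
    prefix = "sqlite:///"
    if conn_uri.startswith(prefix):
        return conn_uri[len(prefix):]
    return None


if __name__ == "__main__":
    # Hypothetical config contents, written to a temp dir for the demo.
    sample = "[core]\nsql_alchemy_conn = sqlite:////tmp/airflow/airflow.db\n"
    with tempfile.TemporaryDirectory() as tmp:
        cfg = os.path.join(tmp, "airflow.cfg")
        with open(cfg, "w") as fh:
            fh.write(sample)
        conn = read_sql_alchemy_conn(cfg)
    print("sql_alchemy_conn:", conn)
    print("sqlite file:", sqlite_path_from_conn(conn))
```

For a real setup you would pass the path to your own airflow.cfg under AIRFLOW_HOME. The file path this prints is the one the connection should end up pointing at.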
FYI, all the connections can be seen on the Admin > Connections page:
With this setup, i.e., with the host set to SQL in the airflow_db row, you may end up with something like the following when you try to run an Ad Hoc Query:
The problem is that the host parameter of airflow_db doesn't point to the db correctly.
The solution is to set the connection up properly, with a relative or absolute path to the database, as follows.
Here my db is named airflow.db and sits under the Airflow home, and a relative path works fine.
Save it!
Now go back to Data Profiling > Ad Hoc Query and BOOM 🎆
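If the query page still comes back empty, one way to sanity-check the path you typed into the host field is to open the file directly with Python's sqlite3 module. This is a rough sketch under a couple of assumptions: the metadata db is a plain SQLite file, and list_tables is a hypothetical helper of mine, not an Airflow function. A relative path is resolved against the current working directory here, so run it from the directory the webserver was started in if you want to mimic a relative path.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the table names in the SQLite file at db_path.

    A relative db_path is resolved against the current working directory.
    """
    full_path = os.path.abspath(db_path)
    # sqlite3.connect() would silently create an empty database for a wrong
    # path, so check that the file exists first.
    if not os.path.isfile(full_path):
        raise FileNotFoundError("no database file at " + full_path)
    conn = sqlite3.connect(full_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Demo against a throwaway database with a made-up table name.
    with tempfile.TemporaryDirectory() as tmp:
        demo_db = os.path.join(tmp, "airflow.db")
        conn = sqlite3.connect(demo_db)
        conn.execute("CREATE TABLE demo (id INTEGER)")
        conn.commit()
        conn.close()
        print(list_tables(demo_db))
```

The existence check matters because a mistyped path doesn't fail loudly with SQLite: you just get a brand-new empty database, and every query against it finds no tables.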