@gavinwhyte
Last active October 17, 2021 09:07
Airflow

Celery executor documentation for Airflow

How to scale Airflow

Install Airflow

pip install 'apache-airflow[celery]'

sudo apt update

sudo apt install redis-server

  • Edit the following file

vi /etc/redis/redis.conf

  • Change the following line

"supervised no"

  • to

"supervised systemd"

  • Save and close the file

  • Restart Redis

sudo systemctl restart redis.service

  • Now check the status of Redis

sudo systemctl status redis.service
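As an extra sanity check, you can also ping Redis directly (assuming redis-cli was installed along with redis-server, which it is on Ubuntu):

```shell
# Ping the local Redis instance; a healthy server replies with PONG
redis-cli ping
```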

  • Configure airflow.cfg

  • Look for the executor option and change it to CeleryExecutor

"executor = CeleryExecutor"

  • Change the following in airflow.cfg; don't forget to set up PostgreSQL first (the database setup is covered below)

"sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost/airflow_db"

  • Celery parameters to change in the airflow.cfg file
  • The broker URL below is the connection to the Redis instance where task messages are pushed; the trailing 0 is the Redis database number

"broker_url = redis://localhost:6379/0"

  • The following parameter sets the result backend, where the state of each executed task is stored

"result_backend = db+postgresql://airflow_user:airflow_pass@localhost/airflow_db"

  • Once this is done, save and close the file
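Taken together, the settings changed above would look like this in airflow.cfg (the user, password, and database names are the examples used throughout this guide, not defaults):

```ini
[core]
executor = CeleryExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost/airflow_db

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://airflow_user:airflow_pass@localhost/airflow_db
```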

  • Now install the redis package on the main console

pip install 'apache-airflow[redis]'

Database installation

  • Look up the SQL connection

airflow config get-value core sql_alchemy_conn

  • Get the executor

airflow config get-value core executor

  • Update the package index

sudo apt update

sudo apt install postgresql

  • Press Enter, then confirm with yes

sudo -u postgres psql

ALTER USER postgres PASSWORD 'postgres';

  • exit

\q
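The connection string used in this guide assumes an airflow_user role and an airflow_db database exist; if they don't yet, they can be created from the same psql prompt before exiting (names and password here simply match the guide's examples):

```sql
CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
CREATE DATABASE airflow_db OWNER airflow_user;
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
```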

  • Install the Airflow postgresql package

pip install 'apache-airflow[postgres]'

  • Configure Airflow
  • Open airflow.cfg and find

"sql_alchemy_conn"

  • Change it to

sql_alchemy_conn = postgresql+psycopg2://airflow_user:airflow_pass@localhost/airflow_db

  • Let's check if we can reach the database by using

airflow db check

  • It should show something like this

[2021-10-17 08:20:05,996] {db.py:783} INFO - Connection successful.

  • Which means we have successfully configured it

  • Change the executor in airflow.cfg

"executor=CeleryExecutor"

  • Then we stop and start Airflow
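With the CeleryExecutor, the scheduler and webserver no longer run tasks themselves, so at least one Celery worker has to be started as well. A typical restart (using Airflow 2.x command names) might look like:

```shell
# Start the core components (each in its own terminal or service unit)
airflow scheduler
airflow webserver --port 8080

# Start a Celery worker to actually execute the queued tasks
airflow celery worker
```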

  • A configuration that could work, depending on your resources

  • parallelism=32

  • dag_concurrency=16

  • max_active_runs_per_dag=16
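In airflow.cfg these three values all live under the [core] section:

```ini
[core]
# Maximum task instances running across the whole installation
parallelism = 32
# Maximum task instances allowed to run concurrently within one DAG
dag_concurrency = 16
# Maximum active DAG runs per DAG
max_active_runs_per_dag = 16
```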
