@shravan-kuchkula
Last active September 12, 2022 07:27

Install Apache Airflow locally on a Mac

Using Docker and docker-compose to manage Apache Airflow on a Mac

With our beloved Docker and docker-compose, we can quickly bring up an Apache Airflow instance on a Mac.

Contents of docker-compose.yml

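About the only thing you need to customize in this docker-compose.yml file is the volumes section. It tells Docker to map the host directories containing your Airflow DAGs and plugins (here ./sparkify/dags and ./sparkify/plugins) into the container's file system at /usr/local/airflow/dags and /usr/local/airflow/plugins.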

version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  webserver:
    image: puckel/docker-airflow:1.10.4
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.4
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
    restart: always
    depends_on:
      - postgres
    environment:
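      - LOAD_EX=n          # don't load Airflow's example DAGs
      - EXECUTOR=Local     # with Local, this image also runs the scheduler in this container
      # Generate your own key for anything beyond local testing; Airflow uses it to
      # encrypt connection passwords and variables stored in the metadata database
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=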
    volumes:
      - ./sparkify/dags:/usr/local/airflow/dags
      # Custom plugins (remove this line if you don't have any)
      - ./sparkify/plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
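The FERNET_KEY above is published in this gist, so it is worth generating your own. A minimal sketch, assuming only the Python standard library: a Fernet key is 32 random bytes encoded as URL-safe base64, which is what `Fernet.generate_key()` from the `cryptography` package returns. The helper names `generate_fernet_key` and `is_valid_fernet_key` are hypothetical, not part of Airflow.

```python
import base64
import binascii
import os
import re

# 32 bytes always encode to 43 URL-safe base64 characters plus one '=' of padding.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_-]{43}=")


def generate_fernet_key():
    """Return a new Fernet-compatible key (hypothetical helper, stdlib only)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key):
    """Check that key is URL-safe base64 decoding to exactly 32 bytes."""
    if not _KEY_PATTERN.fullmatch(key):
        return False
    try:
        return len(base64.urlsafe_b64decode(key)) == 32
    except (binascii.Error, ValueError):
        return False


if __name__ == "__main__":
    # Paste the printed value after FERNET_KEY= in docker-compose.yml
    print(generate_fernet_key())
```

Set the key before you save any connections: values already encrypted with the old key can no longer be decrypted once it changes.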

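How to start Airflow?

Ans: Run docker-compose up from the directory that contains docker-compose.yml.

Note: To run the containers in the background without streaming the logs, use docker-compose up -d (detached mode). You can still follow the logs later with docker-compose logs -f.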

(airflow) Shravan: airflow$ docker-compose up
Creating network "airflow_default" with the default driver
Creating airflow_postgres_1 ... done
Creating airflow_webserver_1 ... done
Attaching to airflow_postgres_1, airflow_webserver_1
postgres_1   | The files belonging to this database system will be owned by user "postgres".
postgres_1   | This user must also own the server process.
postgres_1   |
postgres_1   | The database cluster will be initialized with locale "en_US.utf8".
postgres_1   | The default database encoding has accordingly been set to "UTF8".
postgres_1   | The default text search configuration will be set to "english".
postgres_1   |
postgres_1   | Data page checksums are disabled.
postgres_1   |
postgres_1   | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1   | creating subdirectories ... ok
postgres_1   | selecting default max_connections ... 100
postgres_1   | selecting default shared_buffers ... 128MB
postgres_1   | selecting default timezone ... Etc/UTC
postgres_1   | selecting dynamic shared memory implementation ... posix
postgres_1   | creating configuration files ... ok
postgres_1   | running bootstrap script ... ok
webserver_1  | Wed Sep 25 15:37:46 UTC 2019 - waiting for Postgres... 1/20
postgres_1   | performing post-bootstrap initialization ... ok
postgres_1   | syncing data to disk ... ok
postgres_1   |
postgres_1   | Success. You can now start the database server using:
postgres_1   |
postgres_1   |     pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1   |
postgres_1   |
postgres_1   | WARNING: enabling "trust" authentication for local connections
postgres_1   | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1   | --auth-local and --auth-host, the next time you run initdb.
postgres_1   | waiting for server to start....LOG:  database system was shut down at 2019-09-25 15:37:46 UTC
postgres_1   | LOG:  MultiXact member wraparound protections are now enabled
postgres_1   | LOG:  database system is ready to accept connections
postgres_1   | LOG:  autovacuum launcher started
postgres_1   |  done
postgres_1   | server started
postgres_1   | CREATE DATABASE
postgres_1   |
postgres_1   |
postgres_1   | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
postgres_1   |
postgres_1   | waiting for server to shut down...LOG:  received fast shutdown request
postgres_1   | .LOG:  aborting any active transactions
postgres_1   | LOG:  autovacuum launcher shutting down
postgres_1   | LOG:  shutting down
postgres_1   | LOG:  database system is shut down
postgres_1   |  done
postgres_1   | server stopped
postgres_1   |
postgres_1   | PostgreSQL init process complete; ready for start up.
postgres_1   |
postgres_1   | LOG:  database system was shut down at 2019-09-25 15:37:48 UTC
postgres_1   | LOG:  MultiXact member wraparound protections are now enabled
postgres_1   | LOG:  database system is ready to accept connections
postgres_1   | LOG:  autovacuum launcher started
postgres_1   | LOG:  incomplete startup packet
webserver_1  | [2019-09-25 15:37:51,853] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=10
webserver_1  | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1  |   """)
webserver_1  | [2019-09-25 15:37:52,136] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | DB: postgresql+psycopg2://airflow:***@postgres:5432/airflow
webserver_1  | [2019-09-25 15:37:52,543] {{db.py:369}} INFO - Creating tables
webserver_1  | INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
webserver_1  | INFO  [alembic.runtime.migration] Will assume transactional DDL.
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> dd25f486b8ea
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make TaskInstance.pool not nullable
webserver_1  | INFO  [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit
webserver_1  | Done.
webserver_1  | [2019-09-25 15:37:54,784] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
webserver_1  | [2019-09-25 15:37:54,813] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=13
webserver_1  | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1  |   """)
webserver_1  | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1  |   """)
webserver_1  | [2019-09-25 15:37:55,104] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | [2019-09-25 15:37:55,191] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  |   ____________       _____________
webserver_1  |  ____    |__( )_________  __/__  /________      __
webserver_1  | ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
webserver_1  | ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
webserver_1  |  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
webserver_1  |   ____________       _____________
webserver_1  |  ____    |__( )_________  __/__  /________      __
webserver_1  | ____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
webserver_1  | ___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
webserver_1  |  _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
webserver_1  | [2019-09-25 15:37:55,684] {{scheduler_job.py:1288}} INFO - Starting the scheduler
webserver_1  | [2019-09-25 15:37:55,684] {{scheduler_job.py:1296}} INFO - Running execute loop for -1 seconds
webserver_1  | [2019-09-25 15:37:55,685] {{scheduler_job.py:1297}} INFO - Processing each file at most -1 times
webserver_1  | [2019-09-25 15:37:55,686] {{scheduler_job.py:1300}} INFO - Searching for files in /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:55,723] {{scheduler_job.py:1302}} INFO - There are 1 files in /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:55,740] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:55,981] {{scheduler_job.py:1349}} INFO - Resetting orphaned tasks for active dag runs
webserver_1  | [2019-09-25 15:37:56,020] {{dag_processing.py:543}} INFO - Launched DagFileProcessorManager with pid: 103
webserver_1  | Running the Gunicorn Server with:
webserver_1  | Workers: 4 sync
webserver_1  | Host: 0.0.0.0:8080
webserver_1  | Timeout: 120
webserver_1  | Logfiles: - -
webserver_1  | =================================================================
webserver_1  | [2019-09-25 15:37:56,094] {{settings.py:54}} INFO - Configured default timezone <Timezone [UTC]>
webserver_1  | [2019-09-25 15:37:56,121] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=103
webserver_1  | [2019-09-25 15:37:56,972] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=116
webserver_1  | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1  |   """)
webserver_1  | [2019-09-25 15:37:57 +0000] [116] [INFO] Starting gunicorn 19.9.0
webserver_1  | [2019-09-25 15:37:57 +0000] [116] [INFO] Listening at: http://0.0.0.0:8080 (116)
webserver_1  | [2019-09-25 15:37:57 +0000] [116] [INFO] Using worker: sync
webserver_1  | [2019-09-25 15:37:57 +0000] [167] [INFO] Booting worker with pid: 167
webserver_1  | [2019-09-25 15:37:57 +0000] [171] [INFO] Booting worker with pid: 171
webserver_1  | [2019-09-25 15:37:57,237] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | [2019-09-25 15:37:57,288] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | [2019-09-25 15:37:57 +0000] [174] [INFO] Booting worker with pid: 174
webserver_1  | [2019-09-25 15:37:57 +0000] [175] [INFO] Booting worker with pid: 175
webserver_1  | [2019-09-25 15:37:57,491] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | [2019-09-25 15:37:57,544] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1  | [2019-09-25 15:37:57,927] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:58,020] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:58,363] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1  | [2019-09-25 15:37:58,483] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
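When you start the stack with docker-compose up -d you won't see the log above, so it can help to poll the UI until it answers. A minimal sketch using only the Python standard library, assuming the "8080:8080" port mapping from docker-compose.yml (so the UI is at http://localhost:8080/). `wait_for_webserver` is a hypothetical helper, and any HTTP response, even an error status, counts as "up" because it means gunicorn is listening.

```python
import time
import urllib.error
import urllib.request


def wait_for_webserver(url="http://localhost:8080/", timeout=120.0, interval=2.0):
    """Poll url until the server answers (True) or timeout seconds pass (False).

    The default URL assumes the 8080:8080 port mapping in docker-compose.yml.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, but it is reachable.
            return True
        except OSError:
            # Connection refused, reset or timed out: containers still starting.
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Call `wait_for_webserver()` right after `docker-compose up -d`; a `False` return after two minutes is a good cue to check `docker-compose logs webserver`.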

How to shut down the Airflow containers?

Ans: docker-compose stop stops the containers but does not remove them, so docker-compose start can bring the same containers back.

(airflow) Shravan: airflow$ docker-compose stop
Stopping airflow_webserver_1 ... done
Stopping airflow_postgres_1  ... done

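How to shut down the Airflow containers and remove them?

Ans: Use docker-compose down when you have made changes to docker-compose.yml and want to start fresh. It stops and removes the containers along with the default network, which the empty docker-compose ps output below confirms.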

(airflow) Shravan: airflow$ docker-compose down
Removing airflow_webserver_1 ... done
Removing airflow_postgres_1  ... done
Removing network airflow_default

(airflow) Shravan: airflow$ docker-compose ps
Name   Command   State   Ports
------------------------------

(OPTIONAL) Create a new conda environment for Airflow:

STEP 1: conda create --no-default-packages -n airflow python=3.6

STEP 2: conda activate airflow

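STEP 3: pip install 'apache-airflow[postgres,s3]' --no-cache-dir

The quotes stop shells such as zsh (the default shell on recent versions of macOS) from treating the square brackets as a glob pattern. The --no-cache-dir option forces pip to download the packages fresh instead of being a lazy bum and reading from its cache.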

obar1 commented Feb 8, 2022

typo pip install 'apache-airflow[postgres,s3]' --no-cache-dir
