Using our beloved Docker and docker-compose, we can very quickly bring up an Apache Airflow instance on a Mac.
About the only thing you need to customize in this docker-compose.yml file is the volumes section. It tells Docker to map the local directories containing your Airflow DAGs and plugins into the container's file system.
version: '3'
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    image: puckel/docker-airflow:1.10.4
    build:
      context: https://github.com/puckel/docker-airflow.git#1.10.4
      dockerfile: Dockerfile
      args:
        AIRFLOW_DEPS: gcp_api,s3
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
      - FERNET_KEY=jsDPRErfv8Z_eVTnGfF8ywd19j4pyqE3NpdUBA_oRTo=
    volumes:
      - ./sparkify/dags:/usr/local/airflow/dags
      # Custom plugins; comment this line out if you don't have any
      - ./sparkify/plugins:/usr/local/airflow/plugins
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
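The FERNET_KEY above has now been published, so you may want to swap in your own. Airflow uses this key to encrypt things like connection passwords stored in its metadata database. Below is a minimal sketch that generates a key with only the standard library. It does the same thing as the cryptography package's Fernet.generate_key() (32 random bytes, URL-safe base64-encoded); the function name is my own, so treat it as a convenience rather than an official tool.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a new Fernet key: 32 random bytes, URL-safe base64-encoded.

    Equivalent to cryptography.fernet.Fernet.generate_key().decode(),
    but using only the standard library.
    """
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into the FERNET_KEY entry of docker-compose.yml.
    print(generate_fernet_key())
```

Keep the same key across restarts: anything encrypted with one key can't be decrypted with another.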
Ans: Simply run docker-compose up, assuming you have a docker-compose.yml file in your current working directory (pwd).
Note: If you don't want to see the logs, run it in detached mode instead:
docker-compose up -d
(airflow) Shravan: airflow$ docker-compose up
Creating network "airflow_default" with the default driver
Creating airflow_postgres_1 ... done
Creating airflow_webserver_1 ... done
Attaching to airflow_postgres_1, airflow_webserver_1
postgres_1 | The files belonging to this database system will be owned by user "postgres".
postgres_1 | This user must also own the server process.
postgres_1 |
postgres_1 | The database cluster will be initialized with locale "en_US.utf8".
postgres_1 | The default database encoding has accordingly been set to "UTF8".
postgres_1 | The default text search configuration will be set to "english".
postgres_1 |
postgres_1 | Data page checksums are disabled.
postgres_1 |
postgres_1 | fixing permissions on existing directory /var/lib/postgresql/data ... ok
postgres_1 | creating subdirectories ... ok
postgres_1 | selecting default max_connections ... 100
postgres_1 | selecting default shared_buffers ... 128MB
postgres_1 | selecting default timezone ... Etc/UTC
postgres_1 | selecting dynamic shared memory implementation ... posix
postgres_1 | creating configuration files ... ok
postgres_1 | running bootstrap script ... ok
webserver_1 | Wed Sep 25 15:37:46 UTC 2019 - waiting for Postgres... 1/20
postgres_1 | performing post-bootstrap initialization ... ok
postgres_1 | syncing data to disk ... ok
postgres_1 |
postgres_1 | Success. You can now start the database server using:
postgres_1 |
postgres_1 | pg_ctl -D /var/lib/postgresql/data -l logfile start
postgres_1 |
postgres_1 |
postgres_1 | WARNING: enabling "trust" authentication for local connections
postgres_1 | You can change this by editing pg_hba.conf or using the option -A, or
postgres_1 | --auth-local and --auth-host, the next time you run initdb.
postgres_1 | waiting for server to start....LOG: database system was shut down at 2019-09-25 15:37:46 UTC
postgres_1 | LOG: MultiXact member wraparound protections are now enabled
postgres_1 | LOG: database system is ready to accept connections
postgres_1 | LOG: autovacuum launcher started
postgres_1 | done
postgres_1 | server started
postgres_1 | CREATE DATABASE
postgres_1 |
postgres_1 |
postgres_1 | /usr/local/bin/docker-entrypoint.sh: ignoring /docker-entrypoint-initdb.d/*
postgres_1 |
postgres_1 | waiting for server to shut down...LOG: received fast shutdown request
postgres_1 | .LOG: aborting any active transactions
postgres_1 | LOG: autovacuum launcher shutting down
postgres_1 | LOG: shutting down
postgres_1 | LOG: database system is shut down
postgres_1 | done
postgres_1 | server stopped
postgres_1 |
postgres_1 | PostgreSQL init process complete; ready for start up.
postgres_1 |
postgres_1 | LOG: database system was shut down at 2019-09-25 15:37:48 UTC
postgres_1 | LOG: MultiXact member wraparound protections are now enabled
postgres_1 | LOG: database system is ready to accept connections
postgres_1 | LOG: autovacuum launcher started
postgres_1 | LOG: incomplete startup packet
webserver_1 | [2019-09-25 15:37:51,853] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=10
webserver_1 | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1 | """)
webserver_1 | [2019-09-25 15:37:52,136] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | DB: postgresql+psycopg2://airflow:***@postgres:5432/airflow
webserver_1 | [2019-09-25 15:37:52,543] {{db.py:369}} INFO - Creating tables
webserver_1 | INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
webserver_1 | INFO [alembic.runtime.migration] Will assume transactional DDL.
webserver_1 | INFO [alembic.runtime.migration] Running upgrade -> e3a246e0dc1, current schema
webserver_1 | INFO [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 52d714495f0 -> 502898887f84, Adding extra to Log
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 502898887f84 -> 1b38cef5b76e, add dagrun
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 1b38cef5b76e -> 2e541a1dcfed, task_duration
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 2e541a1dcfed -> 40e67319e3a9, dagrun_config
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 40e67319e3a9 -> 561833c1c74b, add password column to user
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 561833c1c74b -> 4446e08588, dagrun start end
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 4446e08588 -> bbc73705a13e, Add notification_sent column to sla_miss
webserver_1 | INFO [alembic.runtime.migration] Running upgrade bbc73705a13e -> bba5a7cfc896, Add a column to track the encryption state of the 'Extra' field in connection
webserver_1 | INFO [alembic.runtime.migration] Running upgrade bba5a7cfc896 -> 1968acfc09e3, add is_encrypted column to variable table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 1968acfc09e3 -> 2e82aab8ef20, rename user table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 2e82aab8ef20 -> 211e584da130, add TI state index
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 211e584da130 -> 64de9cddf6c9, add task fails journal table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 64de9cddf6c9 -> f2ca10b85618, add dag_stats table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade f2ca10b85618 -> 4addfa1236f1, Add fractional seconds to mysql tables
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 4addfa1236f1 -> 8504051e801b, xcom dag task indices
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 8504051e801b -> 5e7d17757c7a, add pid field to TaskInstance
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 5e7d17757c7a -> 127d2bf2dfa7, Add dag_id/state index on dag_run table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 127d2bf2dfa7 -> cc1e65623dc7, add max tries column to task instance
webserver_1 | INFO [alembic.runtime.migration] Running upgrade cc1e65623dc7 -> bdaa763e6c56, Make xcom value column a large binary
webserver_1 | INFO [alembic.runtime.migration] Running upgrade bdaa763e6c56 -> 947454bf1dff, add ti job_id index
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 947454bf1dff -> d2ae31099d61, Increase text size for MySQL (not relevant for other DBs' text types)
webserver_1 | INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 0e2a74e0fc9f, Add time zone awareness
webserver_1 | INFO [alembic.runtime.migration] Running upgrade d2ae31099d61 -> 33ae817a1ff4, kubernetes_resource_checkpointing
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 33ae817a1ff4 -> 27c6a30d7c24, kubernetes_resource_checkpointing
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 27c6a30d7c24 -> 86770d1215c0, add kubernetes scheduler uniqueness
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 86770d1215c0, 0e2a74e0fc9f -> 05f30312d566, merge heads
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 05f30312d566 -> f23433877c24, fix mysql not null constraint
webserver_1 | INFO [alembic.runtime.migration] Running upgrade f23433877c24 -> 856955da8476, fix sqlite foreign key
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 856955da8476 -> 9635ae0956e7, index-faskfail
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> dd25f486b8ea
webserver_1 | INFO [alembic.runtime.migration] Running upgrade dd25f486b8ea -> bf00311e1990, add index to taskinstance
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 9635ae0956e7 -> 0a2a5b66e19d, add task_reschedule table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 0a2a5b66e19d, bf00311e1990 -> 03bc53e68815, merge_heads_2
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 03bc53e68815 -> 41f5f12752f8, add superuser field
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 41f5f12752f8 -> c8ffec048a3b, add fields to dag
webserver_1 | INFO [alembic.runtime.migration] Running upgrade c8ffec048a3b -> dd4ecb8fbee3, Add schedule interval to dag
webserver_1 | INFO [alembic.runtime.migration] Running upgrade dd4ecb8fbee3 -> 939bb1e647c8, task reschedule fk on cascade delete
webserver_1 | INFO [alembic.runtime.migration] Running upgrade c8ffec048a3b -> a56c9515abdc, Remove dag_stat table
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 6e96a59344a4, Make TaskInstance.pool not nullable
webserver_1 | INFO [alembic.runtime.migration] Running upgrade 939bb1e647c8 -> 004c1210f153, increase queue name size limit
webserver_1 | Done.
webserver_1 | [2019-09-25 15:37:54,784] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=1
webserver_1 | [2019-09-25 15:37:54,813] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=13
webserver_1 | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1 | """)
webserver_1 | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1 | """)
webserver_1 | [2019-09-25 15:37:55,104] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2019-09-25 15:37:55,191] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | ____________ _____________
webserver_1 | ____ |__( )_________ __/__ /________ __
webserver_1 | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
webserver_1 | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
webserver_1 | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
webserver_1 | ____________ _____________
webserver_1 | ____ |__( )_________ __/__ /________ __
webserver_1 | ____ /| |_ /__ ___/_ /_ __ /_ __ \_ | /| / /
webserver_1 | ___ ___ | / _ / _ __/ _ / / /_/ /_ |/ |/ /
webserver_1 | _/_/ |_/_/ /_/ /_/ /_/ \____/____/|__/
webserver_1 | [2019-09-25 15:37:55,684] {{scheduler_job.py:1288}} INFO - Starting the scheduler
webserver_1 | [2019-09-25 15:37:55,684] {{scheduler_job.py:1296}} INFO - Running execute loop for -1 seconds
webserver_1 | [2019-09-25 15:37:55,685] {{scheduler_job.py:1297}} INFO - Processing each file at most -1 times
webserver_1 | [2019-09-25 15:37:55,686] {{scheduler_job.py:1300}} INFO - Searching for files in /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:55,723] {{scheduler_job.py:1302}} INFO - There are 1 files in /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:55,740] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:55,981] {{scheduler_job.py:1349}} INFO - Resetting orphaned tasks for active dag runs
webserver_1 | [2019-09-25 15:37:56,020] {{dag_processing.py:543}} INFO - Launched DagFileProcessorManager with pid: 103
webserver_1 | Running the Gunicorn Server with:
webserver_1 | Workers: 4 sync
webserver_1 | Host: 0.0.0.0:8080
webserver_1 | Timeout: 120
webserver_1 | Logfiles: - -
webserver_1 | =================================================================
webserver_1 | [2019-09-25 15:37:56,094] {{settings.py:54}} INFO - Configured default timezone <Timezone [UTC]>
webserver_1 | [2019-09-25 15:37:56,121] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=103
webserver_1 | [2019-09-25 15:37:56,972] {{settings.py:213}} INFO - settings.configure_orm(): Using pool settings. pool_size=5, max_overflow=10, pool_recycle=1800, pid=116
webserver_1 | /usr/local/lib/python3.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
webserver_1 | """)
webserver_1 | [2019-09-25 15:37:57 +0000] [116] [INFO] Starting gunicorn 19.9.0
webserver_1 | [2019-09-25 15:37:57 +0000] [116] [INFO] Listening at: http://0.0.0.0:8080 (116)
webserver_1 | [2019-09-25 15:37:57 +0000] [116] [INFO] Using worker: sync
webserver_1 | [2019-09-25 15:37:57 +0000] [167] [INFO] Booting worker with pid: 167
webserver_1 | [2019-09-25 15:37:57 +0000] [171] [INFO] Booting worker with pid: 171
webserver_1 | [2019-09-25 15:37:57,237] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2019-09-25 15:37:57,288] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2019-09-25 15:37:57 +0000] [174] [INFO] Booting worker with pid: 174
webserver_1 | [2019-09-25 15:37:57 +0000] [175] [INFO] Booting worker with pid: 175
webserver_1 | [2019-09-25 15:37:57,491] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2019-09-25 15:37:57,544] {{__init__.py:51}} INFO - Using executor LocalExecutor
webserver_1 | [2019-09-25 15:37:57,927] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:58,020] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:58,363] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
webserver_1 | [2019-09-25 15:37:58,483] {{dagbag.py:90}} INFO - Filling up the DagBag from /usr/local/airflow/dags
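With docker-compose up -d you won't see the "Listening at: http://0.0.0.0:8080" line above, so here is a small sketch that polls the webserver until it responds. It assumes the "8080:8080" port mapping from the compose file (so the UI is reachable at http://localhost:8080/) and treats any HTTP 200, after following redirects, as "up". The function name and its defaults are my own choices, not part of Airflow or docker-compose.

```python
import http.client
import time
import urllib.request


def wait_for_webserver(url="http://localhost:8080/", timeout=120.0, interval=2.0):
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed.

    Returns True as soon as the server answers with 200, False if it never does.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # urlopen follows redirects, so a redirect to the UI's landing page still counts.
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Not ready yet: connection refused/reset, timeout, or an HTTP error
            # status (urllib.error.URLError and HTTPError are OSError subclasses).
            pass
        time.sleep(interval)
    return False
```

Call wait_for_webserver() right after docker-compose up -d. On a first run it can take a little while, because the webserver waits for Postgres and creates the metadata tables before Gunicorn starts listening, as the log above shows.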
Ans: This just stops the containers; it doesn't remove them.
(airflow) Shravan: airflow$ docker-compose stop
Stopping airflow_webserver_1 ... done
Stopping airflow_postgres_1 ... done
Ans: Use this when you've made changes to docker-compose.yml and wish to start fresh. It stops and removes the containers and the default network, as the output below shows.
(airflow) Shravan: airflow$ docker-compose down
Removing airflow_webserver_1 ... done
Removing airflow_postgres_1 ... done
Removing network airflow_default
(airflow) Shravan: airflow$ docker-compose ps
Name Command State Ports
------------------------------
STEP 1: conda create --no-default-packages -n airflow python=3.6
STEP 2: conda activate airflow
STEP 3: pip install 'apache-airflow[postgres,s3]' --no-cache-dir
The quotes around apache-airflow[postgres,s3] stop shells such as zsh from treating the square brackets as a glob pattern. The --no-cache-dir option forces pip to download the packages again instead of being a lazy bum and reading them from its cache.
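To sanity-check the conda environment after step 3, here is a quick sketch that reports which modules can't be found. It assumes the [postgres] extra brings in psycopg2 and the [s3] extra brings in boto3, which is how I understand the Airflow 1.10 extras were defined; adjust the list if your extras differ. importlib.util.find_spec only locates a top-level module without importing it. The helper name is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names: airflow (apache-airflow), psycopg2 ([postgres]), boto3 ([s3]).
    missing = missing_modules(["airflow", "psycopg2", "boto3"])
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("all modules found")
```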