@audhiaprilliant
Created December 13, 2020 09:10
Apache Airflow as Job Orchestration
# Imports required by the snippet below
from datetime import datetime, timedelta

from airflow import DAG

# Set default args (passed to every task defined in this DAG)
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2020, 5, 20),
    'email': ['[email protected]'],
    'email_on_failure': True,
    'email_on_retry': False,
    'retries': 3,
    'retry_delay': timedelta(minutes=2),
}

# Set schedule: run the pipeline once a day.
# Cron times are in UTC, e.g. 8:15 AM would be '15 08 * * *'.
# Here the DAG runs daily at 09:30 UTC.
schedule_interval = '30 09 * * *'

# Define DAG: set its ID and assign the default args and schedule interval
dag = DAG(
    dag_id='scraping_data_covid19',
    default_args=default_args,
    schedule_interval=schedule_interval,
)
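To check which times the `'30 09 * * *'` expression above produces, here is a minimal stdlib-only sketch. `parse_daily_cron` and `next_daily_runs` are hypothetical helpers that only understand the fixed daily `MM HH * * *` form used in this gist; Airflow itself relies on a full cron parser (croniter), so treat this as an illustration rather than the scheduler's own logic.

```python
from datetime import datetime, timedelta, timezone


def parse_daily_cron(expr):
    """Parse a cron expression of the form 'MM HH * * *' into (hour, minute).

    Hypothetical helper: only the fixed daily form used in this gist is
    supported; ranges, steps and lists are rejected.
    """
    fields = expr.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"unsupported cron expression: {expr!r}")
    minute, hour = int(fields[0]), int(fields[1])
    if not (0 <= minute < 60 and 0 <= hour < 24):
        raise ValueError(f"out-of-range time in: {expr!r}")
    return hour, minute


def next_daily_runs(expr, after, count=3):
    """Return the next `count` datetimes at or after `after` matching `expr`."""
    hour, minute = parse_daily_cron(expr)
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed, the first match is tomorrow.
    if candidate < after:
        candidate += timedelta(days=1)
    return [candidate + timedelta(days=i) for i in range(count)]


if __name__ == "__main__":
    # Same start date as default_args, interpreted as UTC.
    start = datetime(2020, 5, 20, tzinfo=timezone.utc)
    for run in next_daily_runs("30 09 * * *", start):
        print(run.isoformat())
```

Keep in mind that Airflow's scheduler starts the run for a given tick only once that schedule interval has ended, so the run stamped 2020-05-20 09:30 would actually be triggered around 2020-05-21 09:30.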
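The retry settings in `default_args` mean a failing task is tried up to `retries + 1` times, waiting `retry_delay` between tries; with `email_on_failure=True` and `email_on_retry=False`, only the final failure sends an email. The sketch below mimics that policy in plain Python. `run_with_retries` and its `sleep`/`notify` hooks are hypothetical names for illustration, not Airflow's implementation.

```python
import time
from datetime import timedelta


def run_with_retries(task, retries=3, retry_delay=timedelta(minutes=2),
                     email_on_failure=True, email_on_retry=False,
                     sleep=time.sleep, notify=print):
    """Call task() up to retries + 1 times, pausing retry_delay between tries.

    Hypothetical helper mirroring the default_args retry policy; `notify`
    stands in for Airflow's email alerts and `sleep` can be swapped in tests.
    """
    for attempt in range(1, retries + 2):
        try:
            return task()
        except Exception as exc:
            if attempt > retries:
                # All retries used up: the task is marked as failed.
                if email_on_failure:
                    notify(f"task failed after {attempt} attempts: {exc}")
                raise
            if email_on_retry:
                notify(f"attempt {attempt} failed, retrying: {exc}")
            sleep(retry_delay.total_seconds())
```

With the values above, a task that keeps failing is tried four times in total, with three 2-minute pauses, and a single failure email at the end.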