@gowthamshankar99
Last active April 15, 2024 15:47
Apache Airflow

How does Apache Airflow work?

DAG - Directed Acyclic Graph

  • Blueprint for your workflow
  • Tasks are executed in a specific order, determined by the dependencies between them
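
The ordering idea can be illustrated without Airflow at all. The sketch below uses Python's standard-library graphlib to turn a dependency map into a run order and to reject a graph that contains a cycle. The task names (extract, transform, load, notify) are hypothetical, and this is not how Airflow schedules tasks internally, only a small model of what "directed acyclic" means.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """A workflow is only a valid DAG if its dependencies contain no cycle."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)  # ['extract', 'transform', 'load', 'notify']
```

A graph such as {"a": {"b"}, "b": {"a"}} makes is_acyclic return False, which is the situation a DAG rules out.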

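What is an Airflow Operator?

  • Operators are the units that do the work; each operator defines a single task in a DAG. Some sample Airflow operators are below

    • BashOperator - runs a shell command
    • PythonOperator - calls a Python function
    • EmailOperator - sends an email
    • PostgresOperator - runs SQL against PostgreSQL, for example to read from a table
  • Executors - how to execute your workflow

    • SequentialExecutor - runs one task at a time
    • LocalExecutor - runs tasks in parallel on a single machine
    • CeleryExecutor - runs tasks in parallel across multiple worker machines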
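
The split between "what does the work" and "how the work is run" can be sketched in plain Python. The code below is a toy model only: the tasks are ordinary callables standing in for operators, and the two runner functions stand in for a sequential and a parallel executor. None of these names are Airflow classes, and a thread pool on one machine is only a loose analogy for Celery workers on several machines.

```python
from concurrent.futures import ThreadPoolExecutor


# Toy stand-in for an operator: a callable that performs one unit of work.
# Hypothetical helper, not part of Airflow.
def make_task(name):
    def run():
        return f"{name} done"
    return run


# Three independent tasks with hypothetical names.
tasks = {name: make_task(name) for name in ("report_a", "report_b", "report_c")}


def run_sequentially(tasks):
    """Run one task at a time, in the spirit of a sequential executor."""
    return {name: task() for name, task in tasks.items()}


def run_in_parallel(tasks, max_workers=3):
    """Run independent tasks concurrently, in the spirit of a parallel executor."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


sequential_results = run_sequentially(tasks)
parallel_results = run_in_parallel(tasks)
```

Both runners produce the same results; the executor changes how the tasks are scheduled, not what each task does.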

Advantages of using MWAA

  • AWS manages the underlying infrastructure
  • Access to the Airflow instance is managed through AWS IAM
  • Scaling of the instances
  • Upgrades and patches
  • Encryption - data is encrypted automatically using AWS KMS
  • Monitoring
  • Integration with other AWS services -
    • Airflow works through operators, and MWAA provides built-in operators for Athena, Batch, CloudWatch, DynamoDB, and various other services.

How does MWAA work?

Data workflows are defined as Directed Acyclic Graphs (DAGs), which specify the task sequence and the dependencies between tasks. The DAG files are uploaded to Amazon S3, where the environment can read them and where they can be versioned. Amazon Managed Workflows for Apache Airflow (MWAA) then schedules and executes the pipelines, which pull data from various sources, process it, and store the results. Because scheduling and execution run on managed AWS services, the workflow can scale without the team operating the Airflow infrastructure itself.

How do we deploy MWAA?

  • AWS provides out-of-the-box CloudFormation templates that create the MWAA environment and the underlying components, such as the VPC in which MWAA is deployed.

VPC Architecture

[Figure: VPC architecture diagram]

Important Parameters to configure when deploying MWAA

  • Environment Class

    • mw1.small
    • mw1.medium
    • mw1.large
  • Max worker count

    • Permitted values: 1 to 25

Note - This limit is per environment. If you need more workers, additional environments can be spun up.
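
The two parameters above lend themselves to a small pre-deployment check. The following is a sketch, not an AWS API: the function names are hypothetical, and the class names and the 1 to 25 range are taken from these notes, so they should be checked against the current MWAA limits before being relied on.

```python
import math

# Values copied from the notes above; verify against current MWAA documentation.
ENVIRONMENT_CLASSES = ("mw1.small", "mw1.medium", "mw1.large")
MIN_WORKERS, MAX_WORKERS = 1, 25


def validate_environment(environment_class, max_workers):
    """Raise ValueError if the settings fall outside the values listed above."""
    if environment_class not in ENVIRONMENT_CLASSES:
        raise ValueError(f"unknown environment class: {environment_class}")
    if not MIN_WORKERS <= max_workers <= MAX_WORKERS:
        raise ValueError(
            f"max workers must be between {MIN_WORKERS} and {MAX_WORKERS}"
        )
    return {"environment_class": environment_class, "max_workers": max_workers}


def environments_needed(total_workers):
    """Number of environments required when each one is capped at MAX_WORKERS."""
    if total_workers < 1:
        raise ValueError("total_workers must be at least 1")
    return math.ceil(total_workers / MAX_WORKERS)


print(environments_needed(60))  # 3
```

For example, a workload that needs 60 workers does not fit in one environment under a 25-worker cap, so it would be split across three.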

Logging for MWAA

When enabled, the following logs are shipped to Amazon CloudWatch automatically

  • Airflow task logs
  • Airflow scheduler logs
  • Airflow worker logs
  • Airflow DAG processing logs
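
As a rough illustration of enabling these logs, the sketch below builds a logging configuration dictionary for the four log types listed above. The key names (TaskLogs, SchedulerLogs, WorkerLogs, DagProcessingLogs) and the Enabled / LogLevel fields follow the LoggingConfiguration structure of the MWAA API as I understand it; treat them as an assumption and confirm against the API reference. The helper name build_logging_configuration is hypothetical, and nothing here calls AWS.

```python
# Short names used in this sketch mapped to the assumed MWAA API key names.
LOG_TYPES = {
    "task": "TaskLogs",
    "scheduler": "SchedulerLogs",
    "worker": "WorkerLogs",
    "dag_processing": "DagProcessingLogs",
}
LOG_LEVELS = ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG")


def build_logging_configuration(enabled, level="INFO"):
    """Return a config dict that enables only the requested log types."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unsupported log level: {level}")
    unknown = set(enabled) - set(LOG_TYPES)
    if unknown:
        raise ValueError(f"unknown log types: {sorted(unknown)}")
    return {
        api_key: {"Enabled": short_name in enabled, "LogLevel": level}
        for short_name, api_key in LOG_TYPES.items()
    }


# Enable only task and scheduler logs at WARNING level.
config = build_logging_configuration({"task", "scheduler"}, level="WARNING")
```

A dictionary of this shape would typically be passed along when the environment is created or updated; log types left disabled are simply not shipped.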