- Blueprint for your workflow
- Tasks are executed in a specific order
-
Operators are the functions that do the work. Some sample Airflow operators are below (a minimal DAG sketch follows the list)
- BashOperator
- PythonOperator
- EmailOperator
- PostgresOperator (e.g., to read from PostgreSQL)
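To tie the DAG and operator ideas together, here is a minimal sketch (Airflow 2.4+ syntax assumed; the dag_id, command, and callable are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _transform():
    print("transforming data")  # placeholder for real logic


# The DAG is the blueprint: it runs tasks in a specific order.
with DAG(
    dag_id="sample_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = PythonOperator(task_id="transform", python_callable=_transform)

    extract >> transform  # extract runs first, then transform
```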
-
Executors - determine how your workflow's tasks are run (see the configuration check after this list)
- SequentialExecutor - runs one task at a time
- LocalExecutor - runs tasks in parallel on a single machine
- CeleryExecutor - runs tasks in parallel across multiple workers
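The executor is set in airflow.cfg (or via the AIRFLOW__CORE__EXECUTOR environment variable). A quick sketch of checking which executor is active:

```python
from airflow.configuration import conf

# Reads the [core] executor setting from airflow.cfg /
# the AIRFLOW__CORE__EXECUTOR environment variable.
print(conf.get("core", "executor"))  # e.g. "CeleryExecutor"
```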
- Managing the Underlying Infrastructure
- Managing access to the Airflow instance using AWS IAM
- Scaling the instances
- Upgrades and Patches
- Encryption - data encryption is enabled automatically, using AWS KMS
- Monitoring
- Integration with other AWS services - MWAA works based on operators, and provides built-in operators for Athena, AWS Batch, CloudWatch, DynamoDB, and various other services (a sketch follows this list).
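As one example of those built-in operators, an Athena task might look like the sketch below (the Amazon provider package, apache-airflow-providers-amazon, is assumed to be installed; the query, database, and S3 output location are placeholders):

```python
from airflow.providers.amazon.aws.operators.athena import AthenaOperator

# Runs an Athena query as an Airflow task; all values are placeholders.
run_query = AthenaOperator(
    task_id="run_athena_query",
    query="SELECT * FROM sales LIMIT 10;",
    database="analytics_db",
    output_location="s3://my-athena-results-bucket/output/",  # hypothetical bucket
)
```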
Directed Acyclic Graphs (DAGs) orchestrate the data workflows, defining the task sequences and their dependencies. The DAG files are uploaded to Amazon S3 for accessibility and version control, and Amazon Managed Workflows for Apache Airflow (MWAA) then schedules and executes the pipelines, pulling data from various sources, processing it, and storing the results. This keeps data orchestration streamlined and reliable while leveraging cloud-native services for scalability and efficiency.
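Uploading a DAG file to the environment's S3 bucket can be scripted with boto3; a minimal sketch, assuming a hypothetical bucket name and the common dags/ prefix that MWAA watches:

```python
import boto3

s3 = boto3.client("s3")

# Bucket name is hypothetical; MWAA reads DAG files from the
# DagS3Path (commonly "dags/") configured on the environment.
s3.upload_file("sample_pipeline.py", "my-mwaa-bucket", "dags/sample_pipeline.py")
```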
- AWS provides out-of-the-box (OOB) CloudFormation templates to create MWAA and the underlying components, such as the VPC where MWAA will be deployed.
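Deploying such a template can also be scripted; a sketch with boto3, where the stack name and template URL are placeholders for whichever AWS-provided template you use:

```python
import boto3

cfn = boto3.client("cloudformation")

# Stack name and template URL are placeholders; point TemplateURL at
# the MWAA/VPC template you are deploying.
cfn.create_stack(
    StackName="mwaa-environment",
    TemplateURL="https://example-bucket.s3.amazonaws.com/mwaa-template.yaml",
    Capabilities=["CAPABILITY_IAM"],  # the template creates IAM resources
)
```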
-
Environment Class
- mw1.small
- mw1.medium
- mw1.large
-
Max worker count
- Permitted values: 1 to 25
Note - if you need more workers, additional environments can easily be spun up; this limit is per environment (see the sketch below).
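Both the environment class and the max worker count are specified when the environment is created. A hedged boto3 sketch, where every name, ARN, and network ID is a placeholder:

```python
import boto3

mwaa = boto3.client("mwaa")

# All names, ARNs, and network IDs below are placeholders.
mwaa.create_environment(
    Name="my-mwaa-env",
    EnvironmentClass="mw1.medium",
    MaxWorkers=10,  # permitted range is 1-25 per environment
    SourceBucketArn="arn:aws:s3:::my-mwaa-bucket",
    DagS3Path="dags",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/my-mwaa-role",
    NetworkConfiguration={
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "SubnetIds": ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    },
)
```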
The below logs can be shipped to CloudWatch automatically when enabled (a boto3 sketch follows the list):
- Airflow task logs
- Airflow scheduler logs
- Airflow worker logs
- Airflow DAG processing logs
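Enabling these logs is done through the environment's logging configuration; a sketch with boto3, assuming an existing environment name (the log levels shown are illustrative):

```python
import boto3

mwaa = boto3.client("mwaa")

# Environment name is a placeholder; each enabled log type ships to
# its own CloudWatch log group.
mwaa.update_environment(
    Name="my-mwaa-env",
    LoggingConfiguration={
        "TaskLogs": {"Enabled": True, "LogLevel": "INFO"},
        "SchedulerLogs": {"Enabled": True, "LogLevel": "WARNING"},
        "WorkerLogs": {"Enabled": True, "LogLevel": "WARNING"},
        "DagProcessingLogs": {"Enabled": True, "LogLevel": "WARNING"},
    },
)
```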