DevOps Exercise: Scalable Data Pipeline with AWS, Prometheus, Grafana, and SFTP
Objective
Deploy a data ingestion and processing pipeline using Docker Compose, AWS services, and monitoring tools. The pipeline should securely transfer data from an SFTP server, process it, store it in an AWS S3 bucket, and provide real-time monitoring of performance and health. This exercise tests your understanding of containerization, cloud services, monitoring, and secure data handling.
Scenario
You are tasked with building a data pipeline that ingests log files from an external SFTP server, processes them using a custom Python script, and stores the processed data in an AWS S3 bucket. The pipeline needs to be scalable, monitored, and secure.
The pipeline comprises:
1. SFTP Ingestor (Python/Docker): Downloads log files from an SFTP server.
2. Data Processor (Python/Docker): Processes the downloaded log files.
3. AWS S3 Storage: Stores the processed data.
4. Prometheus: Collects metrics from the pipeline.
5. Grafana: Visualizes the collected metrics.
Requirements
1. SFTP Ingestor:
* Develop a Python script to securely connect to an SFTP server and download log files.
* Use Docker to containerize the script.
* Implement error handling and logging.
* Securely store SFTP credentials using Docker secrets.
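A minimal sketch of such an ingestor is shown below. It assumes the paramiko library, a Docker secret mounted at /run/secrets/sftp_password, and placeholder values for the host, user, and directories; adapt these to your own environment.
```python
# sftp_ingestor.py -- minimal sketch; host, user, and paths are placeholders.
import logging
import os

import paramiko

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("sftp-ingestor")

SFTP_HOST = os.environ.get("SFTP_HOST", "sftp.example.com")
SFTP_USER = os.environ.get("SFTP_USER", "ingest")
REMOTE_DIR = os.environ.get("SFTP_REMOTE_DIR", "/upload")
LOCAL_DIR = os.environ.get("LOCAL_DIR", "/data/incoming")


def read_secret(name: str) -> str:
    """Docker secrets are mounted as files under /run/secrets/."""
    with open(f"/run/secrets/{name}") as f:
        return f.read().strip()


def download_logs() -> None:
    transport = paramiko.Transport((SFTP_HOST, 22))
    try:
        transport.connect(username=SFTP_USER, password=read_secret("sftp_password"))
        sftp = paramiko.SFTPClient.from_transport(transport)
        os.makedirs(LOCAL_DIR, exist_ok=True)
        for name in sftp.listdir(REMOTE_DIR):
            if name.endswith(".log"):
                sftp.get(f"{REMOTE_DIR}/{name}", os.path.join(LOCAL_DIR, name))
                log.info("downloaded %s", name)
    except Exception:
        log.exception("SFTP download failed")
        raise
    finally:
        transport.close()


if __name__ == "__main__":
    download_logs()
```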
2. Data Processor:
* Develop a Python script to process the log files (e.g., parse, filter, transform).
* Use Docker to containerize the script.
* Implement error handling and logging.
* Configure the data processor to upload the processed data to a designated AWS S3 bucket.
* Use the AWS CLI within the container to upload to S3, using IAM roles or securely stored AWS credentials.
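One possible shape for the processor is sketched below. It assumes the container image includes the AWS CLI, that credentials come from an IAM role or securely injected environment variables, and that the bucket name, directories, and the ERROR-line filter are placeholders for your own transformation logic.
```python
# data_processor.py -- sketch only; bucket, paths, and the transform are placeholders.
import json
import logging
import os
import subprocess

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("data-processor")

IN_DIR = os.environ.get("IN_DIR", "/data/incoming")
OUT_DIR = os.environ.get("OUT_DIR", "/data/processed")
S3_BUCKET = os.environ.get("S3_BUCKET", "your-unique-bucket-name")


def process_file(path: str) -> str:
    """Example transform: keep only ERROR lines and emit JSON records."""
    out_path = os.path.join(OUT_DIR, os.path.basename(path) + ".json")
    with open(path) as src, open(out_path, "w") as dst:
        for line in src:
            if "ERROR" in line:
                dst.write(json.dumps({"raw": line.rstrip()}) + "\n")
    return out_path


def upload(path: str) -> None:
    """Upload via the AWS CLI; credentials come from an IAM role or the environment."""
    key = f"processed/{os.path.basename(path)}"
    subprocess.run(["aws", "s3", "cp", path, f"s3://{S3_BUCKET}/{key}"], check=True)
    log.info("uploaded %s to s3://%s/%s", path, S3_BUCKET, key)


if __name__ == "__main__":
    os.makedirs(OUT_DIR, exist_ok=True)
    for name in os.listdir(IN_DIR):
        if not name.endswith(".log"):
            continue
        try:
            upload(process_file(os.path.join(IN_DIR, name)))
        except Exception:
            log.exception("failed to process %s", name)
```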
3. AWS S3 Storage:
* Create an AWS S3 bucket to store the processed data.
* Configure appropriate bucket policies.
* Use AWS CLI commands to create the needed S3 bucket.
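Beyond the create-bucket command shown in requirement 6 below, the bucket policy step might look like the following sketch; the bucket name is the same placeholder, and bucket-policy.json is a policy document you would write yourself (for example, one that restricts s3:PutObject to the pipeline's IAM role).
```bash
# Sketch only: the bucket name is a placeholder.
# Block all public access to the bucket.
aws s3api put-public-access-block \
  --bucket your-unique-bucket-name \
  --public-access-block-configuration \
  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true

# Attach a bucket policy from a local JSON document.
aws s3api put-bucket-policy \
  --bucket your-unique-bucket-name \
  --policy file://bucket-policy.json
```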
4. Monitoring (Prometheus and Grafana):
* Configure Prometheus to scrape metrics from the SFTP Ingestor and Data Processor containers (e.g., file download/processing times, error rates).
* Create a Grafana dashboard to visualize the collected metrics.
* Use Docker Compose to deploy both Prometheus and Grafana.
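One way to expose those metrics is the prometheus_client library: each container serves a /metrics endpoint that Prometheus scrapes. The sketch below uses an assumed port 8000 and illustrative metric names; your prometheus.yml would then add a scrape job targeting the sftp-ingestor and data-processor service names, and Grafana would use Prometheus as its data source.
```python
# metrics.py -- sketch using prometheus_client; port and metric names are illustrative.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

DOWNLOAD_SECONDS = Histogram("sftp_download_seconds", "Time spent downloading a log file")
PROCESS_ERRORS = Counter("processing_errors_total", "Number of failed processing attempts")

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<container>:8000/metrics
    while True:
        with DOWNLOAD_SECONDS.time():
            time.sleep(random.uniform(0.1, 0.5))  # stand-in for real download work
        if random.random() < 0.05:
            PROCESS_ERRORS.inc()  # stand-in for a real failure path
        time.sleep(5)
```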
5. Secure Data Transfer:
* Use SFTP for secure file transfer.
* Securely manage SFTP and AWS credentials using Docker secrets.
* Use IAM roles or securely stored AWS credentials for S3 interaction.
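As an illustration of the Docker secrets requirement, a docker-compose.yml fragment could look like the sketch below; the service names, secret names, and file paths are assumptions, and the AWS secret can be dropped entirely if you rely on an IAM role.
```yaml
# docker-compose.yml (fragment) -- illustrative names only.
services:
  sftp-ingestor:
    build: ./sftp-ingestor
    secrets:
      - sftp_password        # mounted at /run/secrets/sftp_password
  data-processor:
    build: ./data-processor
    secrets:
      - aws_credentials      # omit if an IAM role provides credentials

secrets:
  sftp_password:
    file: ./secrets/sftp_password.txt
  aws_credentials:
    file: ./secrets/aws_credentials
```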
6. AWS CLI Usage:
* Use the AWS CLI to create the S3 bucket:
```bash
# For any region other than us-east-1, also pass:
#   --create-bucket-configuration LocationConstraint=your-region
aws s3api create-bucket --bucket your-unique-bucket-name --region your-region
```
* Use the AWS CLI within the data processor container to upload the files.
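For example, the upload from inside the data processor container could be as simple as the command below (bucket name and local path are placeholders):
```bash
# Upload all processed files to the bucket's processed/ prefix.
aws s3 cp /data/processed/ s3://your-unique-bucket-name/processed/ --recursive
```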
Challenge
1. Security:
* Securely manage SFTP credentials and AWS access keys using Docker secrets and IAM roles.
* Ensure that only necessary ports are exposed.
* Ensure that the AWS credentials are not baked into the Docker image.
2. Monitoring and Alerting:
* Implement comprehensive monitoring using Prometheus and Grafana.
* Create alerts for critical events (e.g., file download failures, high processing times).
* Monitor the size of the S3 bucket.
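If you drive alerting from Prometheus, a rule file loaded via rule_files in prometheus.yml (and delivered through Alertmanager, or recreated as Grafana alerts) might look like the sketch below; the metric names match the illustrative metrics from requirement 4 and the thresholds are arbitrary. Bucket size is not exposed by the pipeline itself; one option is to query it periodically (for example with aws s3 ls --summarize or CloudWatch's BucketSizeBytes metric) and publish it as a gauge.
```yaml
# alert-rules.yml (sketch) -- metric names and thresholds are illustrative.
groups:
  - name: pipeline-alerts
    rules:
      - alert: ProcessingErrors
        expr: increase(processing_errors_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Data processor reported failures in the last 5 minutes"
      - alert: SlowDownloads
        expr: histogram_quantile(0.95, rate(sftp_download_seconds_bucket[5m])) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "95th percentile SFTP download time is above 30 seconds"
```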
3. Scalability:
* Design the pipeline to be scalable by allowing multiple Data Processor containers to run concurrently.
* Test the pipeline with a large volume of data to simulate heavy load.
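With Docker Compose, horizontal scaling of the processor can be as simple as the command below (the service name matches the compose fragment sketched earlier); note that concurrent processors must not pick up the same file twice, for example by claiming files with an atomic rename before processing.
```bash
# Run three data processor replicas alongside the rest of the stack.
docker compose up -d --scale data-processor=3
```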
4. Error Handling and Recovery:
* Implement robust error handling in the Python scripts.
* Ensure that the pipeline can recover from transient errors (e.g., network issues, SFTP server downtime).
* Implement retry logic for SFTP and S3 operations.
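A small, generic retry helper is sketched below; the attempt count and backoff delays are arbitrary, and it can wrap the SFTP download or S3 upload calls from the earlier sketches.
```python
# retry.py -- generic retry-with-exponential-backoff helper (sketch).
import logging
import time

log = logging.getLogger("retry")


def with_retries(fn, attempts=5, base_delay=2.0, retry_on=(Exception,)):
    """Call fn(), retrying on the given exception types with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            log.warning("attempt %d failed, retrying in %.1fs", attempt, delay)
            time.sleep(delay)


# Example: with_retries(lambda: upload("/data/processed/app.log.json"), attempts=3)
```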
5. Documentation:
* Provide a README.md file that includes:
  * Setup instructions.
  * Configuration details.
  * A monitoring and troubleshooting guide.
  * Instructions on how to use the required AWS CLI commands.
  * Instructions on how to set up the Docker secrets.
  * Instructions on how to set up the Grafana dashboard.
Submission Requirements
1. A working docker-compose.yml file and supporting files (Dockerfile, Python scripts, Prometheus configuration, Grafana dashboard JSON) pushed to a GitHub repository.
2. A script or instructions to create the necessary AWS S3 bucket using the AWS CLI.
3. Clear documentation (README.md) explaining the setup, monitoring, and security procedures, including details on Docker secrets management and AWS CLI usage.
4. A Grafana dashboard JSON file.
5. A script that simulates a large volume of log files being added to the SFTP server.
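A load simulation script for item 5 could look like the sketch below; it assumes paramiko, placeholder credentials, and an /upload directory on the SFTP server, and simply pushes a configurable number of synthetic log files.
```python
# simulate_logs.py -- sketch; host, credentials, and sizes are placeholders.
import io
import random
import time

import paramiko

HOST, USER, PASSWORD = "sftp.example.com", "loadtest", "change-me"
REMOTE_DIR = "/upload"
NUM_FILES, LINES_PER_FILE = 500, 10_000
LEVELS = ["INFO", "WARN", "ERROR"]


def fake_log(lines: int) -> bytes:
    rows = (
        f"{time.strftime('%Y-%m-%dT%H:%M:%S')} {random.choice(LEVELS)} request_id={i}"
        for i in range(lines)
    )
    return "\n".join(rows).encode()


transport = paramiko.Transport((HOST, 22))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)
for n in range(NUM_FILES):
    sftp.putfo(io.BytesIO(fake_log(LINES_PER_FILE)), f"{REMOTE_DIR}/app-{n:04d}.log")
    print(f"uploaded app-{n:04d}.log")
transport.close()
```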