@anjijava16
Created July 10, 2021 21:29
Azure Solution:
ADF/Databricks with Spark - Ingestion framework
ADLS - Data Storage
ADB - Transformations
Data Flows/PolyBase - To load data to the warehouse
Synapse - Data warehouse
Azure SQL - Metadata Storage
ADF - Orchestration
Logic Apps - Alerts/Email
Azure DevOps - Code Deployment
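
One way to read the metadata-storage role in the list above: the ingestion framework reads its source definitions from a table and loops over them, so adding a source means inserting a row rather than changing code. A minimal sketch of that idea, using the stdlib sqlite3 module as a stand-in for Azure SQL. The table name ingestion_config, its columns, and the sample sources and paths are hypothetical and do not come from the notes.

```python
import sqlite3

# In-memory SQLite stands in for the metadata database (Azure SQL in the list above).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingestion_config ("
    "source_name TEXT, target_path TEXT, load_type TEXT, enabled INTEGER)"
)
conn.executemany(
    "INSERT INTO ingestion_config VALUES (?, ?, ?, ?)",
    [
        ("sales.orders", "raw/sales/orders", "incremental", 1),
        ("sales.customers", "raw/sales/customers", "full", 1),
        ("hr.payroll", "raw/hr/payroll", "full", 0),  # disabled source, skipped below
    ],
)


def plan_ingestion(connection):
    """Return one job description per enabled source, ordered by source name."""
    rows = connection.execute(
        "SELECT source_name, target_path, load_type "
        "FROM ingestion_config WHERE enabled = 1 ORDER BY source_name"
    ).fetchall()
    return [
        {"source": name, "target": path, "mode": load_type}
        for name, path, load_type in rows
    ]


jobs = plan_ingestion(conn)
for job in jobs:
    # A real framework would submit a Spark job here; this sketch only prints the plan.
    print(f"{job['mode']:>11} load: {job['source']} -> {job['target']}")
```

The same pattern would apply with RDS or Cloud SQL as the metadata store; only the connection changes.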
AWS Solution:
EMR with Spark : Data Ingestion
S3 : Data Storage
EMR with Spark/Glue : Transformations
Spark/Python Glue : To load data to the warehouse
Redshift : Data warehouse
RDS : Metadata Storage
Step Functions/Lambda : Orchestration
SNS/SES/SMTP : Email/Alerts
CodeCommit and CodeDeploy : Deployment
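
The roles in the list above form a dependency chain that the orchestrator (Step Functions/Lambda here) has to respect. A small sketch of that ordering using Python's stdlib graphlib. The stage names and the exact dependencies are illustrative assumptions; the notes do not specify them.

```python
from graphlib import TopologicalSorter

# Each stage maps to the set of stages that must finish before it can start.
# Stage names and edges are hypothetical, chosen to mirror the roles listed above.
PIPELINE = {
    "ingest_to_s3": set(),
    "transform": {"ingest_to_s3"},
    "load_to_redshift": {"transform"},
    "update_metadata": {"load_to_redshift"},
    "send_alert": {"update_metadata"},
}


def execution_order(pipeline):
    """Return the stages in an order that satisfies every dependency."""
    return list(TopologicalSorter(pipeline).static_order())


order = execution_order(PIPELINE)
print(" -> ".join(order))
```

In a real deployment this ordering would live in a Step Functions state machine definition rather than in Python; the sketch only shows the dependency logic.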
GCP Solution:
Data Ingestion: Dataproc with Spark
Storage: GCS
Transformation: Dataproc Spark
Data warehouse: BigQuery
Metadata Storage: Cloud SQL (MySQL)
Data Validation: Dataprep
Orchestration: Cloud Composer (Apache Airflow)
Logging/Audit: GCP Cloud Operations
Tools: Cloud Build & Cloud Tasks, GCR
Language: Python/Java
Note: if Teradata is the source, the best option for data ingestion is TDCH (Teradata Connector for Hadoop).
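
The three stacks above line up role by role, so they can be kept as a small lookup table for finding the equivalent service on another cloud. A minimal sketch, transcribed from the lists. The role keys ("ingestion", "warehouse", and so on) are names picked for this example, and roles the notes do not list for a cloud are simply left out.

```python
# Role -> service per cloud, transcribed from the three lists above.
# The role keys are hypothetical labels chosen for this sketch.
STACKS = {
    "azure": {
        "ingestion": "ADF/Databricks with Spark",
        "storage": "ADLS",
        "transformation": "ADB",
        "warehouse": "Synapse",
        "metadata": "Azure SQL",
        "orchestration": "ADF",
        "alerts": "Logic Apps",
        "deployment": "Azure DevOps",
    },
    "aws": {
        "ingestion": "EMR with Spark",
        "storage": "S3",
        "transformation": "EMR with Spark/Glue",
        "warehouse": "Redshift",
        "metadata": "RDS",
        "orchestration": "Step Functions/Lambda",
        "alerts": "SNS/SES/SMTP",
        "deployment": "CodeCommit and CodeDeploy",
    },
    "gcp": {
        "ingestion": "Dataproc with Spark",
        "storage": "GCS",
        "transformation": "Dataproc Spark",
        "warehouse": "BigQuery",
        "metadata": "Cloud SQL (MySQL)",
        "orchestration": "Cloud Composer (Apache Airflow)",
    },
}


def equivalents(role):
    """Return the service filling `role` on each cloud, or None where none is listed."""
    return {cloud: services.get(role) for cloud, services in STACKS.items()}


print(equivalents("warehouse"))
print(equivalents("alerts"))
```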