From Software to Data Engineer

Data Engineer's Responsibilities (not all-encompassing):

Why does data engineering exist? It exists to answer these questions from data analysts and data scientists.

Essential skills:

Python (and/or R programming)
SQL (SQLZOO)
Basic Statistics
Data modeling and ETL/ELT pipelines
Data cleaning
BI tools (at Tuft & Needle: Looker and Metabase)
Experience with some cloud platform, such as Google, Amazon, Microsoft, or IBM (at Tuft & Needle: AWS and Docker containers)
- One of the hurdles in learning data engineering is setting up a distributed cluster to develop on. Amazon provides a free tier that can be used to learn distributed technologies on real infrastructure rather than only on your local machine.
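To make the ETL and data cleaning skills above concrete, here is a minimal sketch using only the Python standard library. The CSV data, column names, and cleaning rules are hypothetical, and an in-memory SQLite database stands in for a real warehouse; it shows the shape of extract, transform (clean), and load, then a SQL query on the result.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, a missing amount, a duplicate row.
RAW_CSV = """order_id,customer,amount
1, alice ,19.99
2,bob,
3,Carol,5.00
3,Carol,5.00
"""


def extract(raw):
    """Extract: parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Transform: normalize text, drop incomplete rows, cast types, dedupe."""
    seen_ids = set()
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:  # drop rows with a missing amount
            continue
        order_id = int(row["order_id"])
        if order_id in seen_ids:  # keep only the first row per order_id
            continue
        seen_ids.add(order_id)
        clean.append((order_id, row["customer"].strip().lower(), float(amount)))
    return clean


def load(records, conn):
    """Load: write cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

In an ELT setup the raw rows would be loaded first and the cleaning step would run inside the warehouse, typically as SQL.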

Nice-to-haves:

Most important books:

The Data Engineering Cookbook by Andreas Kretz: https://github.com/andkret/Cookbook
Big Data, the book by Nathan Marz, creator of Apache Storm and the Lambda Architecture. Our Fellows have found it really helpful, and the first two chapters are available free online.
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems: https://www.amazon.com/dp/B06XPJML5D/?coliid=I2YK48GJ1AXOIA&colid=2TQE1S60MQO4T&psc=0&ref_=lv_ov_lig_dp_it

Data Engineering Online video courses or MOOCs:

Data Science MOOCs (further education):

How to get into data engineering:

Look into AWS: Kinesis (buffering), Lambda (processing), S3 and/or DynamoDB (storage), and Amazon API Gateway
BI tools: Tableau
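In the AWS pipeline above, Kinesis buffers incoming events and a Lambda function processes them. A minimal sketch of the processing step, assuming a Python Lambda triggered by a Kinesis stream: each record's payload arrives base64-encoded under `Records[i]["kinesis"]["data"]`. The JSON event shape and the `page_view` filter are hypothetical, and the storage step (writing to S3 or DynamoDB, e.g. with boto3) is deliberately left out so the sketch runs locally.

```python
import base64
import json


def decode_kinesis_records(event):
    """Decode the base64 JSON payload of each record in a Kinesis-triggered event."""
    decoded = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(payload))
    return decoded


def lambda_handler(event, context):
    """Processing step: parse and filter records before handing them to storage."""
    events = decode_kinesis_records(event)
    # Hypothetical business rule: keep only page views.
    page_views = [e for e in events if e.get("type") == "page_view"]
    # Storage step omitted: in AWS this is where you might write to S3 or DynamoDB.
    return {"received": len(events), "kept": len(page_views)}


def make_event(*payloads):
    """Build a fake Kinesis event for local testing (only the fields used above)."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode()).decode()}}
            for p in payloads
        ]
    }


result = lambda_handler(make_event({"type": "page_view", "url": "/"}, {"type": "click"}), None)
print(result)  # {'received': 2, 'kept': 1}
```

Keeping the decoding in its own function makes the handler easy to unit-test without deploying to AWS.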

Learning Path - Level 1:

Learning Path - Level 2:

Understand various data architectures (real-time, batch, event-driven, etc.)
Learn one streaming platform and processing engine
Pick one cloud provider and master their native data engineering product
Focus on cloud data warehouses, cloud big data services, and managed Spark services
Create and deploy pipelines in the cloud with cloud-based CI/CD
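One way to see the difference between batch and real-time (streaming) architectures is to compute the same result both ways. This is a toy, in-memory sketch with hypothetical event names, not a streaming platform: a batch job sees the whole dataset at once, while a streaming job keeps state and updates its answer as each event arrives. Real processing engines add windowing, checkpointing, and late-data handling on top of this idea.

```python
from collections import Counter
from typing import Iterable, Iterator


def batch_counts(events: list) -> Counter:
    """Batch: the full dataset is available up front; compute the answer in one go."""
    return Counter(events)


def streaming_counts(events: Iterable[str]) -> Iterator[Counter]:
    """Streaming: events arrive one at a time; update state and emit a running result."""
    state = Counter()
    for event in events:
        state[event] += 1
        yield Counter(state)  # snapshot of the running result after this event


events = ["click", "view", "click", "purchase"]
final_batch = batch_counts(events)
snapshots = list(streaming_counts(iter(events)))

# Once the stream has consumed every event, it agrees with the batch result.
print(snapshots[-1] == final_batch)
```

The streaming version produces an answer after every event, which is what makes real-time dashboards possible, at the cost of managing state.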

Learning Path - Level 3:

Deep dive into data architectures and data modeling
Understand and build cloud-native data architectures and sandboxes (containers and K8s)
Hybrid cloud
Focus on data management and data security architecture
Build platforms that can democratize data and accelerate analysis

ltrainpr/software_to_data_engineer.md