1️⃣ The FinnHub Streaming Data Pipeline
- https://github.com/RSKriegs/finnhub-streaming-data-pipeline
- 💬 A streaming data pipeline built on real-time trading data from the Finnhub.io API/WebSocket.
- 💻 Kafka, Spark, Cassandra, Kubernetes, Grafana
2️⃣ Streamify
- https://github.com/ankurchavda/streamify
- 💬 Streams events generated by a fake music streaming service (like Spotify) and builds a data pipeline that consumes the real-time data.
- 💻 Kafka, Spark Streaming, dbt, Docker, Airflow, Terraform, GCP
3️⃣ Reddit ETL Pipeline
- https://github.com/ABZ-Aaron/Reddit-API-Pipeline
- 💬 A data pipeline that extracts Reddit data from r/dataengineering and provides a Google Data Studio report.
- 💻 AWS S3/Redshift, dbt, Airflow, Docker, Terraform
4️⃣ Audiophile End-To-End ELT Pipeline
- https://github.com/ris-tlp/audiophile-e2e-pipeline
- 💬 A pipeline that extracts data from Crinacle's Headphone and InEarMonitor databases and prepares it for a Metabase dashboard.
- 💻 AWS S3, Redshift, RDS, dbt, Airflow
5️⃣ Surfline Dashboard
- https://github.com/andrem8/surf_dash
- 💬 The pipeline collects data from the Surfline API and exports a CSV file to S3. The most recent file in S3 is then downloaded and ingested into the Postgres data warehouse. The end result is a beautiful dashboard showing the data.
- 💻 AWS S3, Airflow, Pandas, Postgres, Plotly
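
As a rough illustration of the "most recent file in S3" step in the Surfline pipeline, here is a minimal sketch, not the project's actual code. It assumes a listing shaped like the `Contents` entries of an S3 object listing (a `Key` string and a `LastModified` datetime); the function name `latest_csv_key` and the sample keys are hypothetical.

```python
from datetime import datetime, timezone


def latest_csv_key(objects):
    """Return the key of the most recently modified .csv object, or None."""
    csv_objects = [o for o in objects if o["Key"].endswith(".csv")]
    if not csv_objects:
        return None
    # Pick the entry with the newest LastModified timestamp.
    return max(csv_objects, key=lambda o: o["LastModified"])["Key"]


# Hypothetical listing; in practice this would come from the S3 API.
listing = [
    {"Key": "surf/2022-01-01.csv", "LastModified": datetime(2022, 1, 1, tzinfo=timezone.utc)},
    {"Key": "surf/2022-01-03.csv", "LastModified": datetime(2022, 1, 3, tzinfo=timezone.utc)},
    {"Key": "surf/notes.txt", "LastModified": datetime(2022, 1, 5, tzinfo=timezone.utc)},
]

print(latest_csv_key(listing))
```

Selecting by modification time rather than by file name avoids relying on a naming convention, though either approach could work depending on how the export step names its files.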