@lreverchuk
Created August 14, 2025 16:17
Top Apache Spark Developers on GitHub

A curated list of key Apache Spark experts and core contributors.

| Name | GitHub | Notable Contributions |
| --- | --- | --- |
| Matei Zaharia | mateiz | Creator of Spark, MLflow, Delta Lake |
| Patrick Wendell | pwendell | Early committer, release manager |
| Reynold Xin | rxin | GraphX, Tungsten, Structured Streaming |
| Sean Owen | srowen | MLlib, Oryx, Spark evangelism |
| Jean-Georges “jgp” Perrin | jgperrin | “Spark in Action” author |
| Michael Armbrust | marmbrus | SQL engine, Catalyst, DataFrames |
| Hyukjin Kwon | HyukjinKwon | PySpark APIs, Koalas |
| Jason Dai | jason-dai | BigDL, MLlib contributions |
| Sandy Ryza | sryza | MLlib, job scheduling |
| Wenchen Fan | cloud-fan | Performance, SQL enhancements |
| Ram Sriharsha | harsha2010 | MLlib, runtime performance |
| Holden Karau | holdenk | spark-testing-base, book author |

These individuals have shaped Spark’s architecture, APIs, ML, SQL, streaming, performance, and education. You can find the full list, along with links to their other social media profiles and additional details, here: https://echoglobal.tech/technologies/apache-spark/
