A curated list of key Apache Spark experts and core contributors.

| Name | GitHub | Notable Contributions |
|---|---|---|
| Matei Zaharia | mateiz | Creator of Spark, MLflow, Delta Lake |
| Patrick Wendell | pwendell | Early committer, release manager |
| Reynold Xin | rxin | GraphX, Tungsten, Structured Streaming |
| Sean Owen | srowen | MLlib, Oryx, Spark evangelism |
| Jean-Georges “jgp” Perrin | jgperrin | “Spark in Action” author |
| Michael Armbrust | marmbrus | SQL engine, Catalyst, DataFrames |
| Hyukjin Kwon | HyukjinKwon | PySpark APIs, Koalas |
| Jason Dai | jason-dai | BigDL, MLlib contributions |
| Sandy Ryza | sryza | MLlib, job scheduling |
| Wenchen Fan | cloud-fan | Performance, SQL enhancements |
| Ram Sriharsha | harsha2010 | MLlib, runtime performance |
| Holden Karau | holdenk | spark-testing-base, author of Spark books |

These individuals have shaped Spark’s architecture, APIs, machine learning, SQL, and streaming components, as well as its performance and educational resources. The full list, with links to their other social media profiles and additional details, is available at https://echoglobal.tech/technologies/apache-spark/