@heathermiller
Last active September 28, 2016 16:32
Desc
My colleague Heather Miller and I have been discussing a new project that would
focus on increasing the reliability and performance of applications built on
the Apache Spark engine for big data processing. The programming model we aim to
improve seeks to achieve parallelism through distribution, by transmitting
computations (closures) to the collection of sites where the distributed data
resides. The work we have in mind would have two areas of focus: (i) the design,
implementation, and evaluation of programming models that make this paradigm of
shipping computations to distributed data more robust, more usable, and less
error-prone (e.g., by avoiding races and memory leaks); and (ii) the design,
implementation, and evaluation of tools for analyzing and refactoring Spark
applications to improve their reliability and performance (via analyses and
refactorings that target the new programming model). The work would be done in
the context of the Scala programming language.
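The following is a minimal sketch of one well-known way that shipping closures goes wrong. It assumes Scala 2.12 or later, where function literals are serializable, and has no Spark dependency. Spark serializes a task's closure before sending it to the sites holding the data. The hypothetical `ClosureShipping.ship` stands in for that step by serializing into an in-memory buffer. `Job`, `unsafe`, and `safe` are illustrative names, not part of Spark or of the proposed work.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for shipping a closure to a remote site: Spark
// serializes task closures before sending them to executors; here we only
// serialize into an in-memory buffer and report the outcome.
object ClosureShipping {
  def ship(f: AnyRef): Either[String, Int] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    try {
      out.writeObject(f)
      out.flush()
      Right(bytes.size) // number of bytes that would travel over the wire
    } catch {
      case e: NotSerializableException => Left(s"not serializable: ${e.getMessage}")
    } finally out.close()
  }
}

// Hypothetical driver-side class. It is deliberately NOT Serializable,
// standing in for an object that holds local state (a connection, a cache, ...).
class Job(val factor: Int) {
  // `factor` here means `this.factor`, so the closure captures the whole Job.
  def unsafe: Int => Int = x => x * factor

  // Copying the field into a local val means the closure captures only an Int.
  def safe: Int => Int = {
    val f = factor
    x => x * f
  }
}

object ClosureShippingDemo {
  def main(args: Array[String]): Unit = {
    val job = new Job(3)
    println(ClosureShipping.ship(job.unsafe)) // a Left: Job cannot be serialized
    println(ClosureShipping.ship(job.safe))   // a Right carrying the payload size
    println(job.safe(2))                      // 6
  }
}
```

Both closures compute the same function, and the compiler accepts both. Only the second can be shipped. Copying fields into locals is a manual workaround that depends on programmer discipline. That dependence is one example of why this paradigm is error-prone.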