Skip to content

Instantly share code, notes, and snippets.

@YordanGeorgiev
Created February 17, 2018 10:40
Show Gist options
  • Save YordanGeorgiev/2e9ef47fe1487f7bb51306274d1a6988 to your computer and use it in GitHub Desktop.
Save YordanGeorgiev/2e9ef47fe1487f7bb51306274d1a6988 to your computer and use it in GitHub Desktop.
[dataframe pipeline for spark] how-to build a dataframe processing pipeline in scala spark #scala #spark #dataframe #control-flow
private def runPipeLine(cnf: Configuration): DataFrame = {
val dfOut: DataFrame =
new Phase1(cnf).process()
.transform(new Phase2(cnf).process)
return dfOut
}
class Phase1 extends DataFrameStage {
override def process(dfIn: DataFrame = None): DataFrame = {
val dfOut: DataFrame = dfIn.doSomeTransformations()
return dfOut
}
}
class Phase2 extends DataFrameStage {
override def process(dfIn: DataFrame): DataFrame = {
val dfOut: DataFrame = dfIn.doSomeTransformations()
return dfOut
}
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment