@dmateusp
Last active June 5, 2019 19:14
DataFrame.transform - Spark Function Composition - Functions post refactor
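```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, regexp_extract, sum}

// Groups by the given columns and sums the "amount" column.
// Returns a DataFrame => DataFrame so it can be passed to DataFrame.transform.
def sumAmounts(by: Column*): DataFrame => DataFrame =
  df => df.groupBy(by: _*).agg(sum(col("amount")))

// Adds two columns parsed from the free-text column `columnName`:
//   <columnName>_payer       : the capital letter after "paid by"
//   <columnName>_beneficiary : the capital letter after "to"
// regexp_extract returns an empty string when the pattern does not match.
def extractPayerBeneficiary(columnName: String): DataFrame => DataFrame =
  df =>
    df.withColumn(
      s"${columnName}_payer",
      regexp_extract(col(columnName), "paid by ([A-Z])", 1)
    ).withColumn(
      s"${columnName}_beneficiary",
      regexp_extract(col(columnName), "to ([A-Z])", 1)
    )
```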
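Because both functions return `DataFrame => DataFrame`, they can be chained with `DataFrame.transform` or composed up front with Scala's `andThen`. The sketch below illustrates both styles. It assumes Spark 2.x or later running locally; the sample rows, the `details` column, and the names `TransformExample`, `samplePayments`, `chained`, `pipeline` and `totalsByPayer` are hypothetical and exist only for illustration. The two functions are repeated inside the object so the block compiles on its own.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_extract, sum}

object TransformExample {
  // Same signatures as the functions above, repeated so this block is self-contained.
  def sumAmounts(by: Column*): DataFrame => DataFrame =
    df => df.groupBy(by: _*).agg(sum(col("amount")))

  def extractPayerBeneficiary(columnName: String): DataFrame => DataFrame =
    df =>
      df.withColumn(s"${columnName}_payer", regexp_extract(col(columnName), "paid by ([A-Z])", 1))
        .withColumn(s"${columnName}_beneficiary", regexp_extract(col(columnName), "to ([A-Z])", 1))

  // Hypothetical sample data: a free-text "details" column plus an "amount".
  def samplePayments(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("paid by A to B", 10L), ("paid by A to C", 5L), ("paid by B to C", 7L))
      .toDF("details", "amount")
  }

  // Style 1: chain each DataFrame => DataFrame step with transform.
  def chained(payments: DataFrame): DataFrame =
    payments
      .transform(extractPayerBeneficiary("details"))
      .transform(sumAmounts(col("details_payer")))

  // Style 2: compose the steps into a single function first, then apply it once.
  val pipeline: DataFrame => DataFrame =
    extractPayerBeneficiary("details") andThen sumAmounts(col("details_payer"))

  // Collects (payer -> total amount) pairs so results are easy to compare.
  def totalsByPayer(result: DataFrame): Map[String, Long] =
    result.collect().map(row => row.getString(0) -> row.getLong(1)).toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("transform-sketch").getOrCreate()
    val payments = samplePayments(spark)
    println(totalsByPayer(chained(payments)))
    println(totalsByPayer(payments.transform(pipeline)))
    spark.stop()
  }
}
```

Both styles should yield the same per-payer totals. Composing with `andThen` is convenient when the same sequence of steps is reused across several DataFrames, while chained `transform` calls keep each step visible at the call site.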