@saswata-dutta
Created November 22, 2020 14:32
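import org.apache.spark.sql.functions.{col, explode, udf}
import spark.implicits._ // enables the $"..." column syntax (already in scope in spark-shell)
val df = spark.read.json("concats-head.json")
// One row per contact: keep the account and that contact's raw email string
val df1 = df.withColumn("elements", explode($"contacts")).select($"acc", col("elements.contactEmail").as("emails"))
// Split on "," or ";" and keep the part after "@"; null input and entries without exactly one "@" are skipped
val emailDomains = udf((s: String) => Option(s).toSeq.flatMap(_.split("[,;]")).map(_.trim.split("@")).collect { case Array(_, d) if d.nonEmpty => d })
val df2 = df1.withColumn("domains", emailDomains($"emails")).withColumn("domain", explode($"domains")).select("acc", "domain")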