Skip to content

Instantly share code, notes, and snippets.

@geoHeil
Created December 3, 2017 22:31
Show Gist options
  • Save geoHeil/43adf37d2d461f1557b233f55bc913bc to your computer and use it in GitHub Desktop.
Save geoHeil/43adf37d2d461f1557b233f55bc913bc to your computer and use it in GitHub Desktop.
val f1: DataFrame = g.find("(a)-[e1]->(b)")
.withColumn("level", lit("f1"))
.withColumnRenamed("a", "src")
.withColumnRenamed("b", "dst")
.select("src", "dst", "level")
val f2: DataFrame = g.find("(a)-[e1]->(b);(b)-[e2]->(c)").withColumn("level", lit("f2"))
.withColumnRenamed("a", "src")
.withColumnRenamed("c", "dst")
.drop("b")
.select(f1.columns.map(col _): _*)
val f3: DataFrame = g.find("(a)-[e1]->(b);(b)-[e2]->(c);(c)-[e3]->(d)").withColumn("level", lit("f3"))
.withColumnRenamed("a", "src")
.withColumnRenamed("d", "dst")
.drop("b", "c")
.select(f1.columns.map(col _): _*)
val friendsMultipleLevels = f1
.union(f2)
.union(f3)
val fFraud = friendsMultipleLevels.groupBy('src, 'level).agg(avg($"dst.fraud") as "fraudulence")
fFraud
.groupBy("src")
.pivot("level")
.agg(max('fraudulence)) // type of aggregation not really relevant here ... as only a single value can show up
.withColumn("id", $"src.id")
.withColumn("name", $"src.name")
.withColumn("fraud_src", $"src.fraud")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment