Created
June 18, 2018 11:12
[how-to call udf from df] How to call a UDF per row in a DataFrame with Scala Spark #scala #spark #df #udf
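import java.util.UUID
import org.apache.spark.sql.functions.{col, udf}
import org.apache.spark.sql.types.StringType

// This is the UDF. Note the data types: the argument and return types must be
// ones Spark can map to a SQL type via reflection (e.g. String).
// Assumption: broadCastedMap is a broadcast Map keyed by String, defined earlier.
val getUID = udf { (col1: String) =>
  col1 match {
    // A null input gets a freshly generated UUID.
    case null => UUID.randomUUID().toString
    case _ =>
      try {
        // Note: must return the same data type as the other branches (String).
        broadCastedMap.value.get(col1).get.toString
      } catch {
        // Key not present in the map: fall back to a random UUID.
        case e: java.util.NoSuchElementException => UUID.randomUUID().toString
        // Any other failure gets the same fallback.
        case e: Exception => UUID.randomUUID().toString
      }
  }
}
// Apply the UDF to every row. withColumn is a lazy transformation; nothing runs until an action is called.
val newDf = df.withColumn("uid", getUID(col("col1").cast(StringType)))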
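
The lookup-with-fallback logic inside the UDF can also be written without try/catch. The following is only a sketch in plain Scala, with no Spark dependency: `uidFor` and `sample` are hypothetical names that are not part of the original gist, and the map's value type is assumed to be `Any`.

```scala
import java.util.UUID

// Hypothetical helper (not in the original gist): null-safe lookup
// that falls back to a random UUID, mirroring the UDF body above.
def uidFor(lookup: Map[String, Any], key: String): String =
  Option(key)                    // None when the key is null
    .flatMap(lookup.get)         // None when the key is missing from the map
    .flatMap(v => Option(v))     // None when the stored value is null
    .map(_.toString)
    .getOrElse(UUID.randomUUID().toString)

// Made-up sample data, for illustration only.
val sample: Map[String, Any] = Map("a" -> 1, "b" -> "two")
```

Under the same assumption about `broadCastedMap`, the helper could presumably be wrapped as `udf((k: String) => uidFor(broadCastedMap.value, k))` and applied with `withColumn` exactly as before.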