@hakanilter
Created September 25, 2018 20:45
Load Avro files and extract the JSON string as a DataFrame
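import java.nio.charset.StandardCharsets
import org.apache.spark.sql.functions.udf
import spark.implicits._

// read avro (the CSV-only "header" option has no effect on Avro, so it is dropped)
val input = "/Users/hakanilter/dev/workspace/mc/data/avroFiles/*"
val data = spark.read
  .format("com.databricks.spark.avro")
  .load(input)

// convert the binary "body" column to JSON strings
// (assumes the payload is UTF-8 encoded; the original used the platform default charset)
val convertString = udf((payload: Array[Byte]) => new String(payload, StandardCharsets.UTF_8))
val json = data.select(convertString(data("body")).as("body")).as[String]

// infer the schema from the JSON strings
// (the Dataset[String] overload needs Spark 2.2+; the RDD[String] overload is deprecated)
val df = spark.read.json(json)
df.printSchema()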