Created
September 25, 2018 20:45
Load Avro files and extract the JSON string payload as a DataFrame
import java.nio.charset.StandardCharsets
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Read the Avro files
val input = "/Users/hakanilter/dev/workspace/mc/data/avroFiles/*"
val data = spark.read
  .format("com.databricks.spark.avro")
  .load(input)

// Decode the binary "body" column into JSON strings (assumes UTF-8 payloads)
val convertString = udf((payload: Array[Byte]) => new String(payload, StandardCharsets.UTF_8))
val rdd = data.select(convertString(data("body"))).rdd.map(_.getString(0))

// Infer the schema from the JSON strings and load them as a DataFrame
val df = spark.read.json(rdd)
df.printSchema()