Last active June 18, 2022 13:53
Create Spark DataFrame From List[Any]
// Spark 2.1
import org.apache.spark.sql._
import org.apache.spark.sql.types._

val spark = SparkSession.builder().master("local").getOrCreate()

// Given a list mixing a string and a double
val values = List("20030100013280", 1.0)

// Create a `Row` from the `Seq`
val row = Row.fromSeq(values)

// Create an `RDD` from the `Row`
val rdd = spark.sparkContext.makeRDD(List(row))

// Create the schema fields
val fields = List(
  StructField("First Column", StringType, nullable = false),
  StructField("Second Column", DoubleType, nullable = false)
)

// Create the `DataFrame`
val dataFrame = spark.createDataFrame(rdd, StructType(fields))

// Done! Yay!
dataFrame.show(1)

+--------------+-------------+
|  First Column|Second Column|
+--------------+-------------+
|20030100013280|          1.0|
+--------------+-------------+
Thanks for the code snippet. Helped a lot.
@yzhong52 If I wanted to save millions of rows, what is the best way to extend this?
@ChinmaySKulkarni I suggest reading this:
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
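A minimal sketch of that suggestion, assuming the `dataFrame` built in the snippet above is in scope; the output path `output/data.parquet` is illustrative:

// Write the DataFrame out as Parquet, a columnar format suited to
// large row counts; the path here is just an example.
dataFrame.write
  .mode("overwrite")
  .parquet("output/data.parquet")

// Reading it back yields a DataFrame with the schema preserved
val restored = spark.read.parquet("output/data.parquet")
restored.show()

For very large writes, repartitioning before the write (e.g. `dataFrame.repartition(n)`) controls how many output files Parquet produces.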
Had to import:
import org.apache.spark.sql._
import org.apache.spark.sql.types._
Thanks for this. Worked perfectly.