@ivangeorgiev
Last active September 13, 2019 17:56
Creating Spark DataFrame
data = [
    ('1990-05-03', 29, True),
    ('1994-09-23', 25, False)
]

df = spark.createDataFrame(data, ['dob', 'age', 'is_fan'])
df.printSchema()
root
 |-- dob: string (nullable = true)
 |-- age: long (nullable = true)
 |-- is_fan: boolean (nullable = true)
df.show()
+----------+---+------+
|       dob|age|is_fan|
+----------+---+------+
|1990-05-03| 29|  true|
|1994-09-23| 25| false|
+----------+---+------+
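The schema above reports `dob` as a string because the input values are plain Python strings. A minimal sketch of one way to get a date column instead, assuming the strings are ISO-formatted (`YYYY-MM-DD`): parse them into `datetime.date` objects before building the DataFrame. The name `typed_data` is illustrative, and the Spark calls are left as comments because they assume an active session bound to `spark`.

```python
from datetime import date

data = [
    ('1990-05-03', 29, True),
    ('1994-09-23', 25, False),
]

# Parse each ISO-formatted dob string into a datetime.date object;
# age and is_fan are passed through unchanged.
typed_data = [
    (date.fromisoformat(dob), age, is_fan)
    for dob, age, is_fan in data
]

# Assumes an active SparkSession bound to `spark`, as in the snippets here:
# df = spark.createDataFrame(typed_data, ['dob', 'age', 'is_fan'])
# df.printSchema()  # dob should now be inferred as date rather than string
```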
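# Same rows as above, kept as plain tuples
data_list = [
    ('1990-05-03', 29, True),
    ('1994-09-23', 25, False)
]
# Convert each tuple to a dict; the dict keys supply the column names
data = [ {'dob': r[0], 'age': r[1], 'is_fan': r[2]} for r in data_list ]
# No column list is passed here: the schema is inferred from the dicts
user_df = spark.createDataFrame(data)
user_df.show()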
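The tuple-to-dict conversion above indexes each field by position. A small sketch of an equivalent that avoids the hard-coded indices by pairing a list of column names with each row via `zip`; the `columns` variable is introduced here for illustration and is not part of the original snippet.

```python
# Column names in the same order as the tuple fields (illustrative helper).
columns = ['dob', 'age', 'is_fan']

data_list = [
    ('1990-05-03', 29, True),
    ('1994-09-23', 25, False),
]

# zip pairs each column name with the value at the same position,
# and dict turns those pairs into one record per row.
data = [dict(zip(columns, row)) for row in data_list]

# Assumes an active SparkSession bound to `spark`, as in the snippets here:
# user_df = spark.createDataFrame(data)
# user_df.show()
```

Adding a column then only requires extending `columns` and the tuples, rather than editing the comprehension.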