
@64lines
Last active March 6, 2019 15:06
[PySpark] - Create a DataFrame from scratch
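from pyspark.sql import SparkSession

# Start (or reuse) a SparkSession that runs locally with a single worker thread
spark = SparkSession.builder.master("local").appName("rel subject").getOrCreate()

# Column names only; Spark infers the column types from the values below
columns = ['id', 'referringUrl']
# Each tuple is one row: (id, referringUrl)
vals = [
    (1, 'http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html'),
    (2, 'https://stackoverflow.com/questions/33224740/best-way-to-get-the-max-value-in-a-spark-dataframe-column'),
    (3, 'https://datascience.stackexchange.com/questions/11284/key-parameter-in-max-function-in-pyspark'),
]

# Build the DataFrame from the list of tuples
df = spark.createDataFrame(vals, columns)
# truncate=False prints the full URLs; by default show() cuts strings at 20 characters
df.show(truncate=False)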