@nkthiebaut
Created June 28, 2019 18:44
Instantiate a Spark Session, register a DataFrame, and query it (Spark 2.0+).
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Driver memory only applies if it is set before the driver JVM starts.
conf = SparkConf()
conf.set("spark.driver.memory", "4g")

# Local session with 4 threads; getOrCreate() reuses an active session if one exists.
ss = SparkSession.builder.master("local[4]").config(conf=conf).getOrCreate()

# Build a small DataFrame and register it as a temporary view so SQL can see it.
df = ss.createDataFrame([(1, "kevin"), (2, "steph")], ["id", "name"])
df.createOrReplaceTempView("players")
ss.sql("SELECT * FROM players").show()