@coingraham
Created June 6, 2018 17:03
Hello World for EMR Spark Step Python Pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

if __name__ == "__main__":
    # Create the spark session
    spark = SparkSession\
        .builder\
        .appName("SparkEMR")\
        .getOrCreate()

    # Create the spark context
    sc = spark.sparkContext

    # Put your bucket and folder here
    s3_bucket = "s3://my-bucket-name/folder-name/hello_world/"

    # Create Hello World Dataframe
    dataframe = spark.createDataFrame([("Hello", "World")])

    # Coalesce the data to 1 file
    # Format as CSV
    # Save the dataframe to your s3_bucket
    # Overwrite what's there
    dataframe.coalesce(1).write.csv(s3_bucket, mode="overwrite")

    # Clean up when done
    sc.stop()
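
The script above is meant to run as an EMR step, so here is a minimal sketch of how that step might be defined and submitted with boto3. Everything in it is an assumption, not part of the original gist: the helper names `build_spark_step` and `submit_step`, the script location `s3://my-bucket-name/folder-name/hello_world.py`, the region, and the `CONTINUE` failure action. It relies on `command-runner.jar`, which EMR provides for running commands such as `spark-submit` on the cluster.

```python
def build_spark_step(script_uri, name="Hello World"):
    """Return an EMR step definition that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        # Assumption: keep the cluster running if the step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def submit_step(cluster_id, step, region_name="us-east-1"):
    """Submit the step to a running cluster; needs boto3 and AWS credentials."""
    import boto3  # imported here so build_spark_step works without boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


# Hypothetical script location: upload the script to your own bucket first.
step = build_spark_step("s3://my-bucket-name/folder-name/hello_world.py")
```

To submit it, call `submit_step` with the ID of your own running cluster and the `step` built above. The step definition is built separately from the submission so it can be inspected or reused without AWS access.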