Created
June 6, 2018 17:03
Hello World for EMR Spark Step Python Pyspark
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

if __name__ == "__main__":

    # Create the spark session
    spark = SparkSession\
        .builder\
        .appName("SparkEMR")\
        .getOrCreate()

    # Create the spark context
    sc = spark.sparkContext

    # Put your bucket and folder here
    s3_bucket = "s3://my-bucket-name/folder-name/hello_world/"

    # Create Hello World Dataframe
    dataframe = spark.createDataFrame([("Hello", "World")])

    # Coalesce the data to 1 file
    # Format as CSV
    # Save the dataframe to your s3_bucket
    # Overwrite what's there
    dataframe.coalesce(1).write.csv(s3_bucket, mode="overwrite")

    # Clean up when done
    sc.stop()
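
The script above is meant to run as an EMR step, but the gist does not show how the step is submitted. The following is a minimal sketch of one way to describe such a step, assuming the script has been uploaded to S3 and that `spark-submit` is launched through `command-runner.jar` in cluster deploy mode. The helper name `build_spark_step`, the step name, the script location, and the cluster ID are hypothetical placeholders, not part of the original gist.

```python
def build_spark_step(script_uri, name="Hello World Spark step"):
    """Return an EMR step definition that runs a PySpark script via spark-submit.

    `script_uri` is the S3 location of the uploaded script (an assumption:
    the gist does not say where the script lives).
    """
    return {
        "Name": name,
        # Keep the cluster running if the step fails
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets an EMR step run a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


if __name__ == "__main__":
    # Hypothetical script location; replace with your own bucket and key
    step = build_spark_step("s3://my-bucket-name/folder-name/hello_world.py")
    print(step)

    # Submitting the step needs boto3, AWS credentials, and a running cluster,
    # so it is left commented out here. The cluster ID is a placeholder.
    # import boto3
    # emr = boto3.client("emr")
    # emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Building the step as a plain dictionary keeps it easy to inspect before anything is sent to AWS. The same dictionary can be passed in the `Steps` list when a cluster is created.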