@nmukerje
Created March 28, 2018 17:12
Read multiline JSON from S3
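# Assumes an existing SparkContext `sc` and SparkSession `spark` (e.g. the pyspark shell).
# wholeTextFiles yields (path, content) pairs; keep only the content, one string per file.
data = sc.wholeTextFiles('s3://<bucket>/dataset249').map(lambda x: x[1])
# Debug only: collect() pulls every file's content onto the driver.
print(data.collect())
# Each RDD element is parsed as one JSON document, so pretty-printed files work.
df = spark.read.json(data)
df.printSchema()
df.count()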
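
The snippet above reads each file as a single string because a line-by-line JSON reader cannot parse a pretty-printed document. The following is a minimal plain-Python sketch of that difference, with no Spark involved. The sample document and the helper names `parse_per_line` and `parse_whole` are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical pretty-printed JSON document, standing in for one file under the S3 prefix.
pretty_doc = '{\n  "id": 1,\n  "tags": ["a", "b"]\n}'

def parse_per_line(text):
    """Treat every line as its own JSON record, as a line-delimited reader would."""
    records, failures = [], 0
    for line in text.splitlines():
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            failures += 1  # a fragment such as '{' is not valid JSON on its own
    return records, failures

def parse_whole(text):
    """Treat the entire file content as one JSON document."""
    return json.loads(text)

line_records, line_failures = parse_per_line(pretty_doc)
whole_record = parse_whole(pretty_doc)

print(line_records, line_failures)
print(whole_record)
```

On Spark 2.2 and later, the JSON reader also accepts a `multiLine` option (`spark.read.option("multiLine", True).json(path)`), which may make the `wholeTextFiles` step unnecessary. Whether it suits this dataset is an assumption to verify.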