@rupeshtr78
Last active November 4, 2020 02:12
Spark read: all options in one chain
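import org.apache.spark.sql.SparkSession

// Assumes `sparkguide.myManualSchema` is a StructType defined elsewhere in the author's project.
// local[*] is for a local run; drop .master(...) when submitting to a cluster.
val spark = SparkSession.builder().appName("SparkReadAllOptions").master("local[*]").getOrCreate()

spark.read
  .option("header", "true")               // first line holds the column names
  .option("mode", "FAILFAST")             // PERMISSIVE (default) | DROPMALFORMED | FAILFAST: how malformed records are handled during parsing
  // .option("inferSchema", "true")       // alternative to .schema(...); needs one extra pass over the data
  .option("delimiter", "||")              // field separator (default ","); multi-character separators need Spark 3.0+
  .option("recursiveFileLookup", "true")  // recursively read all files under a directory (Spark 3.0+)
  .schema(sparkguide.myManualSchema)      // the schema is set on the reader, before the data is loaded
  // .option("path", "path/to/file(s)").load()  // alternative to passing the path to .csv(...); do not set both
  .csv("data/retail-data/all/online-retail-dataset.csv") // shorthand for .format("csv").load(path)
  .repartition(2)                         // DataFrame transformation, so it comes after the load
  .selectExpr("instr(Description, 'GLASS') >= 1 as is_glass")
  .groupBy("is_glass")
  .count()
  .collect()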
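The `mode` option decides what happens to a record that does not fit the schema. As a rough illustration only, the plain-Scala sketch below models the three documented behaviours for one kind of malformation, a row whose field count differs from the schema width. The object `ParseModeSketch` and its `parse` method are hypothetical; the sketch ignores type-conversion errors and the `columnNameOfCorruptRecord` column, and it is not how Spark implements CSV parsing.

```scala
// Hypothetical, simplified model of the three CSV parse modes; NOT Spark's code.
// Only one kind of malformation is modeled: a field count that differs from the schema width.
object ParseModeSketch {
  sealed trait Mode
  case object Permissive    extends Mode
  case object DropMalformed extends Mode
  case object FailFast      extends Mode

  type Row = Seq[Option[String]]

  def parse(lines: Seq[String], delimiter: String, width: Int, mode: Mode): Seq[Row] =
    lines.flatMap { line =>
      // Pattern.quote makes a multi-character delimiter such as "||" match literally, not as a regex.
      val fields: Seq[Option[String]] =
        line.split(java.util.regex.Pattern.quote(delimiter), -1).toSeq.map(Some(_))
      if (fields.length == width) Some(fields)
      else mode match {
        // Keep the row: pad missing fields with None (null) or drop extra fields.
        case Permissive    => Some(fields.take(width).padTo(width, None))
        // Skip the row entirely.
        case DropMalformed => None
        // Abort the whole read on the first bad row.
        case FailFast      => throw new IllegalStateException(s"Malformed record: $line")
      }
    }
}
```

With the input `Seq("a||b||c", "a||b")` and a width of 3, `Permissive` keeps both rows (the second padded with a `None`), `DropMalformed` keeps only the first, and `FailFast` throws.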
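The `selectExpr`/`groupBy`/`count` tail of the chain flags descriptions containing "GLASS" and counts rows per flag. SQL `instr` returns the 1-based position of the first match, 0 when there is none, and null when the input is null, so `>= 1` is a "contains" test and null descriptions end up in their own group. The sketch below is a hypothetical plain-Scala model of that logic (the names `GlassCountSketch`, `instr`, and `countByIsGlass` are illustrative, not Spark API), using `Option` to stand in for SQL null.

```scala
// Hypothetical plain-Scala model of the selectExpr/groupBy/count step; not Spark code.
object GlassCountSketch {
  // Mirrors SQL instr(str, substr): 1-based position of the first match, 0 if absent,
  // None (null) if the input string is null. Matching is case-sensitive.
  def instr(str: Option[String], sub: String): Option[Int] = str.map(_.indexOf(sub) + 1)

  // is_glass = instr(Description, 'GLASS') >= 1, then count rows per distinct is_glass value.
  def countByIsGlass(descriptions: Seq[Option[String]]): Map[Option[Boolean], Int] =
    descriptions
      .map(d => instr(d, "GLASS").map(_ >= 1))
      .groupBy(identity)
      .map { case (key, rows) => key -> rows.size }
}
```

Because the match is case-sensitive, a description such as "glass cup" is counted under `false`, not `true`.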