@dgadiraju
Last active June 5, 2017 00:57
// Read the orders data set. Use only ONE of the two paths below and change it to a valid
// location; the orders directory must exist at that path (locally or on HDFS).
val orders = sc.textFile("/public/retail_db/orders") // On the lab, accessing HDFS
// val orders = sc.textFile("/Users/itversity/Research/data/retail_db/orders") // Accessing locally on the PC
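// Preview the first 10 records
orders.take(10).foreach(println)

// filter: keep only the records whose fourth field (index 3) is COMPLETE
val completedOrders = orders.filter(rec => rec.split(",")(3) == "COMPLETE")

// filter with a compound condition: the fourth field contains PENDING or equals
// PROCESSING, and the second field (index 1) contains 2013-08
val pendingOrders = orders.filter(order => {
  val o = order.split(",")
  (o(3).contains("PENDING") || o(3) == "PROCESSING") && o(1).contains("2013-08")
})

// map: split each record once and emit (first field as Int, second field)
val orderDates = completedOrders.map(rec => {
  val o = rec.split(",")
  (o(0).toInt, o(1))
})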
val lines = Array("Hello World",
  "In this case we are trying to understand",
  "the purpose of flatMap",
  "flatMap is a function which will apply a transformation",
  "if the transformation results in an array, it will flatten out the array as individual records",
  "let us also understand the difference between map and flatMap",
  "in case of map, it takes one record and returns one record after applying the transformation",
  "even if the transformation results in an array",
  "whereas in case of flatMap, it might return zero or more records",
  "if the transformation of 1 record results in an array of 1000 records, ",
  "then flatMap returns 1000 records")
val linesRDD = sc.parallelize(lines)
val words = linesRDD.flatMap(rec => rec.split(" "))
words.collect().foreach(println)
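
The difference between map and flatMap described in the strings above can be checked without a cluster. The following is a minimal sketch that uses plain Scala collections as a stand-in for an RDD (an assumption made so it runs without a SparkContext); the names `sampleLines`, `mapped`, and `flattened` are hypothetical and not part of the original gist.

```scala
// Hypothetical sample input; the last line is empty on purpose
val sampleLines = Seq("Hello World", "the purpose of flatMap", "")

// map: exactly one output element per input element.
// Here each output element is an Array of the words in that line.
val mapped = sampleLines.map(line => line.split(" ").filter(_.nonEmpty))

// flatMap: the per-line arrays are flattened into individual words.
// A line that produces an empty array contributes zero elements.
val flattened = sampleLines.flatMap(line => line.split(" ").filter(_.nonEmpty))

println(mapped.length)    // 3 (one array per input line)
println(flattened.length) // 6 (2 + 4 + 0 words)
flattened.foreach(println)
```

On an RDD the same two calls would be `linesRDD.map(...)` and `linesRDD.flatMap(...)`, with the counts obtained through `count()` instead of `length`.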