Last active
June 5, 2017 00:57
-
-
Save dgadiraju/0b04e320941f1b79d99fbd8af5845645 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| val orders = sc.textFile("/public/retail_db/orders") // On the lab accessing HDFS | |
| val orders = sc.textFile("/Users/itversity/Research/data/retail_db/orders") // Accessing locally on the PC | |
| // Change to valid path as per your preference. Make sure the directory orders exist in the path (locally or on HDFS) | |
| orders.take(10).foreach(println) | |
| val completedOrders = orders.filter(rec => rec.split(",")(3) == "COMPLETE") | |
| val pendingOrders = orders. | |
| filter(order => { | |
| val o = order.split(",") | |
| (o(3).contains("PENDING") || o(3) == "PROCESSING") && o(1).contains("2013-08") | |
| }) | |
| val orderDates = completedOrders.map(rec => (rec.split(",")(0).toInt, rec.split(",")(1))) | |
| val lines = Array("Hello World", | |
| "In this case we are trying to understand", | |
| "the purpose of flatMap", | |
| "flatMap is a function which will apply transformation", | |
| "if the transformation results in array, it will flatten out array as individual records", | |
| "let us also understand difference between map and flatMap", | |
| "in case of map, it take one record and return one record after applying transformation", | |
| "even if the transformation result in an array", | |
| "where as in case of flatMap, it might return one or more records", | |
| "if the transformation of 1 record result an array of 1000 records, ", | |
| "then flatMap returns 1000 records") | |
| val linesRDD = sc.parallelize(lines) | |
| val words = linesRDD.flatMap(rec => rec.split(" ")) | |
| words.collect().foreach(println) |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment