
val orderItems = sc.textFile("/public/retail_db/order_items")
val orderItemsMap = orderItems.
map(oi => (oi.split(",")(1).toInt, oi.split(",")(4).toFloat))
orderItemsMap.
take(10).
foreach(println)
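Each record in the snippet above is split twice. Below is a minimal plain-Scala sketch, runnable without Spark, that splits a line once and returns the same (order_item_order_id, order_item_subtotal) pair. It assumes the usual retail_db order_items column order. The object name OrderItemParse and the sample record are illustrative only.

```scala
// Plain-Scala sketch (no Spark needed): parse one order_items record into
// (order_item_order_id, order_item_subtotal), splitting the line only once.
object OrderItemParse {
  // Assumed column order: order_item_id, order_item_order_id,
  // order_item_product_id, order_item_quantity, order_item_subtotal,
  // order_item_product_price
  def parse(line: String): (Int, Float) = {
    val fields = line.split(",")
    (fields(1).toInt, fields(4).toFloat)
  }

  def main(args: Array[String]): Unit = {
    // Illustrative record, not read from the dataset
    val sample = "1,1,957,1,299.98,299.98"
    println(parse(sample))
  }
}
```

Inside spark-shell, the same single-split idea can be written inline as `orderItems.map(oi => { val f = oi.split(","); (f(1).toInt, f(4).toFloat) })`.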
val orders = sc.textFile("/public/retail_db/orders")
val ordersMap = orders.
map(o => (o.split(",")(1), 1))
ordersMap.
reduceByKeyLocally((agg, ele) => agg + ele).
take(10).
foreach(println)
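reduceByKeyLocally is an action. It merges the values for each key and returns a Scala Map to the driver instead of an RDD, so take(10) and foreach(println) here run on a local collection. The sketch below models that result with a local Seq of pairs standing in for the RDD. The helper reduceByKeyLocal is hypothetical and is not a Spark API.

```scala
object ReduceByKeyLocalSketch {
  // Hypothetical helper: merges all values that share a key with f,
  // mirroring the Map that reduceByKeyLocally hands back to the driver.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs
      .groupBy(_._1) // key -> all (key, value) pairs with that key
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  def main(args: Array[String]): Unit = {
    // Illustrative (order_date, 1) pairs, not real data
    val pairs = Seq(("2013-07-25", 1), ("2013-07-25", 1), ("2013-07-26", 1))
    reduceByKeyLocal(pairs)(_ + _).foreach(println)
  }
}
```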
val orders = sc.textFile("/public/retail_db/orders")
val ordersMap = orders.
map(o => (o.split(",")(1), 1))
ordersMap.
reduceByKey((agg, ele) => agg + ele).
take(10).
foreach(println)
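Unlike reduceByKeyLocally, reduceByKey is a transformation that returns an RDD[(String, Int)], and nothing is computed until take(10) triggers a job. Spark also merges values inside each partition before shuffling them, which is why the reduce function should be associative and commutative. The following is a rough local model of that two-step combine, in which each inner Seq stands in for one partition. The object and method names are hypothetical.

```scala
object ReduceByKeyPartitionSketch {
  // Reduce the pairs inside one partition (the map-side combine step).
  private def combine[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.foldLeft(Map.empty[K, V]) { (acc, kv) =>
      val (k, v) = kv
      val merged = acc.get(k) match {
        case Some(prev) => f(prev, v)
        case None       => v
      }
      acc.updated(k, merged)
    }

  // Hypothetical model of reduceByKey: combine each partition locally,
  // then merge the partial results per key (the post-shuffle step).
  def reduceByKeyModel[K, V](partitions: Seq[Seq[(K, V)]])(f: (V, V) => V): Map[K, V] =
    combine(partitions.flatMap(p => combine(p)(f).toSeq))(f)

  def main(args: Array[String]): Unit = {
    // Two illustrative "partitions" of (order_date, 1) pairs
    val partitions = Seq(
      Seq(("2013-07-25", 1), ("2013-07-25", 1)),
      Seq(("2013-07-25", 1), ("2013-07-26", 1))
    )
    reduceByKeyModel(partitions)(_ + _).foreach(println)
  }
}
```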
val orders = sc.textFile("/public/retail_db/orders")
val ordersMap = orders.
map(o => (o.split(",")(1), 1))
ordersMap.
countByKey.
take(10).
foreach(println)
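countByKey counts how many records carry each key and returns a Map[K, Long] to the driver. It never looks at the values, so the 1 in each (order_date, 1) pair is not actually needed here. The local model below makes this visible. The helper name is hypothetical.

```scala
object CountByKeySketch {
  // Hypothetical model of countByKey: counts how many pairs carry each key.
  // The values are never inspected.
  def countByKeyModel[K, V](pairs: Seq[(K, V)]): Map[K, Long] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.size.toLong }

  def main(args: Array[String]): Unit = {
    // Values deliberately differ from 1 to show that they are ignored
    val pairs = Seq(("2013-07-25", 5), ("2013-07-25", 7), ("2013-07-26", 0))
    countByKeyModel(pairs).foreach(println)
  }
}
```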
val orders = sc.textFile("/public/retail_db/orders")
val ordersMap = orders.
map(o => (o.split(",")(1), 1))
val ordersGroupByDate = ordersMap.
groupByKey
val orderCountByDate = ordersGroupByDate.
map(o => (o._1, o._2.size))
orderCountByDate.
take(10).
foreach(println)
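groupByKey ships every (order_date, 1) pair across the shuffle to build (order_date, Iterable[Int]) before the size is taken. For a plain count, reduceByKey or countByKey is therefore usually cheaper, and the Spark programming guide recommends reduceByKey or aggregateByKey when grouping only to aggregate. Below is a local model of the grouped shape and the per-key size. The helper names are hypothetical.

```scala
object GroupByKeySketch {
  // Hypothetical model of groupByKey: key -> every value for that key.
  def groupByKeyModel[K, V](pairs: Seq[(K, V)]): Map[K, Seq[V]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Counting the grouped values, as orderCountByDate does with o._2.size.
  def countPerKey[K, V](pairs: Seq[(K, V)]): Map[K, Int] =
    groupByKeyModel(pairs).map { case (k, vs) => k -> vs.size }

  def main(args: Array[String]): Unit = {
    // Illustrative (order_date, 1) pairs, not real data
    val pairs = Seq(("2013-07-25", 1), ("2013-07-25", 1), ("2013-07-26", 1))
    println(groupByKeyModel(pairs))
    println(countPerKey(pairs))
  }
}
```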
val lines = sc.textFile("/user/training/wordcountinput.txt")
val words = lines.flatMap(line => line.split(" "))
val wordTuples = words.map(word => (word, 1))
Let us perform a word count to understand
Spark in more detail. A word count computes
how many times each word occurs in the input.
To get it, we use flatMap and map, and then
either countByKey or reduceByKey to get the
count for each word.
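The flatMap, map and reduce-by-key steps described above can be modelled on a local Seq[String] standing in for the lines RDD. Here groupBy plus sum plays the role of reduceByKey. The object and method names are hypothetical.

```scala
object WordCountSketch {
  // flatMap -> map -> reduce by key, on a local Seq instead of an RDD.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ")) // one element per word
      .map(word => (word, 1))           // (word, 1) tuples
      .groupBy(_._1)                    // stand-in for the shuffle
      .map { case (word, ones) => word -> ones.map(_._2).sum }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or", "not to be")
    wordCount(lines).foreach(println)
  }
}
```

In spark-shell, the final step would be `wordTuples.reduceByKey((agg, ele) => agg + ele)`, or `wordTuples.countByKey` to get a Map on the driver. Splitting on a single space, as the snippet does, produces empty-string "words" when a line has consecutive or leading spaces or is empty.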
val orders = sc.textFile("/public/retail_db/orders")
orders.
map(o => o.split(",")(1)).
take(10).
foreach(println)
orders.
map(o => o.split(",")(1)).
distinct.
take(10).
foreach(println)
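In Spark, distinct removes duplicate values across all partitions, which requires a shuffle, and the order of the resulting dates is not guaranteed. Below is a local equivalent for a handful of order records. The object name and sample records are illustrative, and the column order order_id, order_date, order_customer_id, order_status is assumed. Unlike the RDD version, Seq.distinct keeps the first-occurrence order.

```scala
object DistinctDatesSketch {
  // Field 1 of an orders record is order_date (assumed column order).
  def distinctDates(orders: Seq[String]): Seq[String] =
    orders.map(o => o.split(",")(1)).distinct

  def main(args: Array[String]): Unit = {
    // Illustrative records, not read from the dataset
    val orders = Seq(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
      "3,2013-07-26 00:00:00.0,12111,COMPLETE"
    )
    distinctDates(orders).foreach(println)
  }
}
```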
val orders = sc.textFile("/public/retail_db/orders")
// Get COMPLETE orders
orders.
filter(o => o.split(",")(3) == "COMPLETE").
take(10).
foreach(println)
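In Scala, == on strings compares their contents, so the filter keeps records whose order_status field equals "COMPLETE". Below is a small local sketch that generalizes the check to a set of statuses such as COMPLETE and CLOSED. The helper hasStatus and the sample records are hypothetical, and order_status is assumed to be field 3.

```scala
object OrderStatusFilterSketch {
  // True when the record's order_status (field 3) is one of the given statuses.
  def hasStatus(statuses: Set[String])(order: String): Boolean =
    statuses.contains(order.split(",")(3))

  def main(args: Array[String]): Unit = {
    // Illustrative records, not read from the dataset
    val orders = Seq(
      "1,2013-07-25 00:00:00.0,11599,CLOSED",
      "2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT",
      "3,2013-07-25 00:00:00.0,12111,COMPLETE"
    )
    orders.filter(hasStatus(Set("COMPLETE", "CLOSED"))).foreach(println)
  }
}
```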
val orders = sc.textFile("/public/retail_db/orders")
val orderItems = sc.textFile("/public/retail_db/order_items")
val ordersCompleted = orders.
filter(o => List("COMPLETE", "CLOSED").contains(o.split(",")(3)))
val ordersCompletedMap = ordersCompleted.
map(o => (o.split(",")(0).toInt, o.split(",")(1)))
val orderItemsMap = orderItems.