@dgadiraju
Last active May 20, 2017 15:32
// Base directory of the retail_db data set. Use "/public/retail_db" on the cluster,
// or a local copy such as "/Users/itversity/Research/data/retail_db".
val path = "/public/retail_db"
// Orders placed in December 2013, keyed as (order_id, order_date)
val orders201312 = sc.textFile(path + "/orders").
filter(order => order.split(",")(1).contains("2013-12")).
map(order => (order.split(",")(0).toInt, order.split(",")(1)))
// Order items keyed as (order_id, product_id)
val orderItems = sc.textFile(path + "/order_items").
map(rec => (rec.split(",")(1).toInt, rec.split(",")(2).toInt))
// Distinct product ids sold in December 2013
val distinctProducts201312 = orders201312.
join(orderItems).
map(order => order._2._2).
distinct
// Orders placed in January 2014, keyed as (order_id, order_date)
val orders201401 = sc.textFile(path + "/orders").
filter(order => order.split(",")(1).contains("2014-01")).
map(order => (order.split(",")(0).toInt, order.split(",")(1)))
// Product ids sold in each month (duplicates kept: one entry per order item)
val products201312 = orders201312.
join(orderItems).
map(order => order._2._2)
val products201401 = orders201401.
join(orderItems).
map(order => order._2._2)
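// Total order items across both months (union keeps duplicates)
products201312.union(products201401).count
// Distinct products sold in either month
products201312.union(products201401).distinct.count
// Distinct products sold in both months (intersection drops duplicates)
products201312.intersection(products201401).count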
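// Illustrative sketch, not part of the original gist: the same three set
// operations on toy data, assuming the spark-shell `sc` used above. The names
// `toyA` and `toyB` are hypothetical and exist only for this example.
val toyA = sc.parallelize(Seq(1, 2, 2, 3))
val toyB = sc.parallelize(Seq(2, 3, 4))
toyA.union(toyB).count            // 7: union concatenates and keeps duplicates
toyA.union(toyB).distinct.count   // 4: the distinct values are 1, 2, 3, 4
toyA.intersection(toyB).count     // 2: only 2 and 3 appear in both, duplicates removed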