Created
October 5, 2018 17:31
-
-
Save ottomata/212befe63e7ea993645d3d675303df84 to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
val webrequest = spark.table("wmf.webrequest").where(...) | |
// tags is an array right? need to concat these somehow, but you get the point. | |
// Need just distinct list of tags | |
val tags = webrequest.select("tags").distinct | |
val tagGroupsDfs = tags.map(tag => (tag, webrequest.where(s"${tag} in tags")) ) | |
val partDfs = tagGroupDfs.map(t => { | |
tag = t._1 | |
df = t._2 | |
PartitionedDataFrame(df, HivePartition(year,month,day,hour,tag=tag)) | |
}) | |
partDfs.foreach { partDf => DataFrameToHive(..., partDf, ...) } | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment