Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save ottomata/212befe63e7ea993645d3d675303df84 to your computer and use it in GitHub Desktop.
Save ottomata/212befe63e7ea993645d3d675303df84 to your computer and use it in GitHub Desktop.
val webrequest = spark.table("wmf.webrequest").where(...)
// tags is an array right? need to concat these somehow, but you get the point.
// Need just distinct list of tags
val tags = webrequest.select("tags").distinct
val tagGroupsDfs = tags.map(tag => (tag, webrequest.where(s"${tag} in tags")) )
val partDfs = tagGroupDfs.map(t => {
tag = t._1
df = t._2
PartitionedDataFrame(df, HivePartition(year,month,day,hour,tag=tag))
})
partDfs.foreach { partDf => DataFrameToHive(..., partDf, ...) }
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment