@kmizumar
Created July 1, 2016 18:09
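// Read the source Parquet data sets and rewrite each one sorted and partitioned
// by the join key "ac" (sqlContext.read.parquet replaces parquetFile, which has
// been deprecated since Spark 1.4).
val src1 = sqlContext.read.parquet("output/1/tfpach/cp")
src1.orderBy("ac").write.partitionBy("ac").parquet("cp-ordered")
val src2 = sqlContext.read.parquet("output/1/tfpach/pf")
src2.orderBy("ac").write.partitionBy("ac").parquet("pf-ordered")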
// Load the partitioned copies back.
val cp = sqlContext.read.parquet("cp-ordered")
val pf = sqlContext.read.parquet("pf-ordered")
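// Time the equi-join on "ac", keeping only cp's copy of the key column.
// The write action is what actually triggers the job, so it sits inside the timed region.
val start = System.nanoTime()
cp.join(pf, cp("ac") === pf("ac")).drop(pf.col("ac")).write.parquet("join-1")
val end = System.nanoTime()
println(s"Time elapsed: ${(end - start) / 1000} microsecs")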
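The start/end bookkeeping around the join can be factored into a small reusable helper. This is a sketch under assumptions, not part of the original gist. `TimingSketch` and `timed` are hypothetical names, and the workload in `main` is a stand-in so the block runs without Spark. The by-name parameter (`body: => T`) matters because it delays evaluation of the body until after the start timestamp is taken.

```scala
object TimingSketch {
  // Hypothetical helper (not part of Spark or the original gist): evaluates
  // `body` exactly once, prints the elapsed wall-clock time in microseconds,
  // and returns whatever the body produced.
  def timed[T](label: String)(body: => T): T = {
    val start = System.nanoTime()
    val result = body // by-name argument is evaluated here, inside the timed region
    val elapsedMicros = (System.nanoTime() - start) / 1000
    println(s"$label: $elapsedMicros microsecs")
    result
  }

  def main(args: Array[String]): Unit = {
    // Stand-in workload; in the spark-shell session above, the body would be
    // the cp.join(...).write.parquet("join-1") action instead.
    val sum = timed("Time elapsed")((1L to 1000000L).sum)
    println(sum)
  }
}
```

In spark-shell, the same `def timed` could be pasted directly and used as `timed("Time elapsed") { cp.join(pf, cp("ac") === pf("ac")).drop(pf.col("ac")).write.parquet("join-1") }`. Only wrap an action such as `write`. Wrapping a lazy transformation alone would time query planning rather than the actual job.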