@zouzias
Last active March 28, 2017 11:27
LuceneRDD question: linking keys to strings with a Lucene prefix query. Run with `bin/spark-shell --packages org.zouzias:spark-lucenerdd_2.11:0.2.7`.
Output (each left key, followed by the right-hand strings it was linked to):
(123ABC23,123ABC23QQ,AA-123ABC23-XYZ,123ABC23XYZ,AA-123ABC23AA)
(123XYZAA,)
(56789XY,)
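import org.zouzias.spark.lucenerdd._
import org.zouzias.spark.lucenerdd.LuceneRDD

// Left side: the keys to link
val leftArray = Array("123ABC23", "123XYZAA", "56789XY")
val left = spark.sparkContext.parallelize(leftArray)

// Right side: the strings indexed by Lucene (an RDD[String] is indexed under field "_1")
val rightArray = Array("AA-123ABC23AA", "123ABC23XYZ", "AA-123ABC23-XYZ", "123ABC23QQ")
val right = spark.sparkContext.parallelize(rightArray)
val lucene = LuceneRDD(right)

// Build a Lucene prefix query on field "_1" for each left key, e.g. "_1:123ABC23*"
def prefixLinker(s: String): String = {
  s"_1:${s}*"
}

// For each left key, retrieve up to 10 matching right-hand documents
val linked = lucene.link(left, prefixLinker, 10)

// Print each left key followed by the "_1" values of its matches, comma-separated
linked.collect().map(x => (x._1, x._2.flatMap(_.doc.textField("_1")).mkString(","))).foreach(println)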
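A plain-Scala sketch (no Spark or Lucene) of why `123ABC23` also links to the hyphenated strings. It assumes the index analyzer lowercases text and splits it into tokens at hyphens, as Lucene's StandardTokenizer-based analyzers do, and that the prefix term in the query is lowercased too; which analyzer LuceneRDD 0.2.7 actually uses by default is not confirmed here. The `tokens` helper is a hypothetical stand-in for that analysis, not LuceneRDD code, and `stringPrefixMatches` shows what a whole-string prefix match would return instead.

```scala
// Hypothetical sketch: approximates Lucene analysis + prefix queries in plain Scala.
object PrefixMatchSketch {
  // Assumed analysis: lowercase, then split on any non-alphanumeric character
  // (so "AA-123ABC23-XYZ" becomes tokens "aa", "123abc23", "xyz").
  def tokens(s: String): Seq[String] =
    s.toLowerCase.split("[^a-z0-9]+").toSeq.filter(_.nonEmpty)

  // Token-level prefix match: a document matches if ANY of its tokens starts
  // with the key, roughly what a query like _1:123ABC23* does.
  def tokenPrefixMatches(key: String, docs: Seq[String]): Seq[String] = {
    val prefix = key.toLowerCase
    docs.filter(doc => tokens(doc).exists(_.startsWith(prefix)))
  }

  // Whole-string prefix match, for comparison.
  def stringPrefixMatches(key: String, docs: Seq[String]): Seq[String] =
    docs.filter(_.toLowerCase.startsWith(key.toLowerCase))

  def main(args: Array[String]): Unit = {
    val leftArray = Seq("123ABC23", "123XYZAA", "56789XY")
    val rightArray = Seq("AA-123ABC23AA", "123ABC23XYZ", "AA-123ABC23-XYZ", "123ABC23QQ")
    leftArray.foreach { key =>
      val byToken = tokenPrefixMatches(key, rightArray).mkString(",")
      val byString = stringPrefixMatches(key, rightArray).mkString(",")
      println(s"($key,$byToken) vs whole-string: ($key,$byString)")
    }
  }
}
```

Calling `PrefixMatchSketch.main(Array())` prints all four right-hand strings for `123ABC23` under token matching (in input order, not Lucene's score order) but only `123ABC23XYZ` and `123ABC23QQ` under whole-string matching; the other two keys match nothing either way, as in the output above.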