@zygm0nt
Created June 10, 2015 06:44
word similarity - edit distance (Reco)
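import com.allegrogroup.reco.model.{MetaItemType, Item}
import com.allegrogroup.reco.recommender.decorator.EditDistance
import java.nio.file.{Paths, Files}
import java.nio.charset.StandardCharsets

// Assumes a Spark shell session where `sc` (the SparkContext) is already defined.
// Edit distance between two character sequences; Strings are converted implicitly.
val distance: (Iterable[Char], Iterable[Char]) => Int = EditDistance[Char]

// Single partition, useful for a quick test run:
//val items = sc.objectFile[Item]("/projects/reco_prod/data/items/service_account_id=ALG/v_date=2015-06-01/part-00087")
val items = sc.objectFile[Item]("/projects/reco_prod/data/items/service_account_id=ALG/v_date=2015-06-01/")

// Group items by their category-and-predicates meta item; items without one go to "unknown".
val groupedByMetaItem = items.groupBy(_.metaItemMap.getOrElse(MetaItemType.categoryAndPredicates, "unknown"))

// For each group, compare the lower-cased names of the first 10 items pairwise
// and render one report block, closest pairs first.
val distancesForItems = groupedByMetaItem.map { case (metaItemName, itemCollection) =>
  val distances = itemCollection.take(10).toList.combinations(2).toList.map { case Seq(wordA, wordB) =>
    val nameA = wordA.name.toLowerCase
    val nameB = wordB.name.toLowerCase
    (distance(nameA, nameB), nameA, nameB)
  }.sortBy(_._1).map { case (d, nameA, nameB) =>
    s"$nameA <?> $nameB -> $d"
  }
  // Indent every pair line with a tab (the original separator indented only the first one).
  s"""
     |[$metaItemName]
     |${distances.mkString("\t", "\n\t", "")}
     |""".stripMargin
}

// Each takeSample call is a separate Spark action and draws its own random sample.
println(distancesForItems.takeSample(false, 10).mkString("\n"))
// The file is written to the driver's local working directory.
Files.write(Paths.get("distance-comparison.txt"), distancesForItems.takeSample(false, 1000).mkString("\n").getBytes(StandardCharsets.UTF_8))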
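
The script above depends on `EditDistance` from an internal package, so it cannot be run outside that codebase. As a rough stand-in, here is a minimal sketch of the classic Levenshtein distance in plain Scala, followed by the same "all pairs, closest first" ranking on a small list. The function `levenshtein` and the sample words are hypothetical, and whether `EditDistance` computes exactly this metric is an assumption.

```scala
// Hypothetical stand-in for EditDistance: classic Levenshtein distance,
// computed with two rolling rows of the dynamic-programming table.
def levenshtein(a: String, b: String): Int = {
  // prev(j) = distance between the first i-1 chars of a and the first j chars of b
  var prev = Array.range(0, b.length + 1)
  for (i <- 1 to a.length) {
    val curr = new Array[Int](b.length + 1)
    curr(0) = i // deleting the first i chars of a
    for (j <- 1 to b.length) {
      val cost = if (a.charAt(i - 1) == b.charAt(j - 1)) 0 else 1
      curr(j) = math.min(
        math.min(prev(j) + 1, curr(j - 1) + 1), // deletion, insertion
        prev(j - 1) + cost                      // substitution or match
      )
    }
    prev = curr
  }
  prev(b.length)
}

// Same pairing and ordering as in the script, on made-up sample names.
val names = List("kitten", "sitting", "mitten")
val ranked = names.combinations(2).toList
  .map { case Seq(x, y) => (levenshtein(x, y), x, y) }
  .sortBy(_._1)

ranked.foreach { case (d, x, y) => println(s"$x <?> $y -> $d") }
```

The two-row table keeps memory proportional to the length of the second word, which is enough here because only the final distance is needed, not the edit script.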