@josep2
Last active March 9, 2017 23:35
import org.apache.spark.{SparkConf, SparkContext}
import edu.berkeley.cs.succinct._
val conf = new SparkConf().setAppName("Ranking Example")
val sc = new SparkContext(conf)
// A large file of raw hip hop lyrics ~ 100 GB
val hipHopRDD = sc.textFile("/hiphopcorpus").map(_.getBytes)
// Convert to a SuccinctRDD and persist it in memory
val hipHopRDDPersisted = hipHopRDD.succinct.persist()
// Search for all mentions
val countsLyrics = hipHopRDDPersisted.search("Birthdays were the worst days")
// Returns 3 results