// Gist shlomiv/6309305, last active December 21, 2015 12:58
// Load the entire file as an RDD named fs (lazy: nothing is read yet)
val fs = sc.textFile("/data01/fs.txt")
// Find all lines that contain the string "song" (case-insensitive) and mark the result for caching
val songs = fs.filter(x => x.toLowerCase.contains("song")).cache
// count is an action, so all the lazy computations above now have to be realized. This takes about 85
// seconds to complete, after which the filtered data is completely cached.
songs.count
// Let's try that again, now served from the cache
songs.count
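// A minimal sketch, not part of the original gist, for measuring the difference between the two
// counts yourself. The helper name `timed` and its labels are hypothetical; it only wraps an
// expression with System.nanoTime and prints the elapsed wall-clock time.
def timed[T](label: String)(block: => T): T = {
  val start = System.nanoTime()
  val result = block // evaluate the by-name argument exactly once
  val seconds = (System.nanoTime() - start) / 1e9
  println(f"$label took $seconds%.2f s")
  result
}
// Possible usage in the shell (assumes the `songs` RDD defined above):
//   timed("first count")(songs.count)
//   timed("cached count")(songs.count)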
// We now realize that our previous predicate was too general and matched things like "songwriter",
// so say we want only sentences containing the standalone word "song".
val onlysongs = songs.filter(x => x.contains(" song "))
// Let's count. Again, this realizes the lazy computation we just defined, but this time it takes just a few seconds
onlysongs.count
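// A hedged alternative, not part of the original gist: matching the literal " song " misses lines
// where the word starts or ends the line, is followed by punctuation, or is capitalized. One possible
// sketch uses a case-insensitive word-boundary regex. The names `songWord` and `hasSongWord` are
// hypothetical.
val songWord = """(?i)\bsong\b""".r
def hasSongWord(line: String): Boolean = songWord.findFirstIn(line).isDefined
// Possible usage in the shell (assumes the cached `songs` RDD defined earlier):
//   songs.filter(line => hasSongWord(line)).count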
// Finally, let's write this to the filesystem. This takes longer because of the I/O involved, around 30 seconds
songs.saveAsTextFile("/tmp/songs")