<h1>Cascading Sample Recommender</h1>
<p>The goal for this project is to create a sample application in <a href="http://www.cascading.org/">Cascading 2.0</a> which shows how to build a simple kind of <a href="http://en.wikipedia.org/wiki/Recommender_system">social recommender</a>.</p>
<h2>Build</h2>
<p>First, clone a copy of the source code from our GitHub repo at <a href="https://github.com/Cascading/SampleRecommender">https://github.com/Cascading/SampleRecommender</a></p>
<pre><code>git clone https://github.com/Cascading/SampleRecommender.git
</code></pre>
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ hadoop -version
Warning: $HADOOP_HOME is deprecated.
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03-424, mixed mode)
bash-3.2$ cd cascading.samples/
bash-3.2$ ls
build.gradle hadoop logparser settings.gradle
build.xml loganalysis sample.build.gradle wordcount
bash-3.2$ cd wordcount/
bash-3.2$ ls
README.TXT build.gradle data src
bash-3.2$ java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03-424-11M3720)
bash-3.2$ java -version
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10-428-11M3811)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01-428, mixed mode)
bash-3.2$ pig -version
Warning: $HADOOP_HOME is deprecated.
Apache Pig version 0.10.0 (r1328203)
compiled Apr 19 2012, 22:54:12
bash-3.2$ cat src/scripts/wc.pig
bash-3.2$ gradle clean jar
:clean
:compileJava
:processResources UP-TO-DATE
:classes
:jar
BUILD SUCCESSFUL
Total time: 4.316 secs
# use git to load multitool (simplest as a ZIP)
# https://github.com/Cascading/cascading.multitool
# to save time, we'll skip the JAR compile/build...
# download the JAR file from:
# https://s3.amazonaws.com/ceteri-mapred/multitool.jar
# cd to your cascading.multitool download
bash-3.2$ rm -rf output
bash-3.2$ hadoop jar ./multitool.jar source=data/days.txt select=Tuesday sink=output/tuesday.txt
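
The Multitool command above reads lines from a source, keeps the ones that match `select`, and writes them to a sink. As a rough local analogy, and assuming `select` behaves like a regular-expression filter on each line, the same idea can be sketched in plain Python. The function name `select_lines` and the sample data are hypothetical and are not part of Multitool.

```python
import re


def select_lines(lines, pattern):
    """Keep only the lines in which the regular expression matches."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical stand-in for the contents of data/days.txt.
days = ["Monday", "Tuesday", "Wednesday", "Next Tuesday"]

# Analogous to: source=data/days.txt select=Tuesday sink=output/tuesday.txt
for line in select_lines(days, "Tuesday"):
    print(line)
```

This sketch runs in memory on one machine. The Hadoop job applies the same kind of filter in parallel across the input splits.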
# use git to load ceteri-mapred (simplest as a ZIP)
# https://github.com/ceteri/ceteri-mapred
# cd to your ceteri-mapred download
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls
README doc graph.gephi src thresh.R
bin graph.csv msgs.tsv stopwords thresh.tsv
Pacos-MacBook-Pro:ceteri-mapred ceteri$ ls src/
map_filter.py map_parse.py map_wc.py red_filter.py red_idf.py red_wc.py util_extract.py util_gephi.py util_walk.py
Pacos-MacBook-Pro:ceteri-mapred ceteri$ head README
; Paul Lam
; https://github.com/Quantisan/Impatient
(ns impatient.core
  (:use [cascalog.api]
        [cascalog.more-taps :only (hfs-delimited)])
  (:require [clojure.string :as s]
            [cascalog.ops :as c])
  (:gen-class))
bash-3.2$ ls
README.md build.gradle cascading.pattern.ipr data model.log src
build cascading.pattern.iml cascading.pattern.iws dot output
bash-3.2$ more output/
classify/ measure/
bash-3.2$ more output/measure/
output/measure/ is a directory
bash-3.2$ more output/measure/part-00000
label score count
0 0 73
import com.twitter.scalding._

class Example3(args : Args) extends Job(args) {
  Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
    .read
    .flatMap('text -> 'token) { text : String => text.split("[ \\[\\]\\(\\),.]") }
    .mapTo('token -> 'token) { token : String => scrub(token) }
    .filter('token) { token : String => token.length > 0 }
    .groupBy('token) { _.size('count) }
    .write(Tsv(args("wc"), writeHeader = true))
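
The Scalding pipeline above splits each document's text into tokens, scrubs each token, drops empty tokens, and counts occurrences per token. The same steps can be sketched in plain Python to make the data flow easier to follow. The body of `scrub` is not shown in the snippet, so the version below (trim and lowercase) is an assumption, and `word_count` is a hypothetical helper name.

```python
import re
from collections import Counter


def scrub(token):
    # Assumed behavior: the snippet calls scrub() but does not show its body.
    return token.strip().lower()


def word_count(texts):
    """Count tokens across all texts, mirroring flatMap -> mapTo -> filter -> groupBy."""
    counts = Counter()
    for text in texts:
        # Same delimiter class as the Scala split: space, [, ], (, ), comma, period.
        for raw in re.split(r"[ \[\]\(\),.]", text):
            token = scrub(raw)
            if len(token) > 0:
                counts[token] += 1
    return counts


# Hypothetical sample input standing in for the 'text' column.
counts = word_count(["A rain shadow (dry area).", "rain"])
for token, count in sorted(counts.items()):
    print(token, count, sep="\t")
```

Unlike the Scalding job, this sketch holds all counts in memory, so it only suits small inputs.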