@nathanmarz
nathanmarz / gist:2506020
Created April 27, 2012 05:03
Concise inline operations with Cascalog serfn branch
(use 'cascalog.playground) (bootstrap)
(require '[clojure.set :as set])
(require '[cascalog.vars :as v])
(defn all-syms [form]
  (if (coll? form)
    (reduce (fn [curr elem] (set/union curr (all-syms elem)))
            #{}
            form)
nathanmarz / gist:3234617
Created August 2, 2012 06:51
Batch spout declaration
FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
    new Values("the cow jumped over the moon"),
    new Values("the man went to the store and bought some candy"),
    new Values("four score and seven years ago"),
    new Values("how many apples can you eat"));
spout.setCycle(true);
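A rough plain-Java sketch of how the declaration above could hand out batches, assuming tuples are emitted in order, at most three per batch, and the list restarts once it is exhausted because cycling is on. `BatchCycleSketch` and `takeBatches` are hypothetical names for illustration only and are not part of Storm or Trident.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BatchCycleSketch {
    // Hypothetical helper: returns the first numBatches batches taken from tuples,
    // each holding at most maxBatchSize items, wrapping to the start when exhausted.
    static List<List<String>> takeBatches(List<String> tuples, int maxBatchSize, int numBatches) {
        List<List<String>> batches = new ArrayList<>();
        int index = 0;
        for (int b = 0; b < numBatches; b++) {
            if (index >= tuples.size()) {
                index = 0; // assumed cycling behaviour: start over at the first tuple
            }
            List<String> batch = new ArrayList<>();
            for (int i = 0; index < tuples.size() && i < maxBatchSize; i++, index++) {
                batch.add(tuples.get(index));
            }
            batches.add(batch);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the cow jumped over the moon",
            "the man went to the store and bought some candy",
            "four score and seven years ago",
            "how many apples can you eat");
        for (List<String> batch : takeBatches(sentences, 3, 3)) {
            System.out.println(batch.size() + " tuple(s): " + batch);
        }
    }
}
```

Under these assumptions a short final batch is emitted before the list wraps, so batch sizes need not all equal the maximum.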
nathanmarz / gist:3234621
Created August 2, 2012 06:53
Word count computation
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        .each(new Fields("sentence"), new Split(), new Fields("word"))
        .groupBy(new Fields("word"))
        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
        .parallelismHint(6);
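For intuition, a single-process Java sketch of what the stream above computes: split each sentence on spaces, group by word, keep a running count. It does not use Trident; there is no batching, parallelism, or persistent state, and `WordCountSketch` and `countWords` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WordCountSketch {
    // Hypothetical helper: in-memory stand-in for each(Split) + groupBy + Count.
    static Map<String, Long> countWords(List<String> sentences) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                counts.merge(word, 1L, Long::sum); // add 1 to the word's running count
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the cow jumped over the moon",
            "the man went to the store and bought some candy",
            "four score and seven years ago",
            "how many apples can you eat");
        System.out.println(countWords(sentences));
    }
}
```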
nathanmarz / gist:3234623
Created August 2, 2012 06:54
Split function in Trident
public class Split extends BaseFunction {
    public void execute(TridentTuple tuple, TridentCollector collector) {
        String sentence = tuple.getString(0);
        for (String word : sentence.split(" ")) {
            collector.emit(new Values(word));
        }
    }
}
nathanmarz / gist:3234632
Created August 2, 2012 06:55
Using memcached for state
.persistentAggregate(MemcachedState.transactional(serverLocations), new Count(), new Fields("count"))
nathanmarz / gist:3234636
Created August 2, 2012 06:56
DRPC execute
DRPCClient client = new DRPCClient("drpc.server.location", 3772);
System.out.println(client.execute("words", "cat dog the man"));
// prints the JSON-encoded result, e.g.: "[[5078]]"
nathanmarz / gist:3234638
Created August 2, 2012 06:57
Words DRPC implementation
topology.newDRPCStream("words")
    .each(new Fields("args"), new Split(), new Fields("word"))
    .groupBy(new Fields("word"))
    .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
    .each(new Fields("count"), new FilterNull())
    .aggregate(new Fields("count"), new Sum(), new Fields("sum"));
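A plain-Java sketch of the query logic above, assuming the word counts are available as an ordinary in-memory map: split the argument string on spaces, look up each word, skip words with no count, and add up the rest. `WordsQuerySketch` and `sumCounts` are hypothetical names; no DRPC or Trident calls are involved, and each occurrence of a word in the arguments is looked up independently.

```java
import java.util.HashMap;
import java.util.Map;

class WordsQuerySketch {
    // Hypothetical helper: stand-in for Split + MapGet + FilterNull + Sum.
    static long sumCounts(Map<String, Long> wordCounts, String args) {
        long sum = 0;
        for (String word : args.split(" ")) {
            Long count = wordCounts.get(word); // null when the word was never counted
            if (count != null) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> wordCounts = new HashMap<>();
        wordCounts.put("the", 4L);
        wordCounts.put("man", 1L);
        System.out.println(sumCounts(wordCounts, "cat dog the man"));
    }
}
```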
nathanmarz / gist:3234642
Created August 2, 2012 06:59
Trident reach
TridentState urlToTweeters =
    topology.newStaticState(getUrlToTweetersState());
TridentState tweetersToFollowers =
    topology.newStaticState(getTweeterToFollowersState());
topology.newDRPCStream("reach")
    .stateQuery(urlToTweeters, new Fields("args"), new MapGet(), new Fields("tweeters"))
    .each(new Fields("tweeters"), new ExpandList(), new Fields("tweeter"))
    .shuffle()
    .stateQuery(tweetersToFollowers, new Fields("tweeter"), new MapGet(), new Fields("followers"))
nathanmarz / gist:3234645
Created August 2, 2012 07:00
One aggregator
public class One implements CombinerAggregator<Integer> {
    public Integer init(TridentTuple tuple) {
        return 1;
    }
    public Integer combine(Integer val1, Integer val2) {
        return 1;
    }
    public Integer zero() {
nathanmarz / gist:3234647
Created August 2, 2012 07:01
Filter example
stream.each(new Fields("y"), new MyFilter())