;; Here's a simpler query with the same result, using a single file downloaded
;; from the s3 source path below.
;; Preliminary investigation suggests a dependency issue: the newer
;; version of Kryo required by an updated dependency may not be able to read
;; data created with an older version. The input files were created a few months ago,
;; and we've since bumped the version numbers of a few dependencies.
(?- (hfs-textline "/tmp/test" :sinkmode :replace) (hfs-seqfile "/home/hadoop/part-00000"))
(use 'gulo.views-test)
(in-ns 'gulo.views-test)
;; from the views-test ns, make some test data
;; to-pail is used in mk-test-data, and calls .absorb on the temp
;; pail that is created, moving it into PAIL-PATH
(mk-test-data)
;; consolidate the pail - default of .consolidate is 128mb max file size
lein do sub install, deps, compile
Reading project from cascalog-core
Compiling 76 source files to /mnt/hgfs/Dropbox/code/github/nathanmarz/cascalog/cascalog-core/target/classes
warning: [options] bootstrap class path not set in conjunction with -source 1.6
/mnt/hgfs/Dropbox/code/github/nathanmarz/cascalog/cascalog-core/src/java/cascalog/ClojureCombinerBase.java:32: error: package org.apache.hadoop.mapred does not exist
import org.apache.hadoop.mapred.JobConf;
^
/mnt/hgfs/Dropbox/code/github/nathanmarz/cascalog/cascalog-core/src/java/cascalog/MultiGroupBy.java:33: error: package org.apache.hadoop.io.compress does not exist
import org.apache.hadoop.io.compress.CompressionCodec;
^
Caused by: java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
at org.slf4j.LoggerFactory.<clinit>(LoggerFactory.java:60)
at cascading.util.Util.<clinit>(Util.java:64)
at cascading.tuple.coerce.Coercions.<clinit>(Coercions.java:105)
at cascading.tuple.TupleEntry.<clinit>(TupleEntry.java:56)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:186)
at cascalog.workflow$loading__4784__auto__.invoke(workflow.clj:1)
at clojure.lang.AFn.applyToHelper(AFn.java:159)
at clojure.lang.AFn.applyTo(AFn.java:151)
@robinkraft
robinkraft / gist:4465108
Created January 6, 2013 03:56
simple ordering implementation in Python for foursquare hackathon dish ranking app - messy messy messy!
import json
TESTSTRING = '[{ "_id" : "HZXXY3", "venues" : { "A" : [ "B" ], "B" : [ "C" ], "C" : [ ] } }, { "_id" : "HZXXY4", "venues" : { "A" : [ "B" ], "B" : [ "C" ], "C" : [ ] } }, { "_id" : "HZXXY5", "venues" : { "A" : [ "B", "C"], "B" : [ "A", "C" ], "C" : ["A"] } }]'
# initialize ballot dict
# loop through records
# loop through venues
# add or increment "for" ballot for venue
# loop through beaten venues
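
The comment outline above can be filled in roughly as follows. This is a minimal sketch, not the gist's actual code: it assumes each record's "venues" map lists, for every venue, the venues it beat, that a venue earns one "for" vote per win, and that each beaten venue earns one "against" vote. The names tally_ballots and rank_venues, and ranking by net score, are illustrative assumptions.

```python
import json


def tally_ballots(records):
    """Count wins ("for") and losses ("against") per venue across all records."""
    ballots = {}
    for record in records:
        for venue, beaten in record["venues"].items():
            # Register the venue even if it beat nothing in this record.
            entry = ballots.setdefault(venue, {"for": 0, "against": 0})
            for loser in beaten:
                # Assumption: one "for" vote per win, one "against" vote per loss.
                entry["for"] += 1
                ballots.setdefault(loser, {"for": 0, "against": 0})["against"] += 1
    return ballots


def rank_venues(ballots):
    """Order venues by net score (for minus against), best first; ties by name."""
    return sorted(ballots, key=lambda v: (ballots[v]["against"] - ballots[v]["for"], v))


# Usage with a single record shaped like the entries in TESTSTRING.
sample = json.loads('[{"_id": "HZXXY5", "venues": {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}}]')
sample_ballots = tally_ballots(sample)
sample_ranking = rank_venues(sample_ballots)
```

Net score is only one possible ordering rule; a win ratio or a pairwise method would slot into rank_venues without changing the tally.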
robinkraft / gist:4465104
Created January 6, 2013 03:55
simple ordering implementation for foursquare hackathon dish ranking app - messy messy messy!
var B = {};
var REC = { "_id" : "HZXXY3", "venues" : { "A" : [ "B" ], "B" : [ "C" ], "C" : [ ] } };
var RECS = [REC]; // REC is the record defined above; `rec` was undefined
var TESTJSON = [{ "_id" : "HZXXY3", "venues" : { "A" : [ "B" ], "B" : [ "C" ], "C" : [ ] } }, { "_id" : "HZXXY4", "venues" : { "A" : [ "B" ], "B" : [ "C" ], "C" : [ ] } }, { "_id" : "HZXXY5", "venues" : { "A" : [ "B", "C"], "B" : [ "A", "C" ], "C" : ["A"] } }];
function increment_venue_ballot(v, k, ballot) {
if (typeof(ballot[v]) == "undefined") {
ballot[v] = {"for": 0, "against": 0}; // initialize ballot with 0s
}
++ballot[v][k]; // increment for or against, depending on value of k
robinkraft / gist:4448738
Last active December 10, 2015 14:38
FORMA costs, by workflow step
# Dataframe:
structure(list(step = structure(1:10, .Label = c("adjusted",
"estimated", "fires", "forma-tap", "ndvi-filtered", "ndvi-series",
"neighbors", "rain", "rain-series", "trends"), class = "factor"),
storage = c(147, 4.8, 4.9, 15.4, 93, 715.1, 25.7, 0.22, 407.8,
21.39), time = c(100L, 22L, 33L, 16L, 100L, 190L, 53L, 6L,
26L, 116L), storecost = c(6.51913043478261, 0.212869565217391,
0.217304347826087, 0.68295652173913, 4.12434782608696, 31.7131304347826,
1.13973913043478, 0.00975652173913044, 18.0850434782609,
robinkraft / gist:4341518
Created December 19, 2012 23:16
Simple example of using a Clojure macro to dispatch to functions by passing in a string of the function name.
(defn plus [& n] (apply + n))
(defn minus [& n] (apply - n))
(defn mult [& n] (apply * n))
(defmacro do-math
[math & nums]
`(~(read-string math) ~@nums))
(do-math "plus" 1 2 3 4)
;=> 10
robinkraft / gist:4332954
Created December 18, 2012 23:07
Copy MODIS files for selected tile through selected date from one S3 bucket to another.
import boto
TILE = "h28v09"
MAX_DATE = "2006-02-18"
def main():
print "Connecting to S3"
# assuming AWS credentials are available as environment variables or in .boto file
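
The preview stops before the selection logic, so here is a hedged sketch of how "selected tile through selected date" might be decided before any copy is issued. It is not the gist's code. It assumes each S3 key name contains the tile id and a YYYY-MM-DD date, which is a hypothetical layout; the helper name keys_to_copy is also illustrative. ISO dates compare correctly as plain strings, so no date parsing is needed.

```python
import re

# Assumption: the key name embeds a date formatted like MAX_DATE (YYYY-MM-DD).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def keys_to_copy(key_names, tile, max_date):
    """Return the key names for `tile` dated on or before `max_date` (inclusive)."""
    selected = []
    for name in key_names:
        match = DATE_RE.search(name)
        # ISO-formatted dates sort lexicographically, so string comparison is enough.
        if tile in name and match and match.group(0) <= max_date:
            selected.append(name)
    return selected
```

The copy itself is left out here; with the names filtered up front, the S3 loop only has to iterate over the returned list.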
robinkraft / gist:4155956
Created November 27, 2012 18:09
generic defbufferop proof of concept for fossa partitioning
(use 'cascalog.api)
(def a (vec (map vector (repeat 20 1) (vec (range 20 0 -1)) (vec (range 50 30 -1)))))
;; [[1 20 50] [1 19 49] [1 18 48] [1 17 47] [1 16 46] [1 15 45] [1 14 44] [1 13 43] [1 12 42] [1 11 41] [1 10 40] [1 9 39] [1 8 38] [1 7 37] [1 6 36] [1 5 35] [1 4 34] [1 3 33] [1 2 32] [1 1 31]]
(defn tester-func
[v]
(apply str (map (partial apply str) v)))
(defbufferop tester