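class File
  # Move the read position to just past the next occurrence of str and
  # return true; return nil if str is not found. Reads 10,000-byte windows
  # that overlap by 5,000 bytes, so a match of up to 5,000 bytes that
  # straddles two windows is still found.
  def seek_to(str)
    needle = str.b  # read(n) returns binary strings, so compare raw bytes
    until eof?
      start = pos
      buf = read(10000)
      if (offset = buf.index(needle))
        seek(start + offset + needle.bytesize)
        return true
      else
        seek(start + 5000)
      end
    end
  end
end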
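The fixed 5,000-byte step in `seek_to` only guarantees a hit for patterns of roughly 5,000 bytes or less. Below is a hypothetical generalization, `seek_to_pattern` (a name introduced here, not from the gist). It derives the step from the pattern length so that consecutive windows always overlap by `pattern.bytesize - 1` bytes, and it works on any IO-like object with `pos`, `seek`, `read` and `eof?`, including a `StringIO`:

```ruby
require 'stringio'

# Hypothetical sketch: scan io in windows of `window` bytes that overlap by
# pattern.bytesize - 1 bytes, so any match no longer than the window is found.
# On success, leaves io positioned just past the match and returns true.
def seek_to_pattern(io, pattern, window: 10_000)
  needle = pattern.b                    # compare raw bytes
  step = window - (needle.bytesize - 1) # advance less than a full window
  raise ArgumentError, "window too small for pattern" if step < 1

  until io.eof?
    start = io.pos
    buf = io.read(window)
    if (offset = buf.index(needle))
      io.seek(start + offset + needle.bytesize)
      return true
    end
    io.seek(start + step)
  end
  false
end

io = StringIO.new("a" * 25 + "needle" + "tail")
seek_to_pattern(io, "needle", window: 10) # => true
io.read                                   # => "tail"
```

With a 10-byte window and a 6-byte pattern the scan advances 5 bytes at a time. That is the same half-window overlap `seek_to` uses, but here it is derived from the pattern rather than hard-coded.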
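require 'stringio'
require 'base64'

# Decode one base-128 varint from io: 7 payload bits per byte,
# least-significant group first, high bit set on every byte but the last.
def read_varint(io)
  value = index = 0
  begin
    byte = io.readbyte  # readchar returns a String in Ruby 1.9+, not an Integer
    value |= (byte & 0x7f) << (7 * index)
    index += 1
  end while (byte & 0x80).nonzero?
  value
end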
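The inverse operation shows the layout `read_varint` expects. The block below is a sketch of a hypothetical `write_varint` that emits 7 bits per byte, least-significant group first, and sets the high bit on every byte except the last. So that the block runs on its own, it includes a `read_varint` that reads with `readbyte`, because on Ruby 1.9 and later `readchar` returns a one-character String rather than an Integer. Negative numbers are out of scope here.

```ruby
require 'stringio'

# Hypothetical encoder for non-negative Integers in base-128 varint form.
def write_varint(value)
  raise ArgumentError, "negative values are not supported" if value < 0
  bytes = []
  loop do
    byte = value & 0x7f
    value >>= 7
    if value.zero?
      bytes << byte          # last byte: high bit clear
      break
    end
    bytes << (byte | 0x80)   # more bytes follow: high bit set
  end
  bytes.pack("C*")
end

# Decoder, reading bytes as Integers.
def read_varint(io)
  value = index = 0
  begin
    byte = io.readbyte
    value |= (byte & 0x7f) << (7 * index)
    index += 1
  end while (byte & 0x80).nonzero?
  value
end

write_varint(300).bytes                      # => [172, 2], i.e. 0xAC 0x02
read_varint(StringIO.new(write_varint(300))) # => 300
```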
// Adapted from the LoessInterpolator in org.apache.commons.math.
// Smooths an array of [x, y] pairs and returns [x, smoothedY] pairs.
function loess_pairs(pairs, bandwidth)
{
  var xval = pairs.map(function(pair){ return pair[0]; });
  var yval = pairs.map(function(pair){ return pair[1]; });
  console.log(xval);
  console.log(yval);
  var res = loess(xval, yval, bandwidth);
  console.log(res);
  return xval.map(function(x, i){ return [x, res[i]]; });
}
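The `loess` function that `loess_pairs` calls is not shown in this excerpt. For intuition, here is a simplified Ruby sketch of a single LOESS pass. For each x it fits a weighted least-squares line to the `bandwidth * n` nearest points using tricube weights, then evaluates that line at x. It omits the robustness iterations and other details of Commons Math's `LoessInterpolator`, so treat it as an illustration rather than a port. `loess_sketch` is a hypothetical name.

```ruby
# Simplified LOESS sketch (one pass, no robustness iterations): for each x,
# fit a tricube-weighted least-squares line to the nearest points and
# evaluate it at x. xs and ys are equal-length numeric arrays.
def loess_sketch(xs, ys, bandwidth)
  n = xs.size
  k = [[(bandwidth * n).ceil, 2].max, n].min # points in each local window

  xs.each_index.map do |i|
    x0 = xs[i]
    nearest = (0...n).sort_by { |j| (xs[j] - x0).abs }.first(k)
    max_dist = nearest.map { |j| (xs[j] - x0).abs }.max

    sw = sx = sy = sxx = sxy = 0.0
    nearest.each do |j|
      d = max_dist.zero? ? 0.0 : (xs[j] - x0).abs.fdiv(max_dist)
      w = (1 - d**3)**3 # tricube weight: 1 at x0, 0 at the farthest point
      sw += w
      sx += w * xs[j]
      sy += w * ys[j]
      sxx += w * xs[j]**2
      sxy += w * xs[j] * ys[j]
    end

    mean_x = sx / sw
    mean_y = sy / sw
    var_x = sxx / sw - mean_x**2
    slope = var_x.abs < 1e-12 ? 0.0 : (sxy / sw - mean_x * mean_y) / var_x
    mean_y + slope * (x0 - mean_x) # weighted line evaluated at x0
  end
end
```

Because each local fit is a straight line, data that already lie on a line come back unchanged. That makes a quick sanity check on an implementation.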
require 'date'

DUE_DATE = "2013-05-19"

# Data taken from http://spacefem.com/pregnant/charts/duedate2.php
# Starts at day 222
DATA = [
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
  1, 2, 1, 4, 2, 2, 1, 4, 6, 7, 5, 1, 5, 8, 7, 9, 10, 11, 13, 13, 18,
  14, 13, 9, 27, 29, 27, 31, 27, 26, 36, 43, 43, 51, 67, 74, 60, 47,
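The excerpt cuts off partway through DATA, and the rest of the script is not shown. The sketch below rests on two assumptions that the gist does not state: DATA[i] counts births on day 222 + i of pregnancy, and DUE_DATE falls on day 280 (the conventional 40 weeks). Under those assumptions, a hypothetical helper could turn the counts into a distribution over calendar dates, conditioned on no birth before a given day:

```ruby
require 'date'

# Assumptions (not from the gist): counts start at day FIRST_DAY of
# pregnancy, and the due date is day DUE_DAY (40 weeks).
FIRST_DAY = 222
DUE_DAY = 280

# Calendar date for a given day of pregnancy.
def date_for_day(due_date, day)
  due_date + (day - DUE_DAY)
end

# Day of pregnancy for a given calendar date.
def day_for_date(due_date, date)
  DUE_DAY + (date - due_date).to_i
end

# Probability of birth on each remaining date, given no birth before `today`.
def conditional_distribution(counts, due_date, today)
  first_index = [day_for_date(due_date, today) - FIRST_DAY, 0].max
  remaining = counts.drop(first_index)
  total = remaining.sum.to_f
  return {} if total.zero?

  remaining.each_with_index.map do |count, i|
    [date_for_day(due_date, FIRST_DAY + first_index + i), count / total]
  end.to_h
end
```

With the gist's own values, the call would be `conditional_distribution(DATA, Date.parse(DUE_DATE), Date.today)`.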
import com.twitter.scalding._

// Counts token occurrences in the text column of a (doc_id, text) TSV.
class WordCount(args : Args) extends Job(args) {
  Tsv(args("input"), ('doc_id, 'text))
    .flatMapTo('text -> 'token){ line : String => line.split("[ \\[\\]\\(\\),.]") }
    .map('token -> 'token){ token : String => token.trim.toLowerCase }
    .filter('token){ token : String => token.length > 0 }
    .groupBy('token){ g => g.size }
    .write(Tsv(args("output")))
}
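Below is an in-memory Ruby analogue of the WordCount job, useful for checking what the pipeline should produce on a small input. It is not Scalding. The function name `word_count` and the `[doc_id, text]` pair input are assumptions. The stages mirror the job: split on the same delimiter class, trim and lowercase, drop empty tokens, then count per token.

```ruby
# In-memory analogue of the WordCount pipeline (not Scalding).
# docs: an Array of [doc_id, text] pairs. Returns a Hash of token => count.
def word_count(docs)
  docs
    .flat_map { |_doc_id, text| text.split(/[ \[\]\(\),.]/) } # same delimiters
    .map { |token| token.strip.downcase }                    # trim + lowercase
    .reject(&:empty?)                                         # drop empty tokens
    .each_with_object(Hash.new(0)) { |token, counts| counts[token] += 1 }
end

word_count([["d1", "The cat (the Cat)."], ["d2", "cat, dog"]])
# => {"the"=>2, "cat"=>3, "dog"=>1}
```

Adjacent delimiters such as ", " or " (" produce empty strings from `split`, which is why the filter stage is needed in both versions.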
The Regex Game
Avi Bryant
This is a game for two programmers, although it's easy to imagine variations for more players.
It can be played over email, Twitter, or IM, but a custom web app would suit it well, and I encourage someone to build one.
Each player starts by thinking of a regular expression. The players should agree beforehand on the dialect and length restrictions (e.g., it has to be JavaScript-compatible and under 20 characters).
Players don't reveal their regexes, but if playing over email or similar, they should send each other a hash of their regex that is difficult to brute-force (e.g., bcrypt), for later verification.
They do reveal two strings: one that their regex matches, and one that it does not.
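The commitment step is what keeps both players honest. The Ruby sketch below swaps bcrypt, which needs a third-party gem, for SHA-256 over a secret random nonce concatenated with the regex. The 128-bit nonce stays private until the reveal, so the published digest cannot be brute-forced even for a very short regex. `commit` and `verify` are hypothetical names. Ruby's `Regexp` dialect also differs from JavaScript's in places, so a real game would check matches with whatever engine the players agreed on.

```ruby
require 'digest'
require 'securerandom'

# Commit: publish the digest now; keep the regex and nonce secret until the reveal.
def commit(regex_source)
  nonce = SecureRandom.hex(16) # 128 random bits
  [Digest::SHA256.hexdigest(nonce + regex_source), nonce]
end

# Verify a revealed regex against the earlier commitment and against the
# two example strings it was claimed to match and not match.
def verify(digest, nonce, regex_source, matching, non_matching)
  return false unless Digest::SHA256.hexdigest(nonce + regex_source) == digest

  re = Regexp.new(regex_source)
  re.match?(matching) && !re.match?(non_matching)
end

digest, nonce = commit("^a+b$")
verify(digest, nonce, "^a+b$", "aaab", "ba") # => true
```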
# Visualize the output with gg.js:
# gg({layers: [{ geometry: 'line', mapping: { x: 'minutes', y: 'task', group: 'stage', color: 'type'}}]});
def parse(line)
  output = {}
  parts = line.split(/[ "]/)
  output["TYPE"] = parts.shift
  while parts.size > 0
    next_part = parts.shift
    if next_part =~ /^(\w+)=$/
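The excerpt stops partway through the loop. Judging from the code, input lines look like `TYPE KEY="value" KEY2="value" ...`; that format is an inference from the split on spaces and quotes, not something the gist states. Splitting on spaces breaks any value that itself contains a space. A hypothetical alternative, `parse_line`, extracts each `KEY="value"` pair with a single `scan`:

```ruby
# Hypothetical alternative to parse: the first word is the record type, and
# every KEY="value" pair after it becomes a hash entry (values may contain spaces).
def parse_line(line)
  type, rest = line.split(" ", 2)
  output = { "TYPE" => type }
  (rest || "").scan(/(\w+)="([^"]*)"/) { |key, value| output[key] = value }
  output
end

parse_line('Task TASKID="t_1" NOTE="two words"')
# => {"TYPE"=>"Task", "TASKID"=>"t_1", "NOTE"=>"two words"}
```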
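// Log-likelihood ratio comparing two binomial proportions:
// k1 successes out of n1 trials versus k2 successes out of n2 trials.
def likelihoodRatio(k1 : Int, n1 : Int, k2 : Int, n2 : Int) = {
  // k * log(p), treating 0 * log(0) as 0
  def kLogP(k : Int, p : Double) = if (k == 0) 0.0 else k * math.log(p)
  // Binomial log-likelihood, omitting the constant log(n choose k) term
  def logL(p : Double, k : Int, n : Int) = kLogP(k, p) + kLogP(n - k, 1 - p)
  val p1 = k1.toDouble / n1.toDouble
  val p2 = k2.toDouble / n2.toDouble
  val p = (k1 + k2).toDouble / (n1 + n2).toDouble
  logL(p1, k1, n1) + logL(p2, k2, n2) - logL(p, k1, n1) - logL(p, k2, n2)
}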
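Here is a direct Ruby port of `likelihoodRatio` for readers working in Ruby; `likelihood_ratio` is simply the Ruby spelling of the name. The guard in `k_log_p` matters. Without it, a zero count paired with p = 0 computes 0 * log(0), which in Ruby is 0 * -Infinity, i.e. NaN. Doubling the result gives Dunning's G² statistic, which under the null hypothesis p1 = p2 is approximately chi-squared with one degree of freedom.

```ruby
# k * log(p), with 0 * log(0) taken as 0
def k_log_p(k, p)
  k.zero? ? 0.0 : k * Math.log(p)
end

# Binomial log-likelihood without the constant log(n choose k) term
def log_l(p, k, n)
  k_log_p(k, p) + k_log_p(n - k, 1 - p)
end

# Separate proportions (p1, p2) versus one pooled proportion (p)
def likelihood_ratio(k1, n1, k2, n2)
  p1 = k1.fdiv(n1)
  p2 = k2.fdiv(n2)
  p = (k1 + k2).fdiv(n1 + n2)
  log_l(p1, k1, n1) + log_l(p2, k2, n2) - log_l(p, k1, n1) - log_l(p, k2, n2)
end

likelihood_ratio(1, 2, 1, 2)   # identical rates: 0.0
likelihood_ratio(0, 10, 10, 10) # 20 * ln 2, about 13.86
```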
=begin
Adaptation of the streaming DISCO algorithm to parallel streams/monoids.
Each instance of Disco maintains @counts, which is just #(x),
as well as @h, which stores (q, n) for each (x, y) pair.
n corresponds conceptually to the number of emitted values for that pair; q corresponds (as in the paper)
to the probability with which they were emitted.
A separate instance is created for each dimension; these are then merged in any order.
Initialize with a single dimension by setting counts(w) to 1 for each w
in the dimension, and h to (q, n) = (1, 1) for each (w1, w2) pair.
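The merge rule for (q, n) is not part of this excerpt, so the following Ruby sketch covers only the single-dimension initialization the comment describes. `DiscoSketch` and `from_dimension` are hypothetical names. Treating pairs as unordered, with the two words stored in sorted order, is also an assumption.

```ruby
# Hypothetical sketch of single-dimension initialization only (no merge).
# counts[w] plays the role of #(w); h[[w1, w2]] holds [q, n].
class DiscoSketch
  attr_reader :counts, :h

  def initialize(counts = {}, h = {})
    @counts = counts
    @h = h
  end

  # Each distinct word in the dimension gets count 1, and each unordered
  # pair of distinct words gets (q, n) = (1, 1).
  def self.from_dimension(words)
    uniq = words.uniq.sort
    counts = uniq.map { |w| [w, 1] }.to_h
    h = uniq.combination(2).map { |pair| [pair, [1.0, 1]] }.to_h
    new(counts, h)
  end
end

d = DiscoSketch.from_dimension(%w[b a c a])
d.counts # => {"a"=>1, "b"=>1, "c"=>1}
```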
package com.twitter.algebird

// At level L, values are grouped into blocks of 2^(L-1) consecutive values;
// a value v falls in block v >> (L - 1).
case class DyadicRange(maxValue : Long = Long.MaxValue) {
  val levels = math.ceil(math.log(maxValue) / math.log(2)).toInt
  def indicesForPoint(v : Long) = (1 to levels).map{ level => (level, indexForPoint(v, level)) }
  def indicesForRange(start : Long, end : Long) : List[(Int, Long)] = indicesForRange(start, end, levels)
  def indexForPoint(v : Long, level : Int) = v >> (level - 1)
  def rangeForIndex(i : Long, level : Int) = (i << (level - 1), ((i + 1) << (level - 1)) - 1)
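Only the point-level methods survive in this excerpt; the three-argument `indicesForRange` is cut off. The Ruby sketch below reproduces the same index scheme and substitutes a greedy range decomposition, which is not necessarily the gist's recursive approach. Starting from the left end, it repeatedly takes the largest block that is aligned at the current position and stays inside the range. Every point in the range then shares exactly one (level, index) pair with the decomposition. That property is the usual reason to use dyadic ranges, for example to answer range-count queries by summing a few per-level counts. `DyadicRangeSketch` is a hypothetical name.

```ruby
# Hypothetical Ruby port of the DyadicRange index scheme: a value v sits in
# block v >> (level - 1) at each level, and blocks hold 2**(level - 1) values.
class DyadicRangeSketch
  attr_reader :levels

  def initialize(max_value = 2**63 - 1)
    @levels = [Math.log2(max_value).ceil, 1].max
  end

  def index_for_point(v, level)
    v >> (level - 1)
  end

  def indices_for_point(v)
    (1..levels).map { |level| [level, index_for_point(v, level)] }
  end

  def range_for_index(i, level)
    [i << (level - 1), ((i + 1) << (level - 1)) - 1]
  end

  # Greedy cover of [start, finish] by aligned dyadic blocks.
  def indices_for_range(start, finish)
    result = []
    while start <= finish
      level = levels
      # Shrink until the block starts at `start` and ends at or before `finish`.
      level -= 1 until (start % block_size(level)).zero? &&
                       start + block_size(level) - 1 <= finish
      result << [level, index_for_point(start, level)]
      start += block_size(level)
    end
    result
  end

  private

  def block_size(level)
    1 << (level - 1)
  end
end

r = DyadicRangeSketch.new(16)
r.indices_for_range(3, 10) # => [[1, 3], [3, 1], [2, 4], [1, 10]]
```

In that example, the blocks are [3, 3], [4, 7], [8, 9] and [10, 10], which `range_for_index` recovers from each (level, index) pair.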