Brendan O'Connor brendano

all D 0.474
out P 0.698
of P 0.970
wood ^ 0.633
facts N 0.906
u O 0.997
can V 1.000
build V 1.000
a D 0.998
all D 0.323
out P 0.578
of P 0.955
wood ^ 0.584
facts N 0.912
u O 0.998
can V 0.999
build V 1.000
a D 0.998
http://opinionator.blogs.nytimes.com/2012/08/08/hear-all-ye-people-hearken-o-earth/
http://news.ycombinator.com/item?id=4362277
http://news.ycombinator.com/item?id=4365086
2009-01 19
2009-02 20
2009-03 48
2009-04 100
2009-05 275
2009-06 292
2009-07 494
2009-08 259
2009-09 207
2009-10 65
{
"contributors": null,
"coordinates": null,
"created_at": "Tue Jul 03 00:35:49 +0000 2012",
"entities": {
"hashtags": [],
"urls": [],
"user_mentions": []
},
"favorited": false,
2009-06-01 4507 1 [470275]
2009-06-08 4507 1 [422879]
2009-06-22 4507 1 [257976]
2009-07-06 4507 1 [69444]
2009-07-13 4507 4 [16042,237457,457813,61273]
2009-07-20 4507 6 [422879,358078,82438,34891,47316,97749]
2009-07-27 4507 2 [423154,477645]
2009-08-10 4507 6 [356596,316917,99418,247707,230452,3538]
2009-08-17 4507 7 [82438,263332,23135,35494,94656,122471,272590]
2009-08-24 4507 7 [157430,338463,426119,157430,405565,309448,338463]
# handle the wikipedia dump format
module WikiDump
def self.yield_page_strings(stream)
buf = ""
stream.each do |line|
if line =~ /^\s* <page> \s*$/x
buf = ""
#!/usr/bin/env ruby
#
# Data structures and processing of documents to be indexed, e.g. wikipedia
# pages. ok, everything is wikipedia-specific. :-)
#
# This file can be executed for various sorts of testing (see bottom)
require File.dirname(__FILE__)+'/common'
#!/usr/bin/env python
"""
Convert STDIN to UTF-8
based on character encoding detection
"""
import sys, json, itertools
from chardet.universaldetector import UniversalDetector
detector = UniversalDetector()
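The preview above stops right after the detector is created. As a rough sketch only, and not the gist's actual code, the remaining steps might look like the following: feed the input bytes to chardet's `UniversalDetector`, read the guessed encoding, then decode and re-encode as UTF-8. The function names `detect_encoding`, `to_utf8`, and `convert_stream`, the UTF-8 fallback, and the choice to replace undecodable bytes are all assumptions made for illustration.

```python
def detect_encoding(data: bytes, default: str = "utf-8") -> str:
    """Guess the encoding of `data` with chardet, falling back to `default`."""
    # Imported lazily so to_utf8() below stays usable without chardet installed.
    from chardet.universaldetector import UniversalDetector

    detector = UniversalDetector()
    for line in data.splitlines(keepends=True):
        detector.feed(line)
        if detector.done:  # chardet has seen enough to decide; stop early
            break
    detector.close()
    # result["encoding"] is None when chardet cannot make a guess.
    return detector.result.get("encoding") or default


def to_utf8(data: bytes, encoding: str) -> bytes:
    """Decode `data` from `encoding` and re-encode it as UTF-8.

    Undecodable bytes become U+FFFD (an assumption, not the gist's behaviour).
    """
    return data.decode(encoding, errors="replace").encode("utf-8")


def convert_stream(inp, out) -> None:
    """Read all bytes from `inp`, convert them, and write UTF-8 bytes to `out`."""
    raw = inp.read()
    out.write(to_utf8(raw, detect_encoding(raw)))
```

To use it as a filter, one would call `convert_stream(sys.stdin.buffer, sys.stdout.buffer)` after `import sys`. Reading the whole input before detection keeps the sketch short; a streaming version would need to buffer lines until the detector reports `done`.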
brendano / NOTES.md
Created June 12, 2012 20:03
Patches to compile ocropus on Mac OS X 10.6 -- see the explanation in NOTES.md at the bottom: https://gist.github.com/2919800#file_notes.md

by Brendan O'Connor (http://brenocon.com)

I got all of ocropus to compile on Mac OS X 10.6, though I haven't tested it much yet. This is the current version in the ocropus hg repository, so roughly version 0.5, with iulib at perhaps 0.4ish.

See ocroinst.osx -- the first file in "everything_besides_iulib.diff" -- for line-by-line instructions; the script may even run as-is. It assumes Homebrew and pip are installed (see the comments).