Brendan O'Connor (brendano)
13150 samples ~ 0.4% se, though more in reality
File Function Line
50.3 % treetm.jl cgsIterPath 383 qnewLL,pnewLL = proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate)
44.1 % treetm.jl cgsIterPath 384 qoldLL,poldLL = proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate)
4.1 % treetm.jl getindex 79 getindex(c::ClassCountTable, k) = c.counts[k]
2.1 % treetm.jl incrementFullpath! 122 x = b ? x.right : x.left
22.0 % treetm.jl proposePath! 337 w0 = (n0.counts[wordID] + betaHere/V - on0) / (n0.counts.total + betaHere - int(on_cur))
18.9 % treetm.jl proposePath! 338 w1 = (n1.counts[wordID] + betaHere/V - on1) / (n1.counts.total + betaHere - int(on_cur))
5.2 % treetm.jl proposePath! 339 p0 = (cur_docnode.left.count + mm.gammaConc/2 - on0)
6.5 % treetm.jl proposePath! 340 p1 = (cur_do
File Function Line
48.0 % /Users/brendano/Desktop/hier_lda/code/treetm.jl cgsIterPath 389 qnewLL,pnewLL = proposePath!(newpath, mm, V,di,word, first ? nothing : oldpath, :simulate)
41.6 % /Users/brendano/Desktop/hier_lda/code/treetm.jl cgsIterPath 390 qoldLL,poldLL = proposePath!(oldpath, mm, V,di,word, first ? nothing : oldpath, :evaluate)
1.1 % /Users/brendano/Desktop/hier_lda/code/treetm.jl cgsIterPath 391 logA = pnewLL-poldLL + qoldLL-qnewLL # (pnew-qnew) - (pold-qold)
3.0 % /Users/brendano/Desktop/hier_lda/code/treetm.jl cgsIterPath 402 incrementFullpath!(mm.cTopicWord, newpath, word, +1)
3.1 % /Users/brendano/Desktop/hier_lda/code/treetm.jl cgsIterPath 408 incrementFullpath!(mm.cTopicWord, oldpath, word, -1)
2.7 % /Users/brendano/Desktop/hier_lda/code/treetm.jl getindex 82 getindex(c::ClassCountTable, k) = c.c
julia> include("gotree.jl")
Array{CountTrie,1}
accept rate = 850654/850654 = 1.000
elapsed time: 33.986743839 seconds (5921922372 bytes allocated)
.ITER 1
accept rate = 809118/850654 = 0.951
elapsed time: 36.449559574 seconds (6175698392 bytes allocated)
.ITER 2
accept rate = 796254/850654 = 0.936
elapsed time: 30.21721326 seconds (6166426280 bytes allocated)
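The accept rates above come from a Metropolis-Hastings step; the acceptance ratio in the profiled code is `logA = pnewLL-poldLL + qoldLL-qnewLL`, i.e. (pnew-qnew) - (pold-qold) in log space. A minimal sketch of that accept decision (function and argument names are mine, not from the gist):

```python
import math
import random

def mh_accept(p_new, p_old, q_new, q_old, rng=random.random):
    """Metropolis-Hastings accept test in log space.

    p_* are log target densities of the new/old states; q_* are log
    proposal densities.  log A = (p_new - q_new) - (p_old - q_old);
    accept with probability min(1, exp(log A)).
    """
    log_a = (p_new - q_new) - (p_old - q_old)
    return log_a >= 0 or rng() < math.exp(log_a)

# A proposal that strictly improves the target (symmetric proposal)
# is always accepted:
accepted = mh_accept(p_new=-10.0, p_old=-12.0, q_new=0.0, q_old=0.0)
```

Counting accepts over all proposals gives exactly the `accept rate = accepted/total` lines logged above.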
# http://mikelove.wordpress.com/2013/11/07/empirical-bayes/
# Stein's estimation rule and its competitors - an empirical Bayes approach
# B Efron, C Morris, Journal of the American Statistical, 1973
n <- 1000
sigma.means <- 5
means <- rnorm(n, 0, sigma.means)
# sigma.y <- 5
library(manipulate)
manipulate({
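The R snippet above sets up the simulation from the Efron-Morris paper: true means drawn from N(0, 5), observed with noise. A rough Python sketch of the James-Stein estimator on the same setup (variable names are mine; shrinkage is toward 0, matching the zero-mean prior above):

```python
import random

def james_stein(y, sigma2):
    """James-Stein estimator shrinking y toward 0, for y_i ~ N(theta_i, sigma2):
    theta_hat = (1 - (n - 2) * sigma2 / sum(y_i**2)) * y."""
    n = len(y)
    shrink = 1.0 - (n - 2) * sigma2 / sum(v * v for v in y)
    return [shrink * v for v in y]

# Same simulation as the R snippet: true means ~ N(0, 5), noise sd 5.
random.seed(1)
n, sigma_means, sigma_y = 1000, 5.0, 5.0
means = [random.gauss(0.0, sigma_means) for _ in range(n)]
y = [random.gauss(m, sigma_y) for m in means]

# Total squared error of the raw observations (MLE) vs. the shrunk estimates.
mle_err = sum((a - b) ** 2 for a, b in zip(y, means))
js_err = sum((a - b) ** 2 for a, b in zip(james_stein(y, sigma_y ** 2), means))
```

With equal prior and noise variance, the shrinkage factor lands near the oracle value of 0.5, and the shrunk estimates dominate the raw observations in total squared error, which is Stein's point.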
package nlp;
import java.io.IOException;
import java.io.StringReader;
import edu.stanford.nlp.io.IOUtils;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.trees.LabeledScoredTreeFactory;
import edu.stanford.nlp.trees.PennTreeReader;

In Thomas Bass's The Predictors, there is a scene where they are talking to a potential investor who kept wanting to talk about their earlier complexity-theory work:

"Marrin wanted chaos and fractals, and we were offering engineering and statistics."

I remember reading that and thinking: wait, but isn't that why I'm reading this book, and why the book is supposed to be interesting? I stopped reading it at some point after that.

This is an edu.stanford.nlp.trees.Tree:
tree.setSpans(); // these are 0-indexed, inclusive-inclusive
tree.indexSpans(); // yup, this saves to a different place; apparently 0-indexed, inclusive-exclusive
tree.indexLeaves(); // these are 1-indexed (!!); the Stanford coref code uses them heavily
===
[Update July 25... and after https://gist.github.com/leondz/6082658 ]
OK never mind the questions about cross-validation versus a smaller eval split and all that.
We evaluated our tagger (current release, version 0.3.2),
trained and evaluated on the same splits as the GATE tagger
(from http://gate.ac.uk/wiki/twitter-postagger.html and specifically twitie-tagger.zip)
and it gets 90.4% accuracy (significantly different from the GATE results).
brendano / morpha.py
Python wrapper for morpha (English lemmatizer)
"""
Wrapper around morpha from
http://www.informatics.sussex.ac.uk/research/groups/nlp/carroll/morph.html
Vaguely follows edu.stanford.nlp.Morphology except we implement with a pipe.
Hacky. Would be nice to use cython/swig/ctypes to directly embed morpha.yy.c
as a Python extension.
TODO: compare linguistic quality to the lemmatizer in Python's "pattern" package.
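The docstring above describes wrapping morpha through a pipe. A generic sketch of that pattern — a long-lived child process, one line in, one line out — with an upper-casing Python child standing in for morpha, since morpha is an external binary that may not be installed:

```python
import subprocess
import sys

# Stand-in child process: reads lines, upper-cases them, flushes each one.
# (morpha would take this role in the real wrapper.)
UPCASE_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)

class PipeWrapper:
    """Hold a long-lived child process open and exchange one line per call."""

    def __init__(self, cmd):
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

    def process(self, line):
        self.proc.stdin.write(line.rstrip("\n") + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

wrapper = PipeWrapper([sys.executable, "-c", UPCASE_CHILD])
result = wrapper.process("walked")
wrapper.close()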
MCMC convergence diagnostics
https://github.com/brendano/conplot
~/myutil % grep totalLL log|awk '{print $2}' | conplot
[conplot ASCII plot: totalLL trace climbing from -2.93e+06 to a plateau near -2.87e+06; alignment lost in extraction]
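The grep/awk pipeline above extracts the totalLL trace for plotting. The same extraction in Python, assuming a hypothetical `totalLL <value> ...` log-line format (the actual format in the log file isn't shown in the gist):

```python
def totalll_trace(log_lines):
    """Python analogue of `grep totalLL log | awk '{print $2}'`: pull the
    second whitespace-separated field from lines mentioning totalLL."""
    return [float(line.split()[1]) for line in log_lines if "totalLL" in line]

# Hypothetical log lines in a "totalLL <value> ..." format:
log = [
    "totalLL -2.93e+06 iter 1",
    "accept rate = 809118/850654 = 0.951",
    "totalLL -2.87e+06 iter 2",
]
trace = totalll_trace(log)
```

A rising, flattening trace like the one plotted above is the usual rough sign that a collapsed Gibbs sampler has burned in.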