Damion Junk damionjunk

@damionjunk
damionjunk / gist:3805350
Created September 29, 2012 22:40
Witten Bell Smoothing for Unigrams
(ns witten-bell.core
  (:import [java.io SequenceInputStream]
           [java.util Enumeration Collections])
  (:require [clojure.java.io :as io]
            [clojure.string :as s]
            [witten-bell.io :as wio]))
(defn witten-bell
  "Given a seq of lines of text, a witten-bell probability map for seen and
  unseen events is returned. The map is in the form:
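The preview above is cut off before the map format and the function body, so the following is only a minimal sketch of textbook Witten-Bell smoothing for unigrams, not the gist's actual implementation. The name `witten-bell-sketch`, the `vocab-size` parameter, and the `:seen` / `:unseen` keys are assumptions made for illustration. With N tokens and T distinct seen types, a seen word with count c gets c / (N + T), and the reserved mass T / (N + T) is split evenly over the Z = vocab-size - T unseen types.

```clojure
(require '[clojure.string :as s])

(defn witten-bell-sketch
  "Hypothetical helper (not the gist's witten-bell fn). Takes a seq of
  lines and an assumed total vocabulary size, and returns a map with
  per-word probabilities under :seen and the probability of any single
  unseen word under :unseen (nil when it cannot be computed)."
  [lines vocab-size]
  (let [;; Whitespace tokenization; blank tokens from empty lines are dropped.
        tokens (->> lines
                    (mapcat #(s/split (s/trim %) #"\s+"))
                    (remove s/blank?))
        counts (frequencies tokens)
        n      (count tokens)       ; N: total tokens observed
        t      (count counts)       ; T: distinct types observed
        z      (- vocab-size t)     ; Z: types never observed
        denom  (+ n t)]
    {:seen   (into {} (for [[w c] counts] [w (/ c denom)]))
     ;; Guard against an empty corpus or a vocabulary with no unseen types.
     :unseen (when (and (pos? z) (pos? denom))
               (/ t (* z denom)))}))
```

Because Clojure keeps integer division as exact ratios, the seen probabilities plus Z times the unseen probability sum to exactly 1 whenever `:unseen` is non-nil.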
damionjunk / gist:3920399
Created October 19, 2012 20:11
ZMQ Twitter ANEW Scoring
(ns zmq-anew3l.core
  (:require [zmq-anew3l.zmqex.zhelper :as mq]
            [cheshire.core :as json]
            [clojure.tools.cli :as cli]
            [clojure.tools.logging :as log]
            [sentimental.anew :as anew]))
(defn queue-address [host port] (str "tcp://" host ":" port))
(defn gen-all
damionjunk / gist:3920433
Created October 19, 2012 20:15
ZMQ ANEW output sample, fn set to all
/*
λ zmq-anew3l → lein zmqanew -lexicon /Users/djunk/projects/iarpa/twarse/data/original-anew-la.csv -fn all
/Users/djunk/code/leiningen
Compiling zmq-anew3l.core
*/
{:anew-en {:words [hay hay], :score {:v 5.239999771118164, :d 5.369999885559082, :a 3.950000047683716}}}
{:anew-pt {:words [casual], :score {:v 4.019999980926514, :d 4.289999961853027, :a 4.070000171661377}}}
{:anew-es {:words [cara], :score {:v 6.389999866485596, :d 5.670000076293945, :a 5.039999961853027}}}
{:anew-es {:words [calor], :score {:v 7.409999847412109, :d 5.610000133514404, :a 3.7300000190734863}}, :anew-pt {:words [calor], :score {:v 7.409999847412109, :d 5.610000133514404, :a 3.7300000190734863}}}
{:anew-es {:words [feliz], :score {:v 8.210000038146973, :d 6.630000114440918, :a 6.489999771118164}}, :anew-pt {:words [feliz], :score {:v 8.210000038146973, :d 6.630000114440918, :a 6.489999771118164}}}
damionjunk / gist:3920444
Created October 19, 2012 20:16
ZMQ ANEW output sample, fn set to bo3 "best of 3"
/*
λ zmq-anew3l → lein zmqanew -lexicon /Users/djunk/projects/iarpa/twarse/data/original-anew-la.csv -fn bo3
/Users/djunk/code/leiningen
*/
{:v 6.880000114440917, :d 5.179999828338623, :a 5.289999961853027, :anew-en {:words [chocolate], :score {:v 6.880000114440918, :d 5.179999828338623, :a 5.289999961853027}}, :anew-es {:words [chocolate], :score {:v 6.880000114440918, :d 5.179999828338623, :a 5.289999961853027}}, :anew-pt {:words [chocolate], :score {:v 6.880000114440918, :d 5.179999828338623, :a 5.289999961853027}}}
{:v 7.389999866485596, :d 5.489999771118164, :a 6.340000152587891, :anew-es {:words [corazón], :score {:v 7.389999866485596, :d 5.489999771118164, :a 6.340000152587891}}}
{:v 7.739999771118164, :d 6.739999771118164, :a 5.739999771118164, :anew-es {:words [amigo], :score {:v 7.739999771118164, :d 6.739999771118164, :a 5.739999771118164}}, :anew-pt {:words [amigo], :score {:v 7.739999771118164, :d 6.739999771118164, :a 5.739999771118164}}}
{:v 6.02666680018107, :d 5.820000012715657, :a 4.86666663
damionjunk / gist:4026818
Created November 6, 2012 19:13
Generate a gold standard file and text extraction for evalb / Assignment 8
(ns ptreader
  (:require [clojure.java.io :as io]
            [clojure.string :as s])
  (:import [edu.stanford.nlp.ling Word WordTag]
           [edu.stanford.nlp.trees PennTreeReader]))
(defn get-sentence
  [words]
  (s/join " " (map (fn [w] (.word w)) words)))
damionjunk / gist:4044936
Created November 9, 2012 09:56
extract semcor elements matching some criteria set in an NLP assignment.
(ns wsd.semfind
  (:require [clojure.java.io :as io]
            [clojure.string :as s])
  (:import [java.net URL]
           [edu.mit.jsemcor.main IConcordance IConcordanceSet Semcor]
           [edu.mit.jsemcor.element IWordform ISentence]))
(defn noun?
  "Checks to see if the Wordform is a NN or NNS, and additionally has
damionjunk / gist:4273277
Created December 13, 2012 01:30
A quick example of Clojure / StanfordNLP CoNLL parsing.
(ns wujuko.nlp.fmt-read
  (:require [clojure.java.io :as io])
  (:import [java.util Properties]
           [edu.stanford.nlp.sequences
            SeqClassifierFlags
            CoNLLDocumentReaderAndWriter
            ColumnDocumentReaderAndWriter]
           [edu.stanford.nlp.ling CoreLabel]))
(defn corelabel->map
(let [parts 11 ; Pre-computed, based on target ABV of 43%.
      ml-target 30
      ;; Part/Ratios
      tomatin (/ 7.5 parts)
      cs (/ 2.5 parts)
      dw (/ 1.0 parts)
      ;; Prices
      t-ml-p (/ 19.99 750)
      cs-ml-p (/ 79.99 750)
      m12-ml-p (/ 53.99 750)
({:country "Argentina",
  :embersId "9008611e531ff5e88d6d0b4ca4fc4fe680fb4f9d",
  :date "2012-01-01",
  :holidayName "New Year's Day",
  :stockIndex "MERVAL"}
 {:country "Argentina",
  :embersId "e150af6d32ac8a222d8d83438e0329c9fab99ca6",
  :date "2012-02-20",
  :holidayName "Carnival",
<broadcast-groups>
  <broadcast-group name="bg-group2">
    <jgroups-stack>${jgroups.stack:tcp}</jgroups-stack>
    <jgroups-channel>${jgroups.channel:hq-cluster}</jgroups-channel>
    <broadcast-period>2000</broadcast-period>
    <connector-ref>netty</connector-ref>
  </broadcast-group>
</broadcast-groups>
<discovery-groups>