@aria42
Created September 19, 2010 19:21
; Word Information
; word: string of word
; count: # of usages
; feats: map of feature-type to feature-value
; contexts: counter of [before-word after-word] usages (for HMM)
(defrecord WordInfo [word count feats contexts])
(defn get-feats
  "Returns a map of surface features for the word type w."
  [^String w]
  {:hasInitCap (boolean (re-matches #"[A-Z].*" w))
   :hasPunc    (boolean (re-matches #".*\W.*" w))
   ;; last three characters, or the whole word if it is shorter
   :suffix     (let [suffix-length (min 3 (.length w))]
                 (.substring w (- (.length w) suffix-length)))})
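
;; Usage sketch (not part of the original gist): what get-feats returns
;; for a couple of inputs. Wrapped in (comment ...) so that nothing is
;; evaluated when the file is loaded; evaluate the forms at a REPL.
(comment
  (get-feats "Hello,")
  ;; => {:hasInitCap true, :hasPunc true, :suffix "lo,"}
  (get-feats "at")
  ;; => {:hasInitCap false, :hasPunc false, :suffix "at"}
  )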
(defn new-word-info [word]
  (WordInfo. word 0 (get-feats word) (Counter. {} 0)))
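
;; Sketch (not part of the original gist): one way the
;; [before-word after-word] context counts described for WordInfo might be
;; gathered from a sentence. It assumes plain nested maps stand in for
;; Counter, which is not defined in this snippet. The name context-counts
;; and the :start / :stop boundary markers are hypothetical.
(defn context-counts
  "Maps each word in sent to a map of [before-word after-word] -> count."
  [sent]
  (let [padded (concat [:start] sent [:stop])]
    (reduce (fn [acc [before word after]]
              ;; (fnil inc 0) treats a missing count as 0 before incrementing
              (update-in acc [word [before after]] (fnil inc 0)))
            {}
            ;; sliding windows of three tokens: before, word, after
            (partition 3 1 padded))))

(comment
  ;; "the" occurs twice, each time in a different context
  (get (context-counts ["the" "dog" "saw" "the" "cat"]) "the")
  ;; => {[:start "dog"] 1, ["saw" "cat"] 1}
  )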