Skip to content

Instantly share code, notes, and snippets.

@apg
Created September 2, 2010 14:48
Show Gist options
  • Save apg/562386 to your computer and use it in GitHub Desktop.
Save apg/562386 to your computer and use it in GitHub Desktop.
;; Lingua::EN::Syllable::syllable() estimates the number of syllables in
;; the word passed to it.
;; Note that it isn't entirely accurate... it fails (by one syllable) for
;; about 10-15% of my /usr/dict/words. The only way to get a 100% accurate
;; count is to do a dictionary lookup, so this is a small and fast alternative
;; where more-or-less accurate results will suffice, such as estimating the
;; reading level of a document.
;; I welcome pointers to more accurate algorithms, since this one is
;; pretty quick-and-dirty. This was designed for English (well, American
;; at least) words, but sometimes guesses well for other languages.
;; KNOWN LIMITATIONS
;; Accuracy for words with non-alpha characters is somewhat undefined.
;; In general, punctuation characters, et al, should be trimmed off before
;; handing the word to syllable(), and hyphenated compounds should be broken
;; into their separate parts.
;; Syllables for all-digit words (eg, "1998"; some call them "numbers") are
;; often counted as the number of digits. A cooler solution would be converting
;; "1998" to "nineteen eighty eight" (or "one thousand nine hundred eighty
;; eight", or...), but that is left as an exercise for the reader.
;; Contractions are not well supported.
;; Compound words (like "lifeboat"), where the first word ends in a silent 'e'
;; are counted with an extra syllable.
;; COPYRIGHT
;; Distributed under the same terms as Perl.
;; Contact the author with any questions.
;; AUTHORs
;; Greg Fast
;; Dispenser (python port)
;; Andrew Gwozdziewycz (mods to python port http://gist.github.com/562255,
;; elisp port) and haiku functionality
(require 'cl)
(defconst count-syllables-negative
'("cial" "tia" "cius" "cious" "giu" "ion" "iou" "sia$" ".ely$"))
(defconst count-syllables-positive
'("ia" "riet" "dien" "iu" "io" "li"
"[aeiouym]bl$"
"[aeiou]\\{3\\}"
"^mc"
"ism$" ; isms
"\\([^aeiouy]\\)\\1l$" ; middle twiddle battle bottle, etc
"[^l]lien" ; alien, salient, but not lien, or ebbuillient
"^coa[dglx]" ; exception for words coadjutor coagulable coagulate
; coalesce coalescent coalition coaxial
"[^gq]ua[^aeiou]"
"dnt$")) ; couldn't
(defun count-syllables (word)
(let ((word (downcase word))
(word (replace-regexp-in-string "'" "" word))
(word (replace-regexp-in-string "e$" "" word))
(vowgrouplen (length (remove-if-not '(lambda (x) (> (length x) 0))
(split-string word "[^aeiouy]+")))))
(if (= (length word) 1)
1
(progn
(let ((pluses (reduce '(lambda (count thing)
(if (string-match-p thing word)
(1+ count)
count))
count-syllables-positive :initial-value 0))
(minuses (reduce '(lambda (count thing)
(if (string-match-p thing word)
(1- count)
count))
count-syllables-negative :initial-value 0)))
(or (+ pluses minuses vowgrouplen) 1))))))
(defun haiku-p (s &optional seperator)
"Tests if s is a haiku"
(interactive)
(let ((seperator (or seperator "\n"))
(sentences (split-string s seperator)))
(equalp (mapcar '(lambda (sent)
(reduce '(lambda (count word)
(+ count (count-syllables word)))
(remove-if-not '(lambda (x) (> (length x) 0))
(split-string sent "[ ,.;:?#-]+"))
:initial-value 0))
sentences)
'(5 7 5))))
;; (haiku-p "yay! new google chrome / give away your privacy / but it sure beats lynx" " / ")
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment