Last active December 18, 2015 08:09
Guess the number of syllables in a word. Not as good as looking up stuff from the CMU pronunciation dictionary...
"This is a naive implementation of syllable guessing. This is for Amber smalltalk. It probably works elsewhere with minor changes."
| word |
word := 'tumult' asLowercase.
word := word replaceRegexp: '(?:[^laeiouy]es|ed|[^laeiouy]e)$' with: ''.
word := word replaceRegexp: '^y' with: ''.
console log: ( word matchesOf: ( RegularExpression fromString: '[aeiouy]{1,2}' flag: 'g' ) ) size.
"Use the CMU Pronunciation Dictionary to determine the number and stress of syllables in a word. Obviously this only works with words that are in the dictionary."
( 'S IY1 . EH1 M . Y UW1 . D IH1 K SH AH0 N EH2 R IY0' replaceRegexp: ( RegularExpression fromString: '[^\d]' flag: 'g' ) with: '' ) tokenize: ''.
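To try the guessing heuristic on more than one word, the three steps of the first snippet can be wrapped in a block. This is only a sketch: it reuses the messages already shown above, the name countSyllables is hypothetical, and it assumes that at least one vowel group is left after the suffix is stripped.

```smalltalk
"Sketch: the naive heuristic wrapped in a block (countSyllables is a hypothetical name)."
| countSyllables |
countSyllables := [ :aWord |
    ( ( ( aWord asLowercase
        "1. Drop a likely silent ending: -es or -e after a consonant other than l, or -ed."
        replaceRegexp: '(?:[^laeiouy]es|ed|[^laeiouy]e)$' with: '' )
        "2. A leading y acts as a consonant, so it should not count as a vowel group."
        replaceRegexp: '^y' with: '' )
        "3. Each run of one or two vowels is counted as one syllable."
        matchesOf: ( RegularExpression fromString: '[aeiouy]{1,2}' flag: 'g' ) ) size ].

"Should log 2, 3 and 1."
#( 'tumult' 'syllable' 'yes' ) do: [ :each |
    console log: ( countSyllables value: each ) ].
```

The sketch keeps the limits of the original. The -ed rule strips every -ed ending, so 'wanted' is guessed as one syllable. A word such as 'the' is reduced to 't', which leaves nothing to match; I have not guarded against that case here.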
I hate using #tokenize: '' to convert the String to an Array; I should hide that behind some message like #asArray.
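Until something like #asArray exists, one way to keep the #tokenize: '' call out of sight is a small helper block. This is a sketch, not a real String extension: the names asArray and stresses are hypothetical, and it relies on the same assumption as the snippet above, namely that tokenizing on an empty string yields one element per character.

```smalltalk
"Sketch: a helper block standing in for the wished-for #asArray message."
| asArray stresses |
asArray := [ :aString | aString tokenize: '' ].

"Keep only the stress digits of the CMU pronunciation, one per syllable."
stresses := asArray value:
    ( 'S IY1 . EH1 M . Y UW1 . D IH1 K SH AH0 N EH2 R IY0'
        replaceRegexp: ( RegularExpression fromString: '[^\d]' flag: 'g' )
        with: '' ).

"The number of digits is the number of syllables; this pronunciation has seven."
console log: stresses size.
"Each element is a stress marker: 1 primary, 2 secondary, 0 unstressed."
console log: stresses.
```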