Created
November 6, 2010 02:08
An incomplete program that looks for cross-linguistic "minimal pairs" between English and Spanish, involving English flapped coronals and Spanish [r]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
American English flapped /t/ or /d/ (both realized as [ɾ]) might
sound to a Spanish speaker closer to a Spanish /r/ than to
either Spanish /t/ or /d/ (especially since Spanish /d/ is often
realized as [ð]). Wanted: real English words that would "be" real
Spanish words if you interpreted the English [ɾ] as a Spanish /r/.
If we symbolize English /t/ and /d/ as T, we can look for words
that have intervocalic T in a phonetically transcribed database.
We'll use the readily available CMU pronouncing dictionary:
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Our program will look for the pattern VTV in this dictionary,
then replace the T with r, and search for words that match the
resulting string in a Spanish wordlist.
"""
import re
SPANISH_WORDS = 'spanish-words.txt'
CMUDICT = 'cmudict.txt'  # placeholder path: point this at a local copy of the CMU dictionary

def process_cmu_dict(cmudict_file):
    # TODO: create a phonetic wordlist from cmudict_file.
    # Returns an empty list for now so the rest of the skeleton can run.
    return []

flappablePATTERN = ''  # TODO: something that will find intervocalic t and d

def rhoticize(word):
    # Replace each flappable t/d with r.
    return re.sub(flappablePATTERN, 'r', word)

# Read the Spanish wordlist (one word per line) into a set for fast lookup.
with open(SPANISH_WORDS, encoding='utf-8') as f:
    spanish_words = set(w.strip() for w in f)

cmu_dict = process_cmu_dict(CMUDICT)

for word in cmu_dict:
    if rhoticize(word) in spanish_words:
        print(word)

"""
What's needed to make this happen:
1) figure out how to write process_cmu_dict()
2) find a suitable spanish-words.txt, the bigger the better
3) figure out how to define flappablePATTERN
4) run it.
Extending this to look for English/Korean (or any other language)
would follow a similar process, but would require a transliteration
stage for many languages.
"""
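Steps 1 and 3 of the to-do list could be approached roughly as follows. This is a minimal sketch, not the author's implementation. It assumes cmudict's plain-text layout: a headword followed by space-separated ARPAbet phones, vowels carrying a stress digit 0-2, `;;;` comment lines, and `(n)` suffixes on alternate pronunciations. It approximates "flappable" as T or D between a vowel and an unstressed vowel. The names `parse_cmudict_lines`, `FLAPPABLE`, and `rhoticize_phones` are hypothetical, and the sample lines are hand-written in the dictionary's format rather than copied from it.

```python
import re

# Assumption: in cmudict only vowels end in a stress digit (0, 1 or 2),
# so "digit + space" before T/D means "preceded by a vowel", and a
# following phone ending in 0 means "followed by an unstressed vowel".
FLAPPABLE = re.compile(r"(?<=[012] )[TD](?= [A-Z]+0(?: |$))")


def parse_cmudict_lines(lines):
    """Turn cmudict-style lines into (word, phone_string) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comment lines
        head, *phones = line.split()
        # Drop the "(1)", "(2)" ... suffix used for alternate pronunciations.
        word = re.sub(r"\(\d+\)$", "", head).lower()
        entries.append((word, " ".join(phones)))
    return entries


def rhoticize_phones(phone_string):
    """Replace each flappable T or D phone with a lowercase r."""
    return FLAPPABLE.sub("r", phone_string)


# Hand-written sample lines in cmudict's format (illustrative only).
sample = [
    ";;; a comment line",
    "BUTTER  B AH1 T ER0",
    "ATTACK  AH0 T AE1 K",
    "LADDER(1)  L AE1 D ER0",
]

for word, phones in parse_cmudict_lines(sample):
    print(word, "->", rhoticize_phones(phones))
```

The result is still an ARPAbet string, so matching it against Spanish spellings would need a further phone-to-spelling step, which this sketch does not attempt.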