@amundo
Created November 6, 2010 02:08
An incomplete program that looks for cross-linguistic "minimal pairs" between English and Spanish, involving English flapped coronals and Spanish [r]
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
"""
American English flapped /t/ or /d/ (both realized as [ɾ]) may
sound closer to Spanish /r/ than to either Spanish /t/ or /d/ to
a Spanish speaker (especially since Spanish /d/ is often realized
as [ð]). Wanted: real English words that would "be" real Spanish
words if you interpreted the English [ɾ] as a Spanish /r/.
If we symbolize English /t/ and /d/ as T, we can look for words
that have intervocalic T in a phonetically transcribed database.
We'll use the readily available CMU pronouncing dictionary:
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
Our program will look for the pattern VTV in this dictionary,
then replace the T with r, and search for words that match the
resulting string in a Spanish wordlist.
"""
import io

SPANISH_WORDS = 'spanish-words.txt'
CMUDICT = 'cmudict.txt'  # assumed filename for a local copy of the CMU dictionary

def process_cmu_dict(cmudict_file):
    # TODO: create a phonetic wordlist; returns an empty list until written
    return []

flappablePATTERN = ''  # TODO: something that will find intervocalic t and d

def rhoticize(word):
    return re.sub(flappablePATTERN, 'r', word)

# Read the Spanish wordlist into a set for fast membership tests.
with io.open(SPANISH_WORDS, encoding='utf-8') as f:
    spanish_words = set(line.strip() for line in f)

cmu_dict = process_cmu_dict(CMUDICT)

for word in cmu_dict:
    if rhoticize(word) in spanish_words:
        print(word)
"""
What's needed to make this happen:
1) figure out how to write process_cmu_dict()
2) find a suitable spanish-words.txt, the bigger the better
3) figure out how to define flappablePATTERN
4) run it.
Extending this to English/Korean (or any other language pair)
would follow a similar process, but many languages would also
need a transliteration stage.
"""
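One possible starting point for steps 1 and 3 in the list above, offered as a rough sketch and not as the author's intended solution. It assumes the classic cmudict text layout: one entry per line, the word followed by space-separated ARPAbet symbols, `;;;` comment lines, variant pronunciations marked like `WORD(1)`, and vowels carrying a stress digit (0, 1 or 2). The names `parse_cmu_line`, `rhoticize_phonemes` and `FLAPPABLE` are hypothetical. The sketch works on phoneme lists only; mapping the result to Spanish spelling is still an open step.

```python
import re

# Vowels in cmudict end in a stress digit; consonants do not. So an
# intervocalic T or D is one with a digit just before it and a
# digit-final symbol just after it.
FLAPPABLE = re.compile(r'(?<=[012]) (?:T|D) (?=[A-Z]+[012])')

def parse_cmu_line(line):
    """Return (word, phonemes) for one dictionary line, or None to skip it."""
    line = line.strip()
    if not line or line.startswith(';;;'):
        return None
    parts = line.split(None, 1)
    if len(parts) < 2:
        return None
    word, phonemes = parts
    word = re.sub(r'\(\d+\)$', '', word)  # drop variant markers such as "(1)"
    return word, phonemes.split()

def rhoticize_phonemes(phonemes):
    """Replace every intervocalic T or D with R in a list of ARPAbet symbols."""
    return FLAPPABLE.sub(' R ', ' '.join(phonemes)).split()
```

For example, `parse_cmu_line('BUTTER  B AH1 T ER0')` yields the word plus its phoneme list, and passing that list to `rhoticize_phonemes` swaps the `T` for `R`, while the `T` in `S T AA1 P` is left alone because it follows a consonant.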