@MHenderson
Created July 22, 2013 10:39
Speech annotation in Python using regular expressions. http://nbviewer.ipython.org/6052936
{
"metadata": {
"name": "speech-annotation"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Speech annotation in Python using regular expressions"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import codecs, nltk\n",
"\n",
"# Open a local UTF-8 copy of the Project Gutenberg text of Little Dorrit.\n",
"little_dorrit_path = \"/home/matthew/workspace/resources/C/Corpus Stylistics/Dickens, Charles/pg963.txt\"\n",
"little_dorrit_file = codecs.open(little_dorrit_path, encoding='utf-8')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"little_dorrit_raw = little_dorrit_file.read()\n",
"little_dorrit_file.close()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"len(little_dorrit_raw)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 3,
"text": [
"1936177"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Locate the start of the passage; str.index raises ValueError if the phrase is missing.\n",
"begin_phrase = u'At the close of this recital'\n",
"begin = little_dorrit_raw.index(begin_phrase)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Search for the end phrase only after the start of the passage.\n",
"end_phrase = u'producing the money.'\n",
"end = little_dorrit_raw.index(end_phrase, begin)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# Slice from the start of begin_phrase through the end of end_phrase.\n",
"task_string = little_dorrit_raw[begin:end + len(end_phrase)]\n",
"print(task_string)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"At the close of this recital, Arthur turned his eyes upon the impudent\r\n",
"and wicked face. As it met his, the nose came down over the moustache\r\n",
"and the moustache went up under the nose. When nose and moustache had\r\n",
"settled into their places again, Monsieur Rigaud loudly snapped his\r\n",
"fingers half-a-dozen times; bending forward to jerk the snaps at Arthur,\r\n",
"as if they were palpable missiles which he jerked into his face.\r\n",
"\r\n",
"'Now, Philosopher!' said Rigaud.'What do you want with me?'\r\n",
"\r\n",
"'I want to know,' returned Arthur, without disguising his abhorrence,\r\n",
"'how you dare direct a suspicion of murder against my mother's house?'\r\n",
"\r\n",
"'Dare!' cried Rigaud. 'Ho, ho! Hear him! Dare? Is it dare? By Heaven, my\r\n",
"small boy, but you are a little imprudent!'\r\n",
"\r\n",
"'I want that suspicion to be cleared away,' said Arthur. 'You shall\r\n",
"be taken there, and be publicly seen. I want to know, moreover,\r\n",
"what business you had there when I had a burning desire to fling you\r\n",
"down-stairs. Don't frown at me, man! I have seen enough of you to know\r\n",
"that you are a bully and coward. I need no revival of my spirits from\r\n",
"the effects of this wretched place to tell you so plain a fact, and one\r\n",
"that you know so well.'\r\n",
"\r\n",
"White to the lips, Rigaud stroked his moustache, muttering, 'By Heaven,\r\n",
"my small boy, but you are a little compromising of my lady, your\r\n",
"respectable mother'--and seemed for a minute undecided how to act.\r\n",
"His indecision was soon gone. He sat himself down with a threatening\r\n",
"swagger, and said:\r\n",
"\r\n",
"'Give me a bottle of wine. You can buy wine here. Send one of your\r\n",
"madmen to get me a bottle of wine. I won't talk to you without wine.\r\n",
"Come! Yes or no?'\r\n",
"\r\n",
"'Fetch him what he wants, Cavalletto,' said Arthur, scornfully,\r\n",
"producing the money.\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
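"import re\n",
"\n",
"# re_1: any run of non-apostrophes between two apostrophes.\n",
"re_1 = r\"'[^']+'\"\n",
"# re_2: as re_1, but the closing apostrophe must follow . , ! or ?\n",
"re_2 = r\"'[^']+[\\.,!?]'\"\n",
"# re_3: as re_1, but also allow internal apostrophes (and hyphens), e.g. mother's.\n",
"re_3 = r\"'[^']+(?:[-'][^']+)*'\""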
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nltk.re_show(re_1, task_string)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"At the close of this recital, Arthur turned his eyes upon the impudent\r\n",
"and wicked face. As it met his, the nose came down over the moustache\r\n",
"and the moustache went up under the nose. When nose and moustache had\r\n",
"settled into their places again, Monsieur Rigaud loudly snapped his\r\n",
"fingers half-a-dozen times; bending forward to jerk the snaps at Arthur,\r\n",
"as if they were palpable missiles which he jerked into his face.\r\n",
"\r\n",
"{'Now, Philosopher!'} said Rigaud.{'What do you want with me?'}\r\n",
"\r\n",
"{'I want to know,'} returned Arthur, without disguising his abhorrence,\r\n",
"{'how you dare direct a suspicion of murder against my mother'}s house?{'\r\n",
"\r\n",
"'}Dare!{' cried Rigaud. '}Ho, ho! Hear him! Dare? Is it dare? By Heaven, my\r\n",
"small boy, but you are a little imprudent!{'\r\n",
"\r\n",
"'}I want that suspicion to be cleared away,{' said Arthur. '}You shall\r\n",
"be taken there, and be publicly seen. I want to know, moreover,\r\n",
"what business you had there when I had a burning desire to fling you\r\n",
"down-stairs. Don{'t frown at me, man! I have seen enough of you to know\r\n",
"that you are a bully and coward. I need no revival of my spirits from\r\n",
"the effects of this wretched place to tell you so plain a fact, and one\r\n",
"that you know so well.'}\r\n",
"\r\n",
"White to the lips, Rigaud stroked his moustache, muttering, {'By Heaven,\r\n",
"my small boy, but you are a little compromising of my lady, your\r\n",
"respectable mother'}--and seemed for a minute undecided how to act.\r\n",
"His indecision was soon gone. He sat himself down with a threatening\r\n",
"swagger, and said:\r\n",
"\r\n",
"{'Give me a bottle of wine. You can buy wine here. Send one of your\r\n",
"madmen to get me a bottle of wine. I won'}t talk to you without wine.\r\n",
"Come! Yes or no?{'\r\n",
"\r\n",
"'}Fetch him what he wants, Cavalletto,' said Arthur, scornfully,\r\n",
"producing the money.\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nltk.re_show(re_2, task_string)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"At the close of this recital, Arthur turned his eyes upon the impudent\r\n",
"and wicked face. As it met his, the nose came down over the moustache\r\n",
"and the moustache went up under the nose. When nose and moustache had\r\n",
"settled into their places again, Monsieur Rigaud loudly snapped his\r\n",
"fingers half-a-dozen times; bending forward to jerk the snaps at Arthur,\r\n",
"as if they were palpable missiles which he jerked into his face.\r\n",
"\r\n",
"{'Now, Philosopher!'} said Rigaud.{'What do you want with me?'}\r\n",
"\r\n",
"{'I want to know,'} returned Arthur, without disguising his abhorrence,\r\n",
"'how you dare direct a suspicion of murder against my mother{'s house?'}\r\n",
"\r\n",
"{'Dare!'} cried Rigaud. {'Ho, ho! Hear him! Dare? Is it dare? By Heaven, my\r\n",
"small boy, but you are a little imprudent!'}\r\n",
"\r\n",
"{'I want that suspicion to be cleared away,'} said Arthur. 'You shall\r\n",
"be taken there, and be publicly seen. I want to know, moreover,\r\n",
"what business you had there when I had a burning desire to fling you\r\n",
"down-stairs. Don{'t frown at me, man! I have seen enough of you to know\r\n",
"that you are a bully and coward. I need no revival of my spirits from\r\n",
"the effects of this wretched place to tell you so plain a fact, and one\r\n",
"that you know so well.'}\r\n",
"\r\n",
"White to the lips, Rigaud stroked his moustache, muttering, 'By Heaven,\r\n",
"my small boy, but you are a little compromising of my lady, your\r\n",
"respectable mother'--and seemed for a minute undecided how to act.\r\n",
"His indecision was soon gone. He sat himself down with a threatening\r\n",
"swagger, and said:\r\n",
"\r\n",
"'Give me a bottle of wine. You can buy wine here. Send one of your\r\n",
"madmen to get me a bottle of wine. I won{'t talk to you without wine.\r\n",
"Come! Yes or no?'}\r\n",
"\r\n",
"{'Fetch him what he wants, Cavalletto,'} said Arthur, scornfully,\r\n",
"producing the money.\n"
]
}
],
"prompt_number": 9
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"nltk.re_show(re_3, task_string)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"At the close of this recital, Arthur turned his eyes upon the impudent\r\n",
"and wicked face. As it met his, the nose came down over the moustache\r\n",
"and the moustache went up under the nose. When nose and moustache had\r\n",
"settled into their places again, Monsieur Rigaud loudly snapped his\r\n",
"fingers half-a-dozen times; bending forward to jerk the snaps at Arthur,\r\n",
"as if they were palpable missiles which he jerked into his face.\r\n",
"\r\n",
"{'Now, Philosopher!' said Rigaud.'What do you want with me?'\r\n",
"\r\n",
"'I want to know,' returned Arthur, without disguising his abhorrence,\r\n",
"'how you dare direct a suspicion of murder against my mother's house?'\r\n",
"\r\n",
"'Dare!' cried Rigaud. 'Ho, ho! Hear him! Dare? Is it dare? By Heaven, my\r\n",
"small boy, but you are a little imprudent!'\r\n",
"\r\n",
"'I want that suspicion to be cleared away,' said Arthur. 'You shall\r\n",
"be taken there, and be publicly seen. I want to know, moreover,\r\n",
"what business you had there when I had a burning desire to fling you\r\n",
"down-stairs. Don't frown at me, man! I have seen enough of you to know\r\n",
"that you are a bully and coward. I need no revival of my spirits from\r\n",
"the effects of this wretched place to tell you so plain a fact, and one\r\n",
"that you know so well.'\r\n",
"\r\n",
"White to the lips, Rigaud stroked his moustache, muttering, 'By Heaven,\r\n",
"my small boy, but you are a little compromising of my lady, your\r\n",
"respectable mother'--and seemed for a minute undecided how to act.\r\n",
"His indecision was soon gone. He sat himself down with a threatening\r\n",
"swagger, and said:\r\n",
"\r\n",
"'Give me a bottle of wine. You can buy wine here. Send one of your\r\n",
"madmen to get me a bottle of wine. I won't talk to you without wine.\r\n",
"Come! Yes or no?'\r\n",
"\r\n",
"'Fetch him what he wants, Cavalletto,'} said Arthur, scornfully,\r\n",
"producing the money.\n"
]
}
],
"prompt_number": 10
}
],
"metadata": {}
}
]
}
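The three cells that locate and slice the passage can be folded into one function, so the same extraction can be reused for other passages. What follows is a minimal sketch. `extract_passage` and `sample` are hypothetical names that do not appear in the original notebook, and the `[...]` markers in `sample` stand in for surrounding novel text. The function uses `str.index`, which raises `ValueError` when a phrase is missing. `str.find` returns -1 in that case, which silently yields a wrong slice.

```python
def extract_passage(text, begin_phrase, end_phrase):
    """Return the slice of text from begin_phrase through end_phrase.

    str.index raises ValueError when a phrase is missing, whereas
    str.find would return -1 and silently produce a wrong slice.
    """
    begin = text.index(begin_phrase)
    # Look for the end phrase only from the start of the passage onwards.
    end = text.index(end_phrase, begin) + len(end_phrase)
    return text[begin:end]


# Illustrative input: "[...]" stands in for the surrounding novel text.
sample = (
    "[...] At the close of this recital, Arthur turned his eyes upon the "
    "impudent\r\nand wicked face. [...] 'Fetch him what he wants, "
    "Cavalletto,' said Arthur, scornfully,\r\nproducing the money. [...]"
)

passage = extract_passage(sample, "At the close of this recital",
                          "producing the money.")
print(passage)
```

Inside the notebook, the equivalent call would be `extract_passage(little_dorrit_raw, begin_phrase, end_phrase)`.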
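None of the three patterns isolates the speech cleanly. The outputs above show re_1 and re_2 pairing the wrong apostrophes once they meet a possessive or contraction (mother's, Don't, won't). In contrast, re_3 runs from the first opening quote to the last closing one.

One possible refinement uses lookaround assertions to tell quotation marks apart from apostrophes. This is sketched here as an assumption, not as the author's method, and works as follows:

- An opening quote is an apostrophe not preceded by a word character.
- Inside a quotation, an apostrophe is allowed only between two word characters.
- A closing quote is an apostrophe not followed by a word character.

`SPEECH_RE`, `speech_spans` and `EXCERPT` are hypothetical names introduced for this sketch. `EXCERPT` contains lines taken from the passage printed above, with `\n` line endings.

```python
import re

# Lines from the passage printed above, shortened to keep the sketch
# self-contained.
EXCERPT = (
    "'Now, Philosopher!' said Rigaud.'What do you want with me?'\n"
    "\n"
    "'I want to know,' returned Arthur, without disguising his abhorrence,\n"
    "'how you dare direct a suspicion of murder against my mother's house?'\n"
    "\n"
    "White to the lips, Rigaud stroked his moustache, muttering, 'By Heaven,\n"
    "my small boy, but you are a little compromising of my lady, your\n"
    "respectable mother'--and seemed for a minute undecided how to act.\n"
    "\n"
    "'Give me a bottle of wine. You can buy wine here. Send one of your\n"
    "madmen to get me a bottle of wine. I won't talk to you without wine.\n"
    "Come! Yes or no?'\n"
)

# Hypothetical pattern, not from the original notebook:
#   (?<!\w)'            opening quote: apostrophe not preceded by a word char
#   [^']                any character other than an apostrophe, or
#   (?<=\w)'(?=\w)      an apostrophe between two word chars (won't, mother's)
#   '(?!\w)             closing quote: apostrophe not followed by a word char
SPEECH_RE = re.compile(r"(?<!\w)'(?:[^']|(?<=\w)'(?=\w))*'(?!\w)")


def speech_spans(text):
    """Return the quoted utterances found in text, quotes included."""
    return [m.group(0) for m in SPEECH_RE.finditer(text)]


for utterance in speech_spans(EXCERPT):
    print(repr(utterance))
```

In the notebook, `nltk.re_show(SPEECH_RE.pattern, task_string)` should bracket each utterance as a single span, including those that contain mother's, Don't and won't. Pass the pattern string rather than the compiled object, because `re_show` compiles it with its own flags.

The heuristic still misreads plural possessives such as boys' and elisions such as 'em, so treat it as a starting point rather than a complete solution.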