Created
April 5, 2012 01:12
shell2python
Shell:
cat *.xml | sed 's/\([A-Z][[:alpha:]]\{0,\}\. [A-Z]\)/| \1/g' | tr '|' '\n' | sed 's/^\( .\{25\}\).*$/\1/g'
Cn. Octauii praecidi capu
P. Crassi
Sp. Albinus, homines cons
M. Antonii, omnium eloque
C. Caesaris, in quo mihi
C. Marius tum, cum Cimbri
Tib. Graccho legum auctor
C. Drusi domum compleri a
Cn. Aufidius praetorius e
M. Crassus, sed aliud mol
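A rough Python counterpart of the shell pipeline above, offered as a sketch rather than the author's method. The function name `mark_and_truncate`, the `width` parameter, and the sample sentence are hypothetical; `[A-Za-z]` stands in for `[[:alpha:]]` and so assumes plain ASCII letters. Unlike `tr`, it does not turn `|` characters already present in the input into line breaks.

```python
import re

def mark_and_truncate(text, width=25):
    """Approximate the sed | tr | sed pipeline (hypothetical helper)."""
    # First sed + tr: start a new line, then a space, before each
    # "Abbreviation. Capital" sequence.
    broken = re.sub(r"([A-Z][A-Za-z]*\. [A-Z])", r"\n \1", text)
    # Second sed: a line that begins with a space keeps that space plus
    # `width` characters; shorter lines are left untouched.
    cut = re.compile(r"^( .{%d}).*$" % width)
    return [cut.sub(r"\1", line) for line in broken.split("\n")]

sample = "ut Cn. Octauii praecidi caput uidimus, P. Crassi"
lines = mark_and_truncate(sample)
for line in lines:
    print(line)
```

As in the shell version, text that precedes the first match (here `ut `) comes out as a line of its own.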
Python:
>>> import re
>>> regex = re.compile(r"[A-Z]'?\w{0,4}\. [A-Z]{0,}\w{0,}")
>>> regex.findall(text)
['Cn. Octauii', 'P. Crassi', 'Sp. Albinus', 'M. Antonii', 'C. Caesaris', 'C. Marius', 'Tib. Graccho', 'C. Drusi', 'Cn. Aufidius', 'M. Crassus']
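The session above does not show how `text` was loaded, so here is a self-contained sketch that runs the same pattern on a made-up sentence. The names `PRAENOMEN_NAME` and `find_names` and the sample string are assumptions for illustration only.

```python
import re

# Same pattern as in the session above: a capital, an optional apostrophe
# (as in M'.), up to four more word characters, ". ", then the next word.
PRAENOMEN_NAME = re.compile(r"[A-Z]'?\w{0,4}\. [A-Z]{0,}\w{0,}")

def find_names(text):
    """Return every abbreviated-praenomen-plus-name match in text."""
    return PRAENOMEN_NAME.findall(text)

sample = "Cn. Octauii praecidi caput; P. Crassi et M'. Glabrionem."
names = find_names(sample)
print(names)
```

One caveat worth keeping in mind: the pattern cannot tell an abbreviation from a short capitalized word that ends a sentence, so `Roma. Deinde` would match as well.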
PythonRegex.Com:
>>> regex = re.compile("([A-Z]'?\w{0,4}\. \b[A-Z]{0,}\b\w{0,}(\. )?(\b[A-Z]{0,}\b\w{0,})?)",re.UNICODE)
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xf5896972b7d36f08>
>>> regex.match(string)
None
# List the groups found
>>> r.groups()
(u'M. Tullius', None, u'')
# List the named dictionary objects found
>>> r.groupdict()
{}
# Run findall
>>> regex.findall(string)
[(u'M. Tullius', u'', u''), (u'Mer. Caio', u'', u''), (u'F. P. totalia', u'. ', u'totalia'), (u'Ga. Cesar', u'', u''), (u"M'. C. Memento", u'. ', u'Memento'), (u'M. Metello', u'', u''), (u'Q. Verrem', u'', u''), (u'M. Metellum', u'', u''), (u'M. Metellum', u'', u''), (u'Q. Metellum', u'', u''), (u'L. Metellus', u'', u''), (u"M'. Glabrionem", u'', u''), (u'M. Caesonius', u'', u''), (u'Q. Manlium', u'', u''), (u'Q. Cornificium', u'', u''), (u'P. Sulpicius', u'', u''), (u'M. Crepereius', u'', u''), (u'L. Cassius', u'', u''), (u'Cn. Tremellius', u'', u''), (u'M. Metelli', u'', u''), (u'Cn. Pompeius', u'', u''), (u'M. Metellum', u'', u'')]
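`findall` returns tuples here because the pattern contains three capturing groups, and a group that did not take part in a match is reported as an empty string. If only the full names are wanted, one option is to iterate with `finditer` and take `group(0)`. The sketch below is written for Python 3, where `str` patterns are Unicode-aware by default, so `re.UNICODE` is omitted; the sample string is made up.

```python
import re

# Same pattern as the PythonRegex.Com session, as a raw string.
regex = re.compile(
    r"([A-Z]'?\w{0,4}\. \b[A-Z]{0,}\b\w{0,}(\. )?(\b[A-Z]{0,}\b\w{0,})?)"
)

string = "M. Tullius et F. P. totalia"  # hypothetical sample text

# One tuple per match, one element per capturing group.
tuples = regex.findall(string)

# group(0) is the whole match, regardless of how many groups exist.
names = [m.group(0) for m in regex.finditer(string)]

print(tuples)
print(names)
```

Rewriting the inner groups as non-capturing `(?:...)` and dropping the outer parentheses would be another way to get plain strings from `findall`.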
# Run timeit test
>>> setup = ur"import re; regex =re.compile("([A-Z]'?\w{0,4}\. \b[A-Z]{0,}\b\w{0,}(\. )?(\ ...
>>> t = timeit.Timer('regex.search(string)',setup)
>>> t.timeit(10000)
6.82871484756
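The `setup` string above is cut off in the listing, so the timing cannot be rerun exactly as shown. A comparable measurement in Python 3 might look like the sketch below, which assumes the timed pattern is the one compiled earlier in the session and uses a made-up sample string. Passing `globals` avoids building a setup string with nested quotes.

```python
import re
import timeit

# Assumption: the pattern compiled earlier in the session.
regex = re.compile(
    r"([A-Z]'?\w{0,4}\. \b[A-Z]{0,}\b\w{0,}(\. )?(\b[A-Z]{0,}\b\w{0,})?)"
)
string = "M. Tullius et Q. Verrem"  # hypothetical sample text

# Time 10000 searches, the same repeat count as in the session.
elapsed = timeit.timeit(
    "regex.search(string)",
    globals={"regex": regex, "string": string},
    number=10000,
)
print("%.4f s for 10000 searches" % elapsed)
```

The figure depends on the machine and on the length of `string`, so it is not comparable with the number printed in the session.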
Author
caiobegotti
commented
Apr 8, 2012