@komly
Created October 24, 2015 01:40
#!/usr/bin/env python3
import sys
import re
if len(sys.argv) < 3:
    print("Usage: %s file_to_read file_to_write" % sys.argv[0])
    sys.exit(1)
# Name forms to match (first + patronymic + last, first + last, initials + last):
# Александр Сергеевич Пушкин
# Александр Пушкин
# А. С. Пушкин
REGEXP = r'('+\
r'(?:[А-Я][а-я]+\s+[А-Я][а-я]+\s+[А-Я][а-я]+)'+\
r'|(?:[А-Я][а-я]+\s+[А-Я][а-я]+)'+\
r'|(?:[А-Я]\.\s+[А-Я]\.\s+[А-Я][а-я]+)'+\
r')'
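def replace_names(text):
    # No flags are needed: the pattern uses neither '.' nor '^'/'$'.
    # The fourth positional argument of re.sub is `count`, not `flags`,
    # so flags passed there would cap the number of replacements.
    return re.sub(REGEXP, r'<person>\1</person>', text)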
with open(sys.argv[1], 'rb') as f:
    text = f.read().decode('utf-8')
with open(sys.argv[2], 'wb') as o:
    o.write(replace_names(text).encode('utf-8'))
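
A quick way to sanity-check the pattern without touching any files is sketched below. It is not part of the original script: `NAME`, `tag_names`, and the sample strings are hypothetical names introduced for illustration, and the character classes are extended with `Ё`/`ё` on the assumption that such names should match too (those letters sit outside the `А-Я` and `а-я` ranges). The second half shows what happens when `re.DOTALL | re.MULTILINE` ends up in the `count` argument of `re.sub`.

```python
import re

# Same three alternatives as REGEXP in the script, compiled once.
# Assumption: Ё/ё are added, since they are not covered by А-Я / а-я.
NAME = re.compile(
    r'('
    r'(?:[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+)'
    r'|(?:[А-ЯЁ][а-яё]+\s+[А-ЯЁ][а-яё]+)'
    r'|(?:[А-ЯЁ]\.\s+[А-ЯЁ]\.\s+[А-ЯЁ][а-яё]+)'
    r')'
)


def tag_names(text):
    """Wrap every matched name in <person>...</person> (hypothetical helper)."""
    return NAME.sub(r'<person>\1</person>', text)


sample = "Поэму написал А. С. Пушкин, а роман написал Фёдор Достоевский."
tagged = tag_names(sample)
print(tagged)

# re.DOTALL | re.MULTILINE has the integer value 24. Used as `count`,
# it limits re.sub to the first 24 matches instead of acting as flags.
many = "Иван Иванов, " * 30
limited = re.sub(NAME.pattern, r'<person>\1</person>', many,
                 count=int(re.DOTALL | re.MULTILINE))
print(limited.count('<person>'), tag_names(many).count('<person>'))
```

Note that the pattern is purely shape-based: any two or three consecutive capitalized Cyrillic words are tagged, so sentence-initial words followed by a name or place will produce false positives.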