Last active
March 6, 2017 01:08
Regular-expression parser for conference paper entries in a CV (curriculum vitae)
"""
An attempt to automate extracting some fields from a conference-publication entry in a curriculum vitae (CV).
For more about regular expressions, please refer to https://docs.python.org/library/re.html.
"""
import re


def get_parser_cv_conf_paper():
    # Return a compiled pattern with four named groups:
    # paper_number, authors, paper_title, and conference_info.
    return re.compile(
        r'(?P<paper_number>\(\d+\))[.,]?\s*(?P<authors>.+?)\s*[,:;]?\s*["“](?P<paper_title>.+?)["“”],?\s*(?P<conference_info>.+)')
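
The gist does not show the parser being called, so here is a minimal usage sketch. It repeats the pattern so that it runs on its own. The helper `parse_entry` and the sample entry are hypothetical and made up for illustration; they are not part of the original gist.

```python
import re


def get_parser_cv_conf_paper():
    # Same pattern as in the gist, split over two lines for readability.
    return re.compile(
        r'(?P<paper_number>\(\d+\))[.,]?\s*(?P<authors>.+?)\s*[,:;]?\s*'
        r'["“](?P<paper_title>.+?)["“”],?\s*(?P<conference_info>.+)')


def parse_entry(entry):
    """Hypothetical helper: return the named groups as a dict, or None if no match."""
    match = get_parser_cv_conf_paper().match(entry)
    return match.groupdict() if match else None


# Made-up CV entry, for illustration only.
sample = ('(12) A. Author and B. Writer, "A Sample Paper Title", '
          'Proc. of the Example Conference, Seoul, 2016')
fields = parse_entry(sample)

if fields is not None:
    for name, value in fields.items():
        print(f'{name}: {value}')
```

Two things to keep in mind: `paper_number` keeps its parentheses, because they sit inside the named group, and if the entry places the comma inside the closing quotation mark, that comma ends up in `paper_title`.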