@ameerkat
Created August 10, 2011 09:54
Regexes for parsing information out of the IMDb list files
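import re

# Matches a person entry: optional nickname, optional last name, first name, optional actor number.
name = re.compile(r"""
('.+')?\s* # nickname (optional, group 1), including its single quotes
(([^,']*),)?\s* # last name (optional, group 3)
([^\(]+) # first name (required, group 4); may carry trailing whitespace
(\((\w+)\))? # actor number (optional, group 6)
""", re.VERBOSE)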
# Matches one credit line: title, year, and the optional annotations that follow.
acted_in = re.compile(r"""
"?([^"]*?)"?\s # title (required, group 1), surrounded by double quotes if it's a TV show
\(((\d+)/?(\w+)?).*?\) # the year (required, group 3), followed by `/ROMAN_NUMERAL`
# (optional, group 4) if there are multiple titles in the same year
(\s*\((T?VG?)\))? # special code (optional, group 6), one of 'TV', 'V', 'VG'
(\s*\((\w*)\))? # information regarding part (optional, group 8), e.g. 'voice', 'likeness'
(\s*\{([^}\(]*?)(\s*\(\#(\d+)\.(\d+)\))?\})?
# episode information: episode title (optional, group 10), followed by the
# series (season) number (optional, group 12) and the episode number
# (optional, group 13). The series and episode numbers are optional
# within the optional group. The title may not contain '}' or '('.
(\s*\[(.*)\])? # character name (optional, group 15), surrounded by '[' and ']'
(\s*\<(\d+)\>)? # billing position (optional, group 17), surrounded by '<' and '>'
""", re.VERBOSE)
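A minimal usage sketch for the two patterns. It repeats them as raw strings so that it runs on its own. The helper names `parse_name` and `parse_credit`, the dictionary keys, and the sample lines are all made up for illustration and are not taken from the actual list files. The episode-title class is written as `[^}\(]`, which is an assumption about what was intended.

```python
import re

# Patterns repeated from the gist, as raw strings, so this sketch is self-contained.
NAME = re.compile(r"""
    ('.+')?\s*          # nickname (group 1)
    (([^,']*),)?\s*     # last name (group 3)
    ([^\(]+)            # first name (group 4)
    (\((\w+)\))?        # actor number (group 6)
""", re.VERBOSE)

ACTED_IN = re.compile(r"""
    "?([^"]*?)"?\s              # title (group 1)
    \(((\d+)/?(\w+)?).*?\)      # year (group 3), /ROMAN_NUMERAL (group 4)
    (\s*\((T?VG?)\))?           # special code (group 6)
    (\s*\((\w*)\))?             # part information (group 8)
    (\s*\{([^}\(]*?)(\s*\(\#(\d+)\.(\d+)\))?\})?  # episode title, series, number
    (\s*\[(.*)\])?              # character name (group 15)
    (\s*\<(\d+)\>)?             # billing position (group 17)
""", re.VERBOSE)


def _to_int(value):
    """Convert an optional captured group to int, keeping None as None."""
    return int(value) if value is not None else None


def parse_name(text):
    """Hypothetical helper: split a person entry into its named parts."""
    m = NAME.match(text)
    if m is None:
        return None
    # Group 4 can end in whitespace when an actor number follows, so strip it.
    return {"last": m.group(3), "first": m.group(4).strip(), "number": m.group(6)}


def parse_credit(text):
    """Hypothetical helper: split one credit line into a dictionary."""
    m = ACTED_IN.match(text)
    if m is None:
        return None
    return {
        "title": m.group(1),
        "year": int(m.group(3)),
        "year_suffix": m.group(4),
        "code": m.group(6),
        "part": m.group(8),
        "episode_title": m.group(10),
        "series": _to_int(m.group(12)),
        "episode": _to_int(m.group(13)),
        "character": m.group(15),
        "billing": _to_int(m.group(17)),
    }


if __name__ == "__main__":
    # Made-up sample lines in the shape the patterns expect.
    print(parse_name("Doe, Jane (II)"))
    print(parse_credit("Example Movie (1999/II) (V) (voice)  [Narrator]  <4>"))
    print(parse_credit('"Example Show" (2005) {Pilot Episode (#1.2)}  [Herself]  <1>'))
```

Because every annotation is optional, a missing one is returned as `None` rather than causing the match to fail. Only the title and the year are required for a credit line to match at all.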