A couple of decent regular expressions to find URLs and email addresses
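# A couple of decent regular expressions to find URLs and email addresses
import re

# Note: inside the class [$-_@.&+], "$-_" is a range (0x24 to 0x5F), which is why "/", ":" and "?" match.
urlRe = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')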
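# See http://www.regular-expressions.info/email.html
# The hyphen sits last in the domain class so it is a literal "-" and 0-9 stays a full digit range.
emailRe = re.compile(r'[a-zA-Z0-9+_\-\.]+@[0-9a-zA-Z][0-9a-zA-Z.-]*\.[a-zA-Z]+')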
someString = "Lorem ipsum dolor http://google.com/ consectetuer elit. Aliquam \
scelerisque felis. Nulla lacinia - info@subdomain.mah.se. Suspendisse elementum \
lacus. Suspendisse potenti. Etiam et id lorem congue aliquam. Aenean venenatis, \
elit commodo pretium aliquet, dolor webmaster@yahoo.com, ut iaculis est ante at \
nisl. Nam auctor. Curabitur id dolor. http://www.last.fm sem fermentum libero."
print(urlRe.findall(someString))
print(emailRe.findall(someString))
# This will generate the following:
# ['http://google.com/', 'http://www.last.fm']
# ['info@subdomain.mah.se', 'webmaster@yahoo.com']
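
One thing to watch in patterns like these: a hyphen between two characters inside a character class is read as a range, not as a literal hyphen. The URL pattern's `[$-_@.&+]` is an example. The sketch below is only an illustration; the names `range_class`, `literal_class` and `probe` are made up for it and are not part of the gist.

```python
import re

# A hyphen between two characters inside [...] defines a range.
range_class = re.compile(r'[$-_]')    # every character from '$' (0x24) to '_' (0x5F)
# A hyphen placed last in the class is a plain literal.
literal_class = re.compile(r'[$_-]')  # only '$', '_' and '-'

probe = "/:?=-$_"
range_hits = range_class.findall(probe)
literal_hits = literal_class.findall(probe)

print(range_hits)    # ['/', ':', '?', '=', '-', '$', '_']
print(literal_hits)  # ['-', '$', '_']
```

Placing the hyphen first or last in a class, or escaping it, keeps it literal. Whether the wide range is wanted depends on the pattern: in the URL case it is what lets path and query characters match.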