Created
June 22, 2017 22:23
A couple of decent regular expressions to find URLs and email addresses
# A couple of decent regular expressions to find URLs and email addresses
import re

# Inside the third character class, "$-_" is a range from "$" to "_"; that range
# is what lets the pattern match "/", ":", "?" and "=" in paths and query strings.
urlRe = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')

# See http://www.regular-expressions.info/email.html
# The hyphen sits last in the domain class so it is a literal "-" and not the
# start of a range.
emailRe = re.compile(r'[a-zA-Z0-9+_\-\.]+@[0-9a-zA-Z][0-9a-zA-Z.-]*\.[a-zA-Z]+')

someString = "Lorem ipsum dolor http://google.com/ consectetuer elit. Aliquam \
scelerisque felis. Nulla lacinia - info@subdomain.mah.se. Suspendisse elementum \
lacus. Suspendisse potenti. Etiam et id lorem congue aliquam. Aenean venenatis, \
elit commodo pretium aliquet, dolor webmaster@yahoo.com, ut iaculis est ante at \
nisl. Nam auctor. Curabitur id dolor. http://www.last.fm sem fermentum libero."

# print() with parentheses runs on both Python 2 and Python 3
print(urlRe.findall(someString))
print(emailRe.findall(someString))

# This will generate the following:
# ['http://google.com/', 'http://www.last.fm']
# ['info@subdomain.mah.se', 'webmaster@yahoo.com']
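
One caveat with the URL pattern: its character classes also accept punctuation such as "." and ",", so a URL that ends a sentence or sits inside parentheses comes back with that punctuation attached. Below is a minimal sketch of one way to trim it. The helper `find_urls` and its `trailing` parameter are hypothetical names introduced here for illustration, not part of the original snippet.

```python
import re

# Same URL pattern as in the snippet above.
urlRe = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')


def find_urls(text, trailing='.,;:!?)'):
    """Hypothetical helper: find URLs, then strip trailing characters that
    most likely belong to the surrounding sentence rather than the URL."""
    return [match.rstrip(trailing) for match in urlRe.findall(text)]


sample = "See http://www.last.fm, or (http://google.com/search?q=regex)."

# Raw matches keep the trailing "," and ")." from the sentence.
print(urlRe.findall(sample))
# ['http://www.last.fm,', 'http://google.com/search?q=regex).']

# Trimmed matches.
print(find_urls(sample))
# ['http://www.last.fm', 'http://google.com/search?q=regex']
```

This is a heuristic: a URL that legitimately ends in ")" or "." would be trimmed too, so adjust `trailing` to suit the text being scanned.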