Created
March 23, 2012 10:59
PHP : Extract emails from a string
function extract_emails($str){
    // This regular expression extracts all emails from a string:
    $regexp = '/([a-z0-9_\.\-])+\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
    preg_match_all($regexp, $str, $m);
    return isset($m[0]) ? $m[0] : array();
}

$test_string = 'This is a test string...
[email protected]
Test different formats:
[email protected];
<a href="[email protected]">foobar</a>
<[email protected]>
strange formats:
[email protected]
test6[at]example.org
[email protected]
test8@ example.org
test9@!foo!.org
foobar
';
print_r(extract_emails($test_string));
Just found this through a related Google search. At some point, people should stop propagating this fucked up stone-age regexp.

Since 2001 we've had a few additional gTLDs, among them .museum and .travel, which don't fit a TLD pattern capped at four characters. Quite a few sites still use the 1980s-style ([a-z0-9]{2,4})+ to parse the TLD part, which keeps annoying me with my .name email address. Since 2013, so for over 10 years, organizations have been able to get practically any string they want delegated as a TLD, so {2,4} is just fucked up.
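The TLD length point can be checked directly. This is a minimal sketch, not the gist author's code: `tld_matches()` and both TLD sub-patterns are hypothetical names chosen for illustration. It anchors the pattern the way form validators usually do and swaps only the TLD part.

```php
<?php
// Hypothetical helper for illustration; not part of the gist.
function tld_matches(string $tldPattern, string $address): bool
{
    // Anchored with ^ and $ so the whole address has to match.
    $regexp = '/^[a-z0-9_.\-]+@([a-z0-9\-]+\.)+' . $tldPattern . '$/i';
    return preg_match($regexp, $address) === 1;
}

// Common validator variant: TLD capped at four characters.
$capped = '[a-z0-9]{2,4}';
// Assumption: only the 63-character DNS label limit applies (also covers xn-- labels).
$uncapped = '[a-z][a-z0-9\-]{1,62}';

var_dump(tld_matches($capped, 'curator@example.museum'));   // bool(false)
var_dump(tld_matches($uncapped, 'curator@example.museum')); // bool(true)
```

Note that the gist's own pattern repeats the group, `([a-z0-9]{2,4})+`, so it can still consume a six-letter TLD in two chunks. The cap only bites in the anchored, unrepeated form shown here.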
Also, maybe people should realize that there are more characters in the world than the 26 letters known to US and UK citizens. Here is some information for anybody who slept through 25 years of internet development:
IDN domains (https://en.wikipedia.org/wiki/Internationalized_domain_name)
New gTLDs (https://newgtlds.icann.org/)
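For anybody who wants a starting point, here is a rough sketch of a looser, Unicode-aware extractor. The function name `extract_emails_unicode()` and the pattern are my own assumptions for illustration. It is not a full RFC parser and will still let some invalid addresses through. It relies on PCRE's `u` modifier and the `\p{L}` / `\p{N}` Unicode properties, and it assumes the input is valid UTF-8.

```php
<?php
// Sketch only: a looser, Unicode-aware variant of the gist's extract_emails().
// The name and the pattern are illustrative assumptions, not a complete validator.
function extract_emails_unicode(string $str): array
{
    // \p{L} and \p{N} match letters and digits in any script; the "u" modifier
    // makes PCRE treat both pattern and subject as UTF-8.
    // The TLD is limited only by the 63-character DNS label length.
    $regexp = '/[\p{L}\p{N}_.+\-]+@(?:[\p{L}\p{N}\-]+\.)+[\p{L}\p{N}\-]{2,63}/u';

    // preg_match_all() returns false on error, e.g. for invalid UTF-8 input.
    if (preg_match_all($regexp, $str, $m) === false) {
        return [];
    }
    return $m[0];
}

// Finds info@bücher.example and curator@example.museum.
print_r(extract_emails_unicode('Mail info@bücher.example or curator@example.museum today.'));
```

Extraction from free text is always a heuristic, so it is probably worth running each hit through a stricter check (or just sending a confirmation mail) before trusting it.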