function extract_emails($str){
// This regular expression extracts all emails from a string:
$regexp = '/([a-z0-9_\.\-])+\@(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
preg_match_all($regexp, $str, $m);
return isset($m[0]) ? $m[0] : array();
}
$test_string = 'This is a test string...
[email protected]
Test different formats:
[email protected];
<a href="[email protected]">foobar</a>
<[email protected]>
strange formats:
[email protected]
test6[at]example.org
[email protected]
test8@ example.org
test9@!foo!.org
foobar
';
print_r(extract_emails($test_string));
I've written something a bit more advanced for Node.js
Thanks!
Including: test6[at]example.org
$regexp = '/([a-z0-9_\.\-])+(\@|\[at\])+(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
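With the pattern above, the match comes back exactly as written in the text, so "test6[at]example.org" stays in its obfuscated form. A minimal sketch of one way to normalize the results follows; the function name extract_emails_with_at is hypothetical and not part of the original gist.

```php
<?php
// Hypothetical helper (not from the gist): extract addresses, accepting
// "[at]" as a separator, and return them in normalized "user@host" form.
function extract_emails_with_at(string $str): array
{
    // Same pattern as in the comment above.
    $regexp = '/([a-z0-9_\.\-])+(\@|\[at\])+(([a-z0-9\-])+\.)+([a-z0-9]{2,4})+/i';
    preg_match_all($regexp, $str, $m);
    $found = isset($m[0]) ? $m[0] : array();

    // Replace the obfuscated separator (in any letter case) with a real "@".
    return array_map(function ($email) {
        return str_ireplace('[at]', '@', $email);
    }, $found);
}

print_r(extract_emails_with_at('test6[at]example.org and test7@example.org'));
```

This keeps the {2,4} TLD limit of the pattern it builds on, so it shares that pattern's blind spot for longer TLDs.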
Just found this through a related Google search. At some point, people should stop propagating this fucked up stone-age regexp.
Since 2001 we've had a few additional gTLDs. Among them are .museum and .travel, which don't fit this stone-age regex. Quite a few sites even use the 1980s format of ([a-z0-9]{2,4})+ to parse the TLD part, which keeps annoying me with my .name email address.
Nowadays (since 2013, so for over 10 years), anybody can register any TLD they want, so {2,4} is just fucked up.
Also, maybe people should realize that there are more characters in the world than the limited supply of 26 known to US and UK citizens. Here is some information for anybody who slept through 25 years of internet development:
IDN domains (https://en.wikipedia.org/wiki/Internationalized_domain_name)
New gTLDs (https://newgtlds.icann.org/)
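For illustration, here is a rough sketch of a pattern that addresses both complaints: it puts no upper bound on the TLD length and accepts letters from any script. The function name extract_emails_unicode is hypothetical, allowing "+" in the local part is my own assumption, and this is still a loose heuristic rather than a full address parser.

```php
<?php
// Sketch only: a looser pattern than the gist's, not an RFC-complete parser.
// extract_emails_unicode() is a hypothetical name used for illustration.
function extract_emails_unicode(string $str): array
{
    // \p{L} and \p{N} match letters and digits in any script; they need
    // the "u" (UTF-8) modifier. The TLD is either a punycode label
    // ("xn--...") or two or more letters, with no upper limit of 4.
    $regexp = '/[\p{L}\p{N}_.+\-]+@(?:[\p{L}\p{N}\-]+\.)+(?:xn--[a-z0-9\-]+|\p{L}{2,})/iu';

    // preg_match_all() returns false on error, e.g. for invalid UTF-8 input.
    if (preg_match_all($regexp, $str, $m) === false) {
        return [];
    }
    return $m[0];
}

print_r(extract_emails_unicode(
    'Contact alice@example.museum, bob@mail.example.travel or carol@example.org.'
));
```

The trade-off is that a looser pattern also produces more false positives, so anything it finds should still be validated before use.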