The program below takes one or more plain text files as input. It works with both Python 2 and Python 3.
Let's say we have two files that may contain email addresses:
- file_a.txt
foo bar
ok [email protected] sup
[email protected],wyd
hello world!
RESCHEDULE 2'OCLOCK WITH [email protected] FOR TOMORROW@3pm
- file_b.html
<html>
<body>
<ul>
<li><span class=pl-c>Dennis Ideler <[email protected]></span></li>
<li><span class=pl-c>Jane Doe <[email protected]></span></li>
</ul>
</body>
</html>
To extract the email addresses, download the Python program and execute it on the command line with our files as input.
$ python extract_emails_from_text.py file_a.txt file_b.html
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Voilà, it prints every email address it found. Let's also remove the duplicates and sort the addresses alphabetically.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq
[email protected]
[email protected]
[email protected]
[email protected]
Looks good! Now let's save the results to a file.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq > emails.txt
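For readers curious about what such a program might do internally, here is a minimal sketch. It is not the actual source of extract_emails_from_text.py: the pattern EMAIL_RE and the helper names find_emails and extract_from_files are assumptions made for illustration, and the regular expression is deliberately simple rather than a full implementation of the email address grammar.

```python
from __future__ import print_function  # keeps print() working on Python 2

import io
import re
import sys

# Hypothetical, deliberately simple pattern: a local part, an "@", and a
# dotted domain ending in a top-level domain of at least two letters.
# The single quote is allowed in the local part, as it is in real addresses.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+'-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_RE.findall(text)


def extract_from_files(paths):
    """Yield matches from each file, in the order they appear."""
    for path in paths:
        # io.open accepts an encoding on both Python 2 and Python 3.
        with io.open(path, encoding="utf-8", errors="ignore") as handle:
            for line in handle:
                for email in find_emails(line):
                    yield email


if __name__ == "__main__":
    for email in extract_from_files(sys.argv[1:]):
        print(email)
```

With this pattern, a token such as TOMORROW@3pm is skipped because the part after the @ contains no dot, which matches the behaviour shown in the output above.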
P.S. The sort and uniq commands above are specific to shells on Unix-like systems (e.g. Linux or macOS). If you're using Windows, you can use PowerShell instead. For example:
python extract_emails_from_text.py file_a.txt file_b.html | sort -unique
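If you would rather not depend on shell tools at all, the same deduplicate-and-sort step can be sketched in Python. The helper name unique_sorted is hypothetical and is not part of the program. Note that Python sorts by code point, so the ordering of mixed-case addresses may differ from what your shell's sort produces.

```python
def unique_sorted(lines):
    """Strip whitespace, drop blank lines and duplicates, then sort."""
    cleaned = (line.strip() for line in lines)
    return sorted(set(line for line in cleaned if line))


# Illustrative input only; these addresses are placeholders.
sample = ["bob@example.com\n", "alice@example.org\n", "bob@example.com\n", "\n"]
for email in unique_sorted(sample):
    print(email)
```

To use it in a pipeline, pass sys.stdin to unique_sorted instead of the sample list.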
Known limitation: if an email address starts with a single quote ('), as in '[email protected]', the program does not trim the quote from the result.
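One possible workaround is to post-process each result and strip surrounding quote characters. The sketch below is an assumption about how that could be done, not something the program does, and strip_quotes is a hypothetical helper. Keep in mind that a single quote is technically allowed in the local part of an address, so this could alter a legitimate (if unusual) address.

```python
def strip_quotes(email):
    """Remove single or double quotes from both ends of the string."""
    return email.strip("'\"")


# Placeholder address for illustration.
print(strip_quotes("'bob@example.com'"))
```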