The program below takes one or more plain text files as input and works with both Python 2 and Python 3.
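This is not the program itself. It is a minimal sketch of how such an extractor might work, assuming a simple regular expression. `EMAIL_RE`, `extract_emails` and `main` are hypothetical names, and the pattern is an approximation rather than a full RFC 5322 parser. It sticks to features available in both Python 2 and Python 3.

```python
import re
import sys

# Assumption: a deliberately simple pattern, not a full RFC 5322 parser.
# Local part: word characters, dots, plus signs, hyphens.
# Domain: one or more dot-separated labels ending in a 2+ letter TLD.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every email-like substring of text, in order of appearance."""
    return EMAIL_RE.findall(text)


def main(paths):
    # Print one match per line so the output can be piped to other tools.
    for path in paths:
        with open(path) as f:
            for email in extract_emails(f.read()):
                print(email)


if __name__ == "__main__":
    main(sys.argv[1:])
```

With this pattern, something like `TOMORROW@3pm` is skipped because `3pm` has no dot-separated top-level domain. The angle brackets around an address (as in `<li>Jane Doe <...></li>`) are not in the character classes, so they are left out of the match.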
Let's say we have two files that may contain email addresses.
file_a.txt
foo bar
ok [email protected] sup
[email protected],wyd
hello world!
RESCHEDULE 2'OCLOCK WITH [email protected] FOR TOMORROW@3pm
file_b.html
<html>
<body>
<ul>
<li><span class=pl-c>Dennis Ideler <[email protected]></span></li>
<li><span class=pl-c>Jane Doe <[email protected]></span></li>
</ul>
</body>
</html>
To extract the email addresses, download the Python program and execute it on the command line with our files as input.
$ python extract_emails_from_text.py file_a.txt file_b.html
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Voilà, it prints every email address it finds. Let's also remove the duplicates and sort the email addresses alphabetically.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq
[email protected]
[email protected]
[email protected]
[email protected]
Looks good! Now let's save the results to a file.
$ python extract_emails_from_text.py file_a.txt file_b.html | sort | uniq > emails.txt
P.S. The sorting and deduplicating commands above are specific to shells on Unix-like systems (e.g. Linux or macOS). On Windows, you can use PowerShell instead, for example:
python extract_emails_from_text.py file_a.txt file_b.html | sort -unique
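On Unix-like shells, `uniq` only collapses duplicate lines that sit next to each other, which is why the pipelines above sort first. The Python sketch below mimics both commands to make that explicit. `uniq` and `sort_uniq` are hypothetical helpers, not part of the program, and the addresses are made up. Python orders strings by Unicode code point, which may differ from a locale-aware `sort`.

```python
from itertools import groupby


def uniq(lines):
    """Mimic Unix `uniq`: collapse runs of adjacent identical lines only."""
    return [line for line, _ in groupby(lines)]


def sort_uniq(lines):
    """Mimic `sort | uniq`: sorting first makes every duplicate adjacent."""
    return uniq(sorted(lines))


# Hypothetical addresses for illustration.
found = ["jane@example.org", "bob@example.com", "jane@example.org"]
print(uniq(found))       # ['jane@example.org', 'bob@example.com', 'jane@example.org']
print(sort_uniq(found))  # ['bob@example.com', 'jane@example.org']
```

In plain Python, `sorted(set(found))` gives the same result as `sort_uniq(found)` in one step.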