This script extracts all emails from an Outlook PST archive and saves them to an output folder as individual RFC 822-compliant *.eml files.
Installing the external dependency pypff may not be straightforward (it wasn't for me). I forked the original repository to make it work with Python 3. If you get errors, check the original repository's wiki pages for help or try my fork. Below are the steps that worked for me:
Clone the libpff repository, which contains pypff (https://github.com/libyal/libpff/tree/master/pypff)
cd libpff/
./synclibs.sh
./autogen.sh
./configure --enable-python
make
sudo make install
python setup.py build
sudo python setup.py install
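Before running the script, it can help to confirm that the interpreter you plan to use actually sees the freshly built module. A minimal sketch using only the standard library; the helper name is_installed is my own and not part of pypff or the script:

```python
import importlib.util


def is_installed(module_name="pypff"):
    """Return True if module_name can be found on this interpreter's import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed():
        print("pypff found")
    else:
        print("pypff not found; re-check the build steps for this Python version")
```

If this reports that pypff is missing even though the build succeeded, the install probably went to a different Python than the one you are running.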
Now that everything is installed, you can execute the script as follows:
python pst2eml.py /path/to/archive.pst /path/to/output/dir
Optionally, you can write the log to a file by adding --logfile=/path/to/log_dir to the command.
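For orientation, a command line of this shape could be declared with argparse roughly as below. This is only a sketch: the destination names pst_file and output_dir and the help texts are assumptions of mine, and how the real script interprets the --logfile value is not shown here.

```python
import argparse


def build_parser():
    """Build a parser for: pst2eml.py ARCHIVE OUTPUT_DIR [--logfile=PATH]."""
    parser = argparse.ArgumentParser(
        description="Extract emails from a PST archive into .eml files."
    )
    # Positional names are hypothetical; only their order follows the usage line.
    parser.add_argument("pst_file", help="path to the Outlook PST archive")
    parser.add_argument("output_dir", help="folder that receives the .eml files")
    # Optional; None means no log file was requested.
    parser.add_argument("--logfile", default=None, help="optional log destination")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```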
Full disclaimer: I was inspired by this script, but as you may see, I pretty much threw everything overboard and made my own thing. I only kept the logging and argparse parts, really.
Next error: Some mails are missing specific headers, e.g. an Exchange appointment invitation mail apparently lacks message.transport_headers, and an e-mail draft was missing message.sender_name. Also, there are a lot of items that are missing a message body (vCards and the like).
Therefore I've changed the process_message function as follows:
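The modified function is not reproduced here. As a minimal sketch of the kind of guard described, assuming a pypff-like message object that exposes transport_headers, sender_name, subject and plain_text_body, each attribute can be read defensively and replaced with a fallback when it is missing. The helper names safe_attr, to_text and build_eml and the fallback strings are hypothetical, not taken from the script:

```python
from email import message_from_string


def safe_attr(item, name, default=None):
    """Return item.<name>, or default if it is absent, None, or raises on access."""
    try:
        value = getattr(item, name)
    except Exception:  # assumption: the binding may raise instead of returning None
        return default
    return default if value is None else value


def to_text(value):
    """Decode bytes to str; pass str through unchanged."""
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value


def build_eml(message):
    """Build an email.message.Message from a pypff-like message object."""
    headers = to_text(safe_attr(message, "transport_headers", ""))
    eml = message_from_string(headers)  # an empty string gives a header-less message
    # Fill in headers that drafts or appointment invitations may lack.
    if "Subject" not in eml:
        eml["Subject"] = to_text(safe_attr(message, "subject", "(no subject)"))
    if "From" not in eml:
        eml["From"] = to_text(safe_attr(message, "sender_name", "unknown"))
    # Only the plain-text body is stored in this sketch, so drop MIME headers
    # that describe the original (possibly multipart) structure.
    for name in ("Content-Type", "Content-Transfer-Encoding"):
        del eml[name]
    eml.set_payload(to_text(safe_attr(message, "plain_text_body", "")), charset="utf-8")
    return eml
```

Calling build_eml(message).as_bytes() would then give the content for one *.eml file. HTML bodies and attachments are deliberately left out to keep the sketch short.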
In the process_folder function, I've made the following change to discard non-mail items without a body:
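Again, the actual change is not reproduced here. A rough sketch of such a filter, assuming pypff-like folder objects with iterable sub_messages and sub_folders, and messages with plain_text_body, html_body and rtf_body attributes; has_body and iter_mail_items are hypothetical names of mine:

```python
def has_body(message):
    """Return True if any body attribute of the message holds non-empty content."""
    for name in ("plain_text_body", "html_body", "rtf_body"):
        try:
            if getattr(message, name):
                return True
        except Exception:  # assumption: a missing body may raise rather than be None
            continue
    return False


def iter_mail_items(folder):
    """Yield messages that have a body, from folder and all of its sub-folders."""
    for message in folder.sub_messages:
        if has_body(message):
            yield message
        # Items without any body (vCards and the like) are skipped silently.
    for sub_folder in folder.sub_folders:
        yield from iter_mail_items(sub_folder)
```

Filtering on the body rather than on an item type keeps the check independent of how a particular PST labels its items, at the cost of also dropping genuine mails that happen to be empty.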