benwattsjones / gmail_mbox_parser.py
Last active December 17, 2025 12:11
Quick Python code to parse mbox files, specifically those used by Gmail. Extracts the sender, date, plain-text contents, etc., and ignores base64 attachments.
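As a rough illustration of the approach the description outlines, here is a minimal sketch that uses only the standard-library `mailbox` module. It is not the gist's own code: the function name `summarize_mbox` and the tuple layout it returns are assumptions made for this example. It keeps `text/plain` parts and skips anything marked as an attachment.

```python
import mailbox


def summarize_mbox(path):
    """Return a list of (sender, date, plain_text) tuples, one per message.

    Hypothetical helper for illustration; not part of the original script.
    """
    summaries = []
    # create=False raises mailbox.NoSuchMailboxError instead of creating a file.
    mbox = mailbox.mbox(path, create=False)
    try:
        for message in mbox:
            text_parts = []
            for part in message.walk():
                # Keep only plain-text parts; HTML and binary parts are skipped.
                if part.get_content_type() != "text/plain":
                    continue
                # Skip text files sent as attachments.
                if part.get_content_disposition() == "attachment":
                    continue
                payload = part.get_payload(decode=True)
                if payload is None:
                    continue
                # Fall back to UTF-8 when the part declares no charset.
                charset = part.get_content_charset() or "utf-8"
                text_parts.append(payload.decode(charset, errors="replace"))
            body = "\n".join(text_parts).strip()
            summaries.append((message["From"], message["Date"], body))
    finally:
        mbox.close()
    return summaries
```

Calling `summarize_mbox("export.mbox")` (the file name is a placeholder) returns one tuple per message. HTML-only messages come back with an empty body in this sketch, which is the case a helper that extracts text from HTML would need to cover.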
#! /usr/bin/env python3
# -*- coding: utf-8 -*-
import mailbox

import bs4


def get_html_text(html):
    try:
        return bs4.BeautifulSoup(html, 'lxml').body.get_text(' ', strip=True)
    except AttributeError:  # message contents empty