A Python tool to extract and export Gmail emails to JSONL format using the Gmail API.
Install dependencies:
pip install -r requirements.in
Run the script:
python gmail_extractor.py
The tool will prompt you to enter a search query. You can use Gmail's search syntax to filter emails.
On first run, the tool will open a web browser for Gmail authentication. Credentials are saved locally in token.pickle
for subsequent runs.
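The caching behaviour described above can be sketched with the standard library alone. This is a minimal sketch, not the tool's actual code: the helper names `load_cached_token` and `save_token` are hypothetical, and any picklable object (a plain dict in the usage note) stands in for the credentials object that the real OAuth flow would return.

```python
import pickle
from pathlib import Path

TOKEN_PATH = Path("token.pickle")  # file name taken from the text above


def load_cached_token(path=TOKEN_PATH):
    """Return the cached credentials object, or None if there is no usable cache."""
    path = Path(path)
    if not path.exists():
        return None  # first run: nothing cached yet
    try:
        with path.open("rb") as fh:
            return pickle.load(fh)
    except (pickle.UnpicklingError, EOFError):
        # An empty or truncated cache is treated like a first run.
        return None


def save_token(creds, path=TOKEN_PATH):
    """Persist the credentials object so later runs can skip the browser step."""
    with Path(path).open("wb") as fh:
        pickle.dump(creds, fh)
```

A caller would try `load_cached_token()` first and fall back to the browser flow only when it returns `None`, then call `save_token(...)` with the new credentials. Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.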
label:important
This will find all emails with the "important" label. Note that the count shown is per email, not per thread.
from:[email protected]
- Emails from specific sender
subject:invoice
- Emails with "invoice" in subject
has:attachment
- Emails with attachments
after:2024/01/01 before:2024/12/31
- Emails within date range
is:unread
- Unread emails (default)
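The operators above can be combined in a single query. The following is a small illustrative sketch of how such a query string might be assembled. `build_query` and its parameter names are hypothetical and not part of the tool, which simply prompts for the finished query.

```python
def build_query(sender=None, subject=None, has_attachment=False,
                after=None, before=None, unread=False, label=None):
    """Combine optional filters into one Gmail search string (hypothetical helper)."""
    terms = []
    if label:
        terms.append(f"label:{label}")
    if sender:
        terms.append(f"from:{sender}")
    if subject:
        terms.append(f"subject:{subject}")
    if has_attachment:
        terms.append("has:attachment")
    if after:
        terms.append(f"after:{after}")    # dates in YYYY/MM/DD form
    if before:
        terms.append(f"before:{before}")
    if unread:
        terms.append("is:unread")
    # Gmail treats space-separated terms as an implicit AND.
    return " ".join(terms)


print(build_query(subject="invoice", has_attachment=True, after="2024/01/01"))
# prints: subject:invoice has:attachment after:2024/01/01
```

This sketch does not quote multi-word values; a subject such as "monthly invoice" would need to be wrapped in quotes before being passed in.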
The tool exports emails to a JSONL file named gmail_export_YYYYMMDD_HHMMSS.jsonl
containing:
- Email ID and Thread ID
- Date, From, To, Subject
- Email snippet and full body text
Each email is saved as a separate JSON object on its own line.
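As an illustration of the output format, here is a minimal sketch of writing and reading such a file. The function names are hypothetical, and the record keys used in the usage note (`id`, `thread_id`, `subject`, and so on) are assumptions; the actual key names in the export may differ.

```python
import json
from datetime import datetime


def export_filename(now=None):
    """Build a name of the form gmail_export_YYYYMMDD_HHMMSS.jsonl."""
    now = now or datetime.now()
    return f"gmail_export_{now:%Y%m%d_%H%M%S}.jsonl"


def write_jsonl(records, path):
    """Write each record as one JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            # json.dumps escapes newlines inside strings, so each record stays on one line.
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read an export back, one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

For example, `write_jsonl([{"id": "1", "thread_id": "t1", "subject": "Invoice"}], export_filename())` would produce a one-line file, and `read_jsonl` on that file returns the same list of dicts. Because each line is independent, large exports can be processed line by line without loading the whole file.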
Already done: the credentials file is in this folder.
For reference, the process to generate it was as follows.
Set up Gmail API credentials:
- Go to Google Cloud Console
- Create a new project or select an existing one
- Enable the Gmail API
- Create OAuth 2.0 credentials
- Download the credentials JSON file and place it in the project directory
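After completing the steps above, a quick sanity check of the downloaded file can catch a misplaced or wrong download before the first run. This is a hedged sketch: the file name `credentials.json` is an assumption (the name the tool expects is not stated here), and `check_client_secrets` is a hypothetical helper. It relies on OAuth client files from Google Cloud Console keeping their settings under a top-level `installed` key (desktop apps) or `web` key (web apps).

```python
import json
from pathlib import Path


def check_client_secrets(path="credentials.json"):  # file name is an assumption
    """Return a list of problems found in the OAuth client file (empty if none)."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} not found"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{p} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["expected a JSON object at the top level"]
    # Desktop-app clients are stored under "installed", web clients under "web".
    section = data.get("installed") or data.get("web")
    if not isinstance(section, dict):
        return ['expected a top-level "installed" or "web" key']
    return [f"missing field: {key}"
            for key in ("client_id", "client_secret")
            if key not in section]


if __name__ == "__main__":
    for problem in check_client_secrets():
        print(problem)
```

An empty list means the file has the expected shape; it does not confirm that the credentials are valid or that the Gmail API is enabled for the project.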