@dot-mike
Last active July 18, 2022 00:47
Script to fetch Yahoo GMD archive from the Archiveteam and
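#!/bin/sh
# Usage: pass the archive URL as the first argument.
# Streams the gzipped JSON lines and writes every rawEmail field to <name>.mbox.
url=$1
basename=${url##*/}
newname=${basename%.*}.mbox
curl -sL "$url" |
# A bare { is literal in a basic regex; keep only lines that start a JSON object.
zcat | grep '^{' | jq -r 'select(.rawEmail) | .rawEmail' |
perl -MHTML::Entities -pe '
s/\r//;
decode_entities $_;
# matches header lines; the capture is not used
/^([-\w]+): \S/;
' > "$newname"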
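
The extraction step of the pipeline above (filter JSON lines, keep `rawEmail`, strip carriage returns, decode HTML entities) can be sketched in Python for testing without the network. This is a rough equivalent, not the gist's method: `extract_raw_emails` is a hypothetical helper name, and skipping malformed lines or non-string values is an assumption (jq would stop on bad JSON instead).

```python
import html
import json


def extract_raw_emails(lines):
    """Yield the cleaned rawEmail text from each JSON line that has one."""
    for line in lines:
        if not line.startswith("{"):
            continue  # same filter as grep '^{'
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: skip malformed lines instead of aborting
        raw = record.get("rawEmail")
        if not isinstance(raw, str):
            continue  # assumption: only string payloads are emails
        # Mirror the perl step: drop carriage returns, decode HTML entities.
        yield html.unescape(raw.replace("\r", ""))


if __name__ == "__main__":
    sample = [
        "not json\n",
        '{"rawEmail": "Subject: a &amp; b\\r\\nBody"}\n',
        '{"other": 1}\n',
    ]
    for message in extract_raw_emails(sample):
        print(message)
```

To run it on a real download, the lines could come from `gzip.open(path, "rt")` on a locally saved copy of the archive.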
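#!/usr/bin/env python3
"""Replace each Return-Path header with an mbox "From " separator line.

Takes file.mbox as input and writes file-fixed.mbox.
"""
import re
import sys
import time
from pathlib import Path


def main(input_path, output_path):
    with open(input_path, "rb") as file:
        data = file.read()
    # One timestamp is reused for every separator line.
    separator = b"From MAILER-DAEMON " + time.asctime(time.gmtime()).encode()
    # (?m) anchors the match to line starts, so headers such as
    # X-Return-Path: are left untouched.
    fixed = re.sub(rb"(?m)^Return-Path:.*", separator, data)
    with open(output_path, "wb") as file:
        file.write(fixed)


if __name__ == "__main__":
    output_file = Path(sys.argv[1]).stem + "-fixed.mbox"
    main(sys.argv[1], output_file)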
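
One way to check that the fixed file is a readable mbox is to open it with Python's standard `mailbox` module and count the messages it finds. The sketch below is only a sanity check under that assumption; `summarize_mbox` is a hypothetical helper name, not part of the gist.

```python
import mailbox


def summarize_mbox(path):
    """Return (message_count, subjects) for the mbox file at `path`."""
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(path, create=False)
    try:
        # A message without a Subject header yields None here.
        subjects = [message["Subject"] for message in box]
        return len(subjects), subjects
    finally:
        box.close()
```

Calling `summarize_mbox("file-fixed.mbox")` should report one message per "From " separator line; a count of zero or one for a large archive suggests the separators were not written as expected.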