@MrCheatEugene
Forked from Lewiscowles1986/extract_har.py
Last active November 3, 2023 11:32
Python 3 script to extract images from HTTP Archive (HAR) files (Tested & working on Python 3.11)
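import base64
import json
import os

# Create the output directory ("imgs" under the current directory) if it is missing.
folder = os.path.join(os.getcwd(), "imgs")
os.makedirs(folder, exist_ok=True)

# Map response MIME types to file extensions; entries of any other type are skipped.
ext = {
    "image/webp": "webp",
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/svg+xml": "svg",
}

with open("har.har", "r", encoding="utf8") as har_file:
    har = json.load(har_file)

for entry in har["log"]["entries"]:
    content = entry["response"]["content"]
    # Drop parameters such as "; charset=utf-8" before the lookup.
    mimetype = content.get("mimeType", "").split(";")[0].strip()
    # Skip non-image entries and entries whose body was not saved in the HAR.
    if mimetype not in ext or "text" not in content:
        continue
    # Use the last URL path segment, without the query string, as the file name.
    filename = entry["request"]["url"].split("?")[0].split("/")[-1]
    if not filename:
        continue
    # Add an extension from the MIME type when the URL does not provide one.
    if not os.path.splitext(filename)[1]:
        filename = f"{filename}.{ext[mimetype]}"
    if content.get("encoding") == "base64":
        data = base64.b64decode(content["text"])
    else:
        # Text bodies such as SVG are stored unencoded in the HAR file.
        data = content["text"].encode("utf8")
    with open(os.path.join(folder, filename), "wb") as out:
        out.write(data)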

FurloSK commented Nov 3, 2023

Note that this gist overwrites files that share a filename but come from different URL paths, because the code does not create subfolders.

For an updated version that creates subfolders and takes the input file and output folder as parameters, see the fork here:
https://gist.github.com/FurloSK/0477e01024f701db42341fc3223a5d8c
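
To illustrate the overwrite problem described above, here is a minimal sketch of one way to give each URL its own subfolder. It is not the code from the linked fork; the helper names `url_to_relative_path` and `write_under` are hypothetical and exist only for this example, as is the choice to use the host name as the top-level folder.

```python
import os
from urllib.parse import unquote, urlsplit


def url_to_relative_path(url: str) -> str:
    """Hypothetical helper: map a URL to a relative path host/dir/.../file."""
    parts = urlsplit(url)
    # Keep only real path segments; dropping "." and ".." stops the result
    # from escaping the output folder.
    segments = [
        seg for seg in unquote(parts.path).split("/")
        if seg not in ("", ".", "..")
    ]
    if not segments:
        # Assumption: a bare "/" path is stored under the name "index".
        segments = ["index"]
    # ":" (from a port number) is not allowed in Windows file names.
    host = parts.netloc.replace(":", "_") or "unknown-host"
    return os.path.join(host, *segments)


def write_under(root: str, url: str, data: bytes) -> str:
    """Hypothetical helper: write data below root, creating subfolders as needed."""
    target = os.path.join(root, url_to_relative_path(url))
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as out:
        out.write(data)
    return target
```

With this layout, `https://a.example/x/logo.png` and `https://a.example/y/logo.png` end up in different folders instead of the second replacing the first. The query string is ignored, so two URLs that differ only in their query would still collide.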
