A Python script to extract webcam images from WACZ (Web Archive Collection Zipped) files.
Extracts Belarus customs webcam images (OSH1.jpg, OSH2.jpg, OSH3.jpg, OSH4.jpg) from archived web data collected between August 5-27, 2025.
pip install warcio
python extract_webcam.py
- Searches all .wacz files in the data/ directory
- Extracts images matching pattern /webcam/OSH*.jpg
- Saves images to the extracted_webcam/ directory
- Names files as {timestamp}_{image_name}.jpg
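The steps above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the helper names (`image_name`, `output_name`, `extract_from_wacz`), the regular expression used for the `/webcam/OSH*.jpg` pattern, and the choice of the `WARC-Date` header reduced to 14 digits as the timestamp are all assumptions. It relies on a WACZ file being a ZIP archive that contains WARC files, read here with `warcio`'s `ArchiveIterator`.

```python
import re
import zipfile
from pathlib import Path

# Assumed reading of the /webcam/OSH*.jpg pattern: OSH followed by digits,
# optionally followed by a query string.
WEBCAM_RE = re.compile(r"/webcam/(OSH\d+)\.jpg(?:\?.*)?$")


def image_name(url):
    """Return e.g. 'OSH1' if the URL points at a webcam image, else None."""
    match = WEBCAM_RE.search(url)
    return match.group(1) if match else None


def output_name(warc_date, url):
    """Build '{timestamp}_{image_name}.jpg' from a WARC-Date header value."""
    # Assumption: the timestamp is the WARC-Date stripped to YYYYMMDDhhmmss.
    stamp = re.sub(r"\D", "", warc_date)[:14]
    return f"{stamp}_{image_name(url)}.jpg"


def extract_from_wacz(wacz_path, out_dir):
    """Write every matching webcam image in one WACZ file; return the count."""
    # Imported here so the helpers above work without warcio installed.
    from warcio.archiveiterator import ArchiveIterator

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with zipfile.ZipFile(wacz_path) as zf:
        for member in zf.namelist():
            if not member.endswith((".warc", ".warc.gz")):
                continue
            with zf.open(member) as stream:
                # ArchiveIterator handles gzip-compressed WARC streams itself.
                for record in ArchiveIterator(stream):
                    if record.rec_type != "response":
                        continue
                    url = record.rec_headers.get_header("WARC-Target-URI") or ""
                    if image_name(url) is None:
                        continue
                    date = record.rec_headers.get_header("WARC-Date") or ""
                    data = record.content_stream().read()
                    (out_dir / output_name(date, url)).write_bytes(data)
                    count += 1
    return count
```

A driver would then loop over `Path("data").glob("*.wacz")` and call `extract_from_wacz(path, "extracted_webcam")` for each file.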
The script processes WACZ files and reports progress:
Found 377 WACZ files
✓ filename1: extracted 4 images
✓ filename2: extracted 0 images
...
Extracted 1112 total images to extracted_webcam
From 377 WACZ files (40 GB total):
- 1,112 webcam images extracted
- 278 timestamps with webcam data
- Date range: August 5-27, 2025
- Source: customs.gov.by/webcam/monitoring
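The image and timestamp counts can be cross-checked from the output file names alone. The sketch below assumes the `{timestamp}_{image_name}.jpg` naming scheme and that the timestamp itself contains no underscore; `summarize` is a hypothetical helper, not part of the script.

```python
def summarize(filenames):
    """Return (number of images, number of distinct timestamps)."""
    names = [n for n in filenames if n.endswith(".jpg")]
    # Everything before the first underscore is treated as the timestamp.
    stamps = {n.split("_", 1)[0] for n in names}
    return len(names), len(stamps)
```

For example, `summarize(p.name for p in Path("extracted_webcam").glob("*.jpg"))` (with `Path` imported from `pathlib`) returns the pair of counts for the output directory. With four cameras captured per timestamp, 278 timestamps correspond to the 1,112 images reported.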
- Only processes files containing Belarus customs webcam URLs
- Validates JPEG format with magic byte checking
- Handles compressed WARC files within WACZ archives
- Skips files without relevant webcam content
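The magic byte check mentioned above can be illustrated with a short sketch. JPEG data begins with the bytes FF D8 FF; the function name `looks_like_jpeg` is hypothetical, and the actual script may check more than this (for example the trailing FF D9 marker).

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the payload starts with the JPEG signature."""
    return data[:3] == JPEG_MAGIC
```

A check like this rejects payloads such as HTML error pages that were archived under an image URL, so they are not written out as `.jpg` files.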