We’re seeing repeated CI failures in a fresh container when tests make live HTTP calls. Example from today, roughly 5 rerun failures on:

- tests/models/pix2struct/test_image_processing_pix2struct.py::Pix2StructImageProcessingTest::test_expected_patches
- tests/models/pix2struct/test_image_processing_pix2struct.py::Pix2StructImageProcessingTest::test_call_vqa
Typical errors:

- HTTPError('429 Client Error: Too Many Requests for url: https://huggingface.co/ybelkada/fonts/resolve/main/Arial.TTF')
- PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x...> when doing Image.open(requests.get(..., stream=True).raw) for https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/australia.jpg
We want fully deterministic, offline CI runs without touching library/test code.
Adopt a record → bake → replay workflow:
- Record HTTP requests (one-off or on demand): execute the tests once with network access, while transparently recording all HTTP traffic to a local cache.
- Bake caches into the Docker image: copy the recorded caches into the image as a layer.
- Replay in CI: tests read from the baked caches first, so every recorded URL is served locally and cannot flake. Only a genuinely new URL ever touches the network.
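The cache-first semantics of the workflow can be sketched in a few lines. This is a toy illustration only; in the real setup, requests-cache and the Hub's on-disk cache play the role of the dict:

```python
# Toy sketch of record -> bake -> replay (illustration, not the real mechanism).
def fetch(url, cache, fetch_live):
    if url in cache:           # replay: hit recorded during the record step
        return cache[url]
    body = fetch_live(url)     # record: only runs when network is available
    cache[url] = body          # bake: this dict stands in for the persisted cache
    return body

calls = []
def live(url):
    # Stand-in for a real network fetch; counts how often we go live.
    calls.append(url)
    return b"bytes for " + url.encode()

cache = {}
fetch("https://example.com/a.jpg", cache, live)  # miss -> live fetch
fetch("https://example.com/a.jpg", cache, live)  # hit  -> served from cache
print(len(calls))  # -> 1
```

A second request for the same URL never reaches the network, which is exactly the property that removes the 429 flakes.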
This covers both kinds of traffic we have:
- Hugging Face Hub traffic (fonts, model/dataset files): relies on the hub’s own on-disk cache directories.
- Arbitrary requests.get(...) traffic (e.g. direct image URLs): captured and replayed by a global HTTP cache installed via sitecustomize.py.
We use both of the following because our tests rely on Hub downloads and direct requests.get(...) calls:

- Python auto-imports sitecustomize at startup if it is present on sys.path.
- We install a global requests-cache that transparently caches and replays any requests traffic (e.g., direct image URLs used by PIL).
- We also force body materialization so stream=True responses get fully cached, avoiding truncated-content errors.
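A quick way to convince yourself of the auto-import behavior is a standalone sketch: drop a sitecustomize.py into a temp dir, put that dir on PYTHONPATH (standing in for the baked site-packages copy), and start a fresh interpreter.

```python
import os
import subprocess
import sys
import tempfile

# Python's site module imports sitecustomize at startup if it is importable;
# PYTHONPATH entries precede site-packages on sys.path, so ours wins.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "sitecustomize.py"), "w") as f:
        f.write('print("[sitecustomize] loaded")\n')
    env = dict(os.environ, PYTHONPATH=d)
    out = subprocess.run(
        [sys.executable, "-c", "pass"],  # runs no user code at all
        capture_output=True, text=True, env=env,
    ).stdout

print(out.strip())  # -> [sitecustomize] loaded
```

The child interpreter runs no user code, yet the print fires: that is the hook we rely on to patch requests before any test imports it.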
We set cache env vars so Hub blobs (fonts, small images, model shards) land in known directories we can bake into the image:
```
HF_HOME=/opt/hf_cache
HF_HUB_CACHE=/opt/hf_cache/hub
TRANSFORMERS_CACHE=/opt/hf_cache/transformers
HF_DATASETS_CACHE=/opt/hf_cache/datasets
```
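For reference, recent huggingface_hub versions lay the hub cache out as `<HF_HUB_CACHE>/<type>s--<org>--<name>/` with blobs/, refs/, and snapshots/ inside. A small helper (ours, purely illustrative, not part of huggingface_hub) predicts where a repo's cache folder will land:

```python
import os

def hub_cache_dir(repo_id: str, repo_type: str = "model") -> str:
    """Predict a repo's folder under the Hub cache layout
    <HF_HUB_CACHE>/<type>s--<org>--<name>/ (assumes a recent huggingface_hub)."""
    prefix = {"model": "models", "dataset": "datasets", "space": "spaces"}[repo_type]
    folder = prefix + "--" + repo_id.replace("/", "--")
    return os.path.join(os.environ.get("HF_HUB_CACHE", "/opt/hf_cache/hub"), folder)

# The font repo behind the 429 failures above:
print(hub_cache_dir("ybelkada/fonts"))
```

This is handy when checking that a record run actually populated the directory you are about to bake.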
Place this file into the image at site-packages/sitecustomize.py so it auto-loads:
```python
# sitecustomize.py
import atexit
import json
import os
import time

try:
    import requests
    import requests_cache
except Exception:
    requests = None
    requests_cache = None

if requests and requests_cache:
    cache_dir = os.environ.get("HTTP_CACHE_DIR", "/opt/http_cache")
    os.makedirs(cache_dir, exist_ok=True)
    cache_path = os.path.join(cache_dir, "requests_cache")
    log_path = os.path.join(cache_dir, "http_log.ndjson")

    # Never expire; regenerate the cache explicitly when needed
    requests_cache.install_cache(cache_path, backend="sqlite", expire_after=None)

    def _log_and_materialize(r):
        # Ensure streamed responses are fully cached; log basic metadata
        try:
            _ = r.content
        except Exception:
            pass
        try:
            with open(log_path, "a") as f:
                f.write(json.dumps({
                    "ts": time.time(),
                    "method": getattr(r.request, "method", None),
                    "url": r.url,
                    "status": getattr(r, "status_code", None),
                    "from_cache": getattr(r, "from_cache", False),
                }) + "\n")
        except Exception:
            pass

    # requests.get() creates a fresh session per call, so hooks registered on
    # one session would miss most traffic. Wrap Session.send instead, so every
    # response -- cached or live, from any session -- passes through the logger.
    _orig_send = requests.Session.send
    def _send(self, request, **kwargs):
        response = _orig_send(self, request, **kwargs)
        _log_and_materialize(response)
        return response
    requests.Session.send = _send

    @atexit.register
    def _print_log_loc():
        print(f"[http-recorder] Log: {log_path} Cache: {cache_path}.sqlite")
```

Run the test suite once with network access to populate both the Hub caches and the requests-cache DB. Since slow tests are skipped by default, this will pull only small assets.
```shell
# Choose stable cache locations
export HF_HOME=/opt/hf_cache
export HF_HUB_CACHE=/opt/hf_cache/hub
export TRANSFORMERS_CACHE=/opt/hf_cache/transformers
export HF_DATASETS_CACHE=/opt/hf_cache/datasets
export HTTP_CACHE_DIR=/opt/http_cache

# Optional: faster first pull
export HF_HUB_ENABLE_HF_TRANSFER=1

# Run tests to record everything needed
pytest -q

# Package caches as build artifacts
tar -C /opt -czf hf_cache.tgz hf_cache
tar -C /opt -czf http_cache.tgz http_cache
```

Add the recorded caches as layers and include sitecustomize.py so the runtime replays from cache automatically.
```dockerfile
FROM python:3.11-slim

# Cache locations (must match recording step)
ENV HF_HOME=/opt/hf_cache \
    HF_HUB_CACHE=/opt/hf_cache/hub \
    TRANSFORMERS_CACHE=/opt/hf_cache/transformers \
    HF_DATASETS_CACHE=/opt/hf_cache/datasets \
    HTTP_CACHE_DIR=/opt/http_cache

# Minimal deps used by tests
RUN pip install --no-cache-dir requests requests-cache huggingface_hub pillow

# Auto-loads and patches requests globally
COPY sitecustomize.py /usr/local/lib/python3.11/site-packages/sitecustomize.py

# Bring in recorded caches (created in 3.2)
ADD hf_cache.tgz /opt/
ADD http_cache.tgz /opt/

# Ensure readability for non-root users in CI
RUN chmod -R a+rX /opt/hf_cache /opt/http_cache

WORKDIR /workspace
# COPY your repo here in the real Dockerfile
```

No special flags are required. CI can stay online as a fallback; at runtime the tests will prefer:
- Hub blobs from HF_*_CACHE for models/datasets/fonts/images.
- requests-cache for any direct requests.get(...) (e.g., the australia.jpg URL), served from the baked SQLite.
If a new URL appears in tests, it will be fetched live during CI; consider periodically re-running step 3.2 to refresh caches.
- This design eliminates the observed flakes by ensuring both Hub assets and direct HTTP resources are already cached. Remaining network calls are rare metadata checks and new URLs.
- Because slow tests are disabled by default, the baked caches should remain small (fonts, tiny images, small model shards).
- If future flakes reappear due to new URLs, just regenerate caches (3.2) and rebuild the image.
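One cheap sanity check before shipping a rebuilt image: on a pure replay run, every record in http_log.ndjson should report from_cache: true. A small tally helper (ours, illustrative, not part of requests-cache) might look like:

```python
import json

def summarize_http_log(lines):
    """Count cache hits and collect the URLs of live fetches from
    http_log.ndjson records written by the recorder."""
    hits, misses = 0, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("from_cache"):
            hits += 1
        else:
            misses.append(rec.get("url"))
    return hits, misses

# Records shaped like the recorder writes them:
sample = [
    '{"ts": 1.0, "method": "GET", "url": "https://huggingface.co/ybelkada/fonts/resolve/main/Arial.TTF", "status": 200, "from_cache": true}',
    '{"ts": 2.0, "method": "GET", "url": "https://example.com/new.jpg", "status": 200, "from_cache": false}',
]
hits, misses = summarize_http_log(sample)
print(hits, misses)  # -> 1 ['https://example.com/new.jpg']
```

Any URL that shows up in `misses` on a replay run is a candidate for the next cache-refresh pass.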