Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save ianphil/d740badd7785dddc0f1d76d24ae5fa75 to your computer and use it in GitHub Desktop.

Select an option

Save ianphil/d740badd7785dddc0f1d76d24ae5fa75 to your computer and use it in GitHub Desktop.
F1ReplayTiming — Data Sources Research

F1ReplayTiming — Data Sources

Executive Summary

F1ReplayTiming pulls its data from four distinct external sources and uses two storage backends to persist processed data. The primary data source is the FastF1 Python library, which itself wraps the official F1 timing API (livetiming.formula1.com) to supply historical session data — laps, telemetry, weather, race control messages, driver/team metadata, circuit geometry, and event schedules. For live sessions, the app connects directly to the F1 SignalR real-time stream (wss://livetiming.formula1.com/signalrcore) via WebSocket. A photo-based broadcast sync feature uses the OpenRouter AI API (specifically Gemini Flash vision model) to extract leaderboard data from screenshots. Pre-computed session data is stored either on the local filesystem or in Cloudflare R2 (S3-compatible object storage).


Architecture / Data Flow Overview

                           ┌──────────────────────────────────────────┐
                           │         External Data Sources            │
                           ├──────────────┬───────────────┬───────────┤
                           │  FastF1 Lib  │ F1 SignalR WS │ OpenRouter│
                           │  (Ergast +   │ (Live Stream) │ (Vision   │
                           │   F1 API)    │               │  AI API)  │
                           └──────┬───────┴───────┬───────┴─────┬─────┘
                                  │               │             │
                    ┌─────────────▼──────┐  ┌─────▼─────┐  ┌───▼────────────┐
                    │  f1_data.py         │  │ live_     │  │ sync.py        │
                    │  (Data Processing)  │  │ signalr.py│  │ (Photo Sync)   │
                    └────────┬───────────┘  └─────┬─────┘  └───┬────────────┘
                             │                    │             │
                             ▼                    ▼             │
                    ┌────────────────┐   ┌────────────────┐    │
                    │  process.py    │   │ live_state.py  │    │
                    │ (ETL Pipeline) │   │ (State Mgr)    │    │
                    └────────┬───────┘   └────────┬───────┘    │
                             │                    │            │
                             ▼                    ▼            │
                    ┌────────────────┐   ┌────────────────┐   │
                    │  storage.py    │   │  WebSocket to  │   │
                    │  (Local / R2)  │   │  Frontend      │   │
                    └────────┬───────┘   └────────────────┘   │
                             │                                 │
                             ▼                                 ▼
                    ┌────────────────────────────────────────────┐
                    │              Frontend (Next.js)             │
                    │  REST API + WebSocket consumers             │
                    └────────────────────────────────────────────┘

Data Source 1: FastF1 Library (Historical/Replay Data)

Overview

The primary and most substantial data source is the FastF1 open-source Python library (version ≥3.8.1)1. FastF1 is an unofficial Python library that retrieves Formula 1 timing, telemetry, and session data from the official F1 live timing API and the Ergast API. The project explicitly calls it out as the foundation: "FastF1 is the original inspiration and data source for this project"2.

What Data It Provides

All data extraction happens in backend/services/f1_data.py3, which imports and uses FastF1 to load session data. The key data categories are:

Data Type FastF1 API Call Data Extracted Output File
Event schedule fastf1.get_event_schedule(year) Round numbers, country, event name, location, session dates (UTC) seasons/{year}/schedule.json
Session info session.results Driver abbreviations, numbers, full names, team names, team colors sessions/{year}/{round}/{type}/info.json
Track geometry session.get_circuit_info(), fastest_lap.get_telemetry() X/Y track outline, corner positions, marshal sectors, rotation, sector boundaries sessions/{year}/{round}/{type}/track.json
Lap data session.laps Driver, lap number, position, lap time, sector times (S1/S2/S3), tyre compound, tyre life, pit in/out flags sessions/{year}/{round}/{type}/laps.json
Race results session.results Final positions, grid positions, status (finished/retired), points, team info sessions/{year}/{round}/{type}/results.json
Driver positions (replay frames) laps.get_telemetry() per driver X/Y GPS coordinates sampled every 0.5s, positions, gaps, intervals, tyre info, pit status, flags, race control messages, weather sessions/{year}/{round}/{type}/replay.json
Telemetry per driver per lap lap.get_telemetry() Speed, throttle, brake, gear, RPM, DRS, distance sessions/{year}/{round}/{type}/telemetry/{ABBR}.json
Race control messages session.race_control_messages Steward messages, penalties, investigations, flags, sector-level yellow flags Embedded in replay frames
Weather data session.load(weather=True) Air/track temperature, humidity, wind, rainfall Embedded in replay frames

How FastF1 Is Used

The session loading call in _load_session() requests all four data categories at once4:

session = fastf1.get_session(year, round_num, session_type)
session.load(
    telemetry=True,
    laps=True,
    weather=True,
    messages=True,
)

FastF1 uses its own internal caching layer; the app configures a persistent cache directory (FASTF1_CACHE_DIR or .fastf1-cache)5 so repeat fetches are fast. An in-memory session cache (_session_cache) also prevents redundant loads within a single process lifetime6.

Triggering Data Fetch

Data from FastF1 is fetched in three ways:

  1. On-demand: When a user selects a session not yet processed, ensure_session_data() in process.py triggers the full ETL pipeline7.
  2. Bulk pre-compute: The precompute.py CLI script processes sessions ahead of time8.
  3. Auto-precompute: A background task (auto_precompute.py) runs every 30 minutes on Fri–Mon, checking the schedule for new sessions and automatically processing them9.

Data Source 2: F1 SignalR Real-Time Stream (Live Timing)

Overview

For live session timing during race weekends, the app connects directly to the official Formula 1 SignalR Core endpoint10:

  • HTTP negotiate endpoint: https://livetiming.formula1.com/signalrcore/negotiate?negotiateVersion=1
  • WebSocket endpoint: wss://livetiming.formula1.com/signalrcore

This is the same real-time data feed that powers the official F1 TV and F1 app timing screens.

SignalR Topics Subscribed

The client subscribes to 13 topics covering all aspects of live timing11:

Topic Data
TimingData Per-driver timing (gaps, intervals, sector times)
TimingAppData Extended timing (stint info, tyre data)
TimingStats Session statistics (personal best laps)
DriverList Driver metadata (number, abbreviation, team color)
RaceControlMessages Steward decisions, penalties, flags
TrackStatus Green/yellow/SC/VSC/red flag status
WeatherData Temperature, humidity, wind, rainfall
LapCount Current lap / total laps
ExtrapolatedClock Session clock (remaining time)
SessionInfo Session metadata
SessionStatus Session lifecycle (started, finished, etc.)
SessionData Additional session data
Position.z GPS car positions (compressed with zlib)

Connection Architecture

The LiveSignalRClient class in live_signalr.py12 handles:

  • HTTP negotiation to obtain a connectionToken and AWS load-balancer cookie (AWSALBCORS)13
  • WebSocket connection with SignalR JSON protocol handshake
  • Topic subscription via a single Subscribe invocation
  • Decompression of .z topics (base64 + zlib deflate)14
  • Handling of multiplexed feed messages containing multiple topic updates15
  • Automatic reconnection with exponential backoff (1s → 30s max)16
  • Server ping/pong keep-alive handling

State Management

Incoming SignalR messages are incremental deltas. The LiveStateManager in live_state.py17 accumulates these into a complete session state, maintaining per-driver state objects that track position, gaps, tyres, pit stops, flags, GPS coordinates, and more — producing frames in the same shape as the replay system.

Test Replayer

For development/testing, the LiveTestReplayer (live_test_replayer.py)18 can replay .jsonStream files downloaded from the F1 static API (livetiming.formula1.com) with original timing, simulating a live session from recorded data files.


Data Source 3: OpenRouter / Gemini Flash Vision API (Photo Sync)

Overview

The broadcast sync feature uses AI vision to extract leaderboard data from screenshots of F1 TV broadcasts. This is powered by the OpenRouter API (https://openrouter.ai/api/v1/chat/completions) using the google/gemini-2.0-flash-001 model19.

How It Works

  1. User uploads a photo/screenshot of the F1 timing tower
  2. The image is converted to JPEG (handles HEIC, PNG, etc.) and resized to max 1200px20
  3. The image is sent to Gemini Flash via OpenRouter with a detailed extraction prompt21
  4. The AI extracts: lap number, gap mode (leader/interval), and per-driver position, abbreviation, gap to leader, and tyre compound
  5. The extracted data is matched against pre-computed replay frames to find the closest timestamp22

Configuration

Requires an OPENROUTER_API_KEY environment variable. This feature is optional — manual entry of gap times works without it23.


Data Source 4: Pre-Computed Pit Loss Data

Overview

The compute_pit_loss.py and compute_pit_loss_v2.py scripts24 compute average pit time loss per circuit from previously processed session data. This computed data is itself derived from FastF1 data but becomes a standalone data source once computed:

  • Average pit loss under green flag conditions
  • Average pit loss under Safety Car
  • Average pit loss under Virtual Safety Car

This data feeds the pit position prediction feature, which estimates where a driver would rejoin if they pitted now25.


Storage Backends (Intermediate Data Sources)

Once raw data is fetched from external sources and processed, it is stored in one of two backends. These become the primary data sources for the frontend at runtime — the frontend never talks to FastF1 or the F1 API directly.

Local Filesystem

Default storage backend. JSON files are written to DATA_DIR (default: ./data)26. Data is stored as uncompressed JSON.

Cloudflare R2 (S3-compatible)

Optional remote storage backend, activated by setting STORAGE_MODE=r227. Uses boto3 with a custom Cloudflare endpoint. Data is stored as gzipped JSON. Requires R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, and R2_SECRET_ACCESS_KEY environment variables28.

The storage.py abstraction layer29 provides a unified API (put_json, get_json, exists, list_keys) that delegates to the configured backend.

Storage Schema

seasons/
  {year}/
    schedule.json              ← event schedule

sessions/
  {year}/
    {round}/
      {session_type}/
        info.json              ← session/driver metadata
        track.json             ← circuit geometry
        laps.json              ← lap-by-lap data
        results.json           ← final results
        replay.json            ← frame-by-frame replay data
        telemetry/
          {DRIVER_ABBR}.json   ← per-driver telemetry

pit_loss.json                  ← precomputed pit loss times

Summary: Data Source Inventory

# Source URL / Endpoint Type Purpose Required?
1 FastF1 (wraps F1 Timing API + Ergast) api.formula1.com, livetiming.formula1.com REST / HTTP Historical session data, telemetry, schedules Yes (core)
2 F1 SignalR Stream wss://livetiming.formula1.com/signalrcore WebSocket (SignalR) Real-time live timing during sessions For live feature
3 OpenRouter API (Gemini Flash) https://openrouter.ai/api/v1/chat/completions REST / HTTP AI vision for photo-based broadcast sync Optional
4 Local filesystem / Cloudflare R2 Local disk or {account}.r2.cloudflarestorage.com File I/O / S3 Persistent storage for processed data Yes (one of)

Confidence Assessment

  • High confidence: All four data sources are clearly documented in the code with explicit URLs, import statements, and API calls. The FastF1 dependency is declared in requirements.txt and used extensively throughout f1_data.py. The SignalR endpoint is hardcoded. The OpenRouter integration is fully visible in sync.py. The storage backends are well-abstracted in storage.py and r2_storage.py.
  • No ambiguity: There are no hidden or undocumented data sources. The frontend consumes only the backend's REST API and WebSocket endpoints — it has no independent external data fetches.

Footnotes

Footnotes

  1. backend/requirements.txt:3fastf1>=3.8.1

  2. README.md:255 — "FastF1 is the original inspiration and data source for this project"

  3. backend/services/f1_data.py:1-14 — imports and FastF1 cache setup

  4. backend/services/f1_data.py:200-206session.load(telemetry=True, laps=True, weather=True, messages=True)

  5. backend/services/f1_data.py:17-28 — FastF1 cache directory configuration

  6. backend/services/f1_data.py:31-32_session_cache: dict[str, fastf1.core.Session] = {}

  7. backend/services/process.py:131-178ensure_session_data() on-demand processing

  8. backend/precompute.py — CLI bulk pre-compute script

  9. backend/services/auto_precompute.py:1-7 — auto-precompute background task documentation

  10. backend/services/live_signalr.py:36-38 — SignalR URL constants

  11. backend/services/live_signalr.py:42-56_TOPICS list

  12. backend/services/live_signalr.py:78-98LiveSignalRClient class

  13. backend/services/live_signalr.py:188-242_negotiate() method

  14. backend/services/live_signalr.py:400-409.z topic decompression

  15. backend/services/live_signalr.py:413-450feed message handling

  16. backend/services/live_signalr.py:58-60 — reconnect backoff constants

  17. backend/services/live_state.py:1-7 — LiveStateManager documentation

  18. backend/services/live_test_replayer.py:1-12 — replayer documentation

  19. backend/routers/sync.py:23-24OPENROUTER_URL and VISION_MODEL constants

  20. backend/routers/sync.py:60-72_convert_to_jpeg() image processing

  21. backend/routers/sync.py:26-57EXTRACT_PROMPT for Gemini vision

  22. backend/routers/sync.py:146-235_match_frame() matching algorithm

  23. README.md:249-251 — photo sync feature description

  24. backend/compute_pit_loss.py:1-12 — pit loss computation documentation

  25. backend/routers/live.py:58-67 — pit loss data loaded for live sessions

  26. backend/services/storage.py:31-32_data_dir() local storage path

  27. backend/services/storage.py:23-24_mode() checks STORAGE_MODE env var

  28. backend/services/storage.py:70-76 — R2 credential requirements

  29. backend/services/storage.py:1-9 — storage abstraction layer documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment