F1ReplayTiming — Data Sources

Executive Summary

F1ReplayTiming pulls its data from four distinct external sources and uses two storage backends to persist processed data. The primary data source is the FastF1 Python library, which itself wraps the official F1 timing API (livetiming.formula1.com) to supply historical session data — laps, telemetry, weather, race control messages, driver/team metadata, circuit geometry, and event schedules. For live sessions, the app connects directly to the F1 SignalR real-time stream (wss://livetiming.formula1.com/signalrcore) via WebSocket. A photo-based broadcast sync feature uses the OpenRouter AI API (specifically Gemini Flash vision model) to extract leaderboard data from screenshots. Pre-computed session data is stored either on the local filesystem or in Cloudflare R2 (S3-compatible object storage).

Architecture / Data Flow Overview

                           ┌──────────────────────────────────────────┐
                           │         External Data Sources            │
                           ├──────────────┬───────────────┬───────────┤
                           │  FastF1 Lib  │ F1 SignalR WS │ OpenRouter│
                           │  (Ergast +   │ (Live Stream) │ (Vision   │
                           │   F1 API)    │               │  AI API)  │
                           └──────┬───────┴───────┬───────┴─────┬─────┘
                                  │               │             │
                    ┌─────────────▼──────┐  ┌─────▼─────┐  ┌───▼────────────┐
                    │  f1_data.py         │  │ live_     │  │ sync.py        │
                    │  (Data Processing)  │  │ signalr.py│  │ (Photo Sync)   │
                    └────────┬───────────┘  └─────┬─────┘  └───┬────────────┘
                             │                    │             │
                             ▼                    ▼             │
                    ┌────────────────┐   ┌────────────────┐    │
                    │  process.py    │   │ live_state.py  │    │
                    │ (ETL Pipeline) │   │ (State Mgr)    │    │
                    └────────┬───────┘   └────────┬───────┘    │
                             │                    │            │
                             ▼                    ▼            │
                    ┌────────────────┐   ┌────────────────┐   │
                    │  storage.py    │   │  WebSocket to  │   │
                    │  (Local / R2)  │   │  Frontend      │   │
                    └────────┬───────┘   └────────────────┘   │
                             │                                 │
                             ▼                                 ▼
                    ┌────────────────────────────────────────────┐
                    │              Frontend (Next.js)             │
                    │  REST API + WebSocket consumers             │
                    └────────────────────────────────────────────┘

Data Source 1: FastF1 Library (Historical/Replay Data)

Overview

The primary and most substantial data source is the FastF1 open-source Python library (version ≥3.8.1)¹. FastF1 is an unofficial Python library that retrieves Formula 1 timing, telemetry, and session data from the official F1 live timing API and the Ergast API. The project explicitly calls it out as the foundation: "FastF1 is the original inspiration and data source for this project"².

What Data It Provides

All data extraction happens in backend/services/f1_data.py³, which imports and uses FastF1 to load session data. The key data categories are:

Data Type	FastF1 API Call	Data Extracted	Output File
Event schedule	`fastf1.get_event_schedule(year)`	Round numbers, country, event name, location, session dates (UTC)	`seasons/{year}/schedule.json`
Session info	`session.results`	Driver abbreviations, numbers, full names, team names, team colors	`sessions/{year}/{round}/{type}/info.json`
Track geometry	`session.get_circuit_info()`, `fastest_lap.get_telemetry()`	X/Y track outline, corner positions, marshal sectors, rotation, sector boundaries	`sessions/{year}/{round}/{type}/track.json`
Lap data	`session.laps`	Driver, lap number, position, lap time, sector times (S1/S2/S3), tyre compound, tyre life, pit in/out flags	`sessions/{year}/{round}/{type}/laps.json`
Race results	`session.results`	Final positions, grid positions, status (finished/retired), points, team info	`sessions/{year}/{round}/{type}/results.json`
Driver positions (replay frames)	`laps.get_telemetry()` per driver	X/Y GPS coordinates sampled every 0.5s, positions, gaps, intervals, tyre info, pit status, flags, race control messages, weather	`sessions/{year}/{round}/{type}/replay.json`
Telemetry per driver per lap	`lap.get_telemetry()`	Speed, throttle, brake, gear, RPM, DRS, distance	`sessions/{year}/{round}/{type}/telemetry/{ABBR}.json`
Race control messages	`session.race_control_messages`	Steward messages, penalties, investigations, flags, sector-level yellow flags	Embedded in replay frames
Weather data	`session.load(weather=True)`	Air/track temperature, humidity, wind, rainfall	Embedded in replay frames

How FastF1 Is Used

The session loading call in _load_session() requests all four data categories at once⁴:

session = fastf1.get_session(year, round_num, session_type)
session.load(
    telemetry=True,
    laps=True,
    weather=True,
    messages=True,
)

FastF1 uses its own internal caching layer; the app configures a persistent cache directory (FASTF1_CACHE_DIR or .fastf1-cache)⁵ so repeat fetches are fast. An in-memory session cache (_session_cache) also prevents redundant loads within a single process lifetime⁶.

Triggering Data Fetch

Data from FastF1 is fetched in three ways:

On-demand: When a user selects a session not yet processed, ensure_session_data() in process.py triggers the full ETL pipeline⁷.
Bulk pre-compute: The precompute.py CLI script processes sessions ahead of time⁸.
Auto-precompute: A background task (auto_precompute.py) runs every 30 minutes on Fri–Mon, checking the schedule for new sessions and automatically processing them⁹.

Data Source 2: F1 SignalR Real-Time Stream (Live Timing)

Overview

For live session timing during race weekends, the app connects directly to the official Formula 1 SignalR Core endpoint¹⁰:

HTTP negotiate endpoint: https://livetiming.formula1.com/signalrcore/negotiate?negotiateVersion=1
WebSocket endpoint: wss://livetiming.formula1.com/signalrcore

This is the same real-time data feed that powers the official F1 TV and F1 app timing screens.

SignalR Topics Subscribed

The client subscribes to 13 topics covering all aspects of live timing¹¹:

Topic	Data
`TimingData`	Per-driver timing (gaps, intervals, sector times)
`TimingAppData`	Extended timing (stint info, tyre data)
`TimingStats`	Session statistics (personal best laps)
`DriverList`	Driver metadata (number, abbreviation, team color)
`RaceControlMessages`	Steward decisions, penalties, flags
`TrackStatus`	Green/yellow/SC/VSC/red flag status
`WeatherData`	Temperature, humidity, wind, rainfall
`LapCount`	Current lap / total laps
`ExtrapolatedClock`	Session clock (remaining time)
`SessionInfo`	Session metadata
`SessionStatus`	Session lifecycle (started, finished, etc.)
`SessionData`	Additional session data
`Position.z`	GPS car positions (compressed with zlib)

Connection Architecture

The LiveSignalRClient class in live_signalr.py¹² handles:

HTTP negotiation to obtain a connectionToken and AWS load-balancer cookie (AWSALBCORS)¹³
WebSocket connection with SignalR JSON protocol handshake
Topic subscription via a single Subscribe invocation
Decompression of .z topics (base64 + zlib deflate)¹⁴
Handling of multiplexed feed messages containing multiple topic updates¹⁵
Automatic reconnection with exponential backoff (1s → 30s max)¹⁶
Server ping/pong keep-alive handling

State Management

Incoming SignalR messages are incremental deltas. The LiveStateManager in live_state.py¹⁷ accumulates these into a complete session state, maintaining per-driver state objects that track position, gaps, tyres, pit stops, flags, GPS coordinates, and more — producing frames in the same shape as the replay system.

Test Replayer

For development/testing, the LiveTestReplayer (live_test_replayer.py)¹⁸ can replay .jsonStream files downloaded from the F1 static API (livetiming.formula1.com) with original timing, simulating a live session from recorded data files.

Data Source 3: OpenRouter / Gemini Flash Vision API (Photo Sync)

Overview

The broadcast sync feature uses AI vision to extract leaderboard data from screenshots of F1 TV broadcasts. This is powered by the OpenRouter API (https://openrouter.ai/api/v1/chat/completions) using the google/gemini-2.0-flash-001 model¹⁹.

How It Works

User uploads a photo/screenshot of the F1 timing tower
The image is converted to JPEG (handles HEIC, PNG, etc.) and resized to max 1200px²⁰
The image is sent to Gemini Flash via OpenRouter with a detailed extraction prompt²¹
The AI extracts: lap number, gap mode (leader/interval), and per-driver position, abbreviation, gap to leader, and tyre compound
The extracted data is matched against pre-computed replay frames to find the closest timestamp²²

Configuration

Requires an OPENROUTER_API_KEY environment variable. This feature is optional — manual entry of gap times works without it²³.

Data Source 4: Pre-Computed Pit Loss Data

Overview

The compute_pit_loss.py and compute_pit_loss_v2.py scripts²⁴ compute average pit time loss per circuit from previously processed session data. This computed data is itself derived from FastF1 data but becomes a standalone data source once computed:

Average pit loss under green flag conditions
Average pit loss under Safety Car
Average pit loss under Virtual Safety Car

This data feeds the pit position prediction feature, which estimates where a driver would rejoin if they pitted now²⁵.

Storage Backends (Intermediate Data Sources)

Once raw data is fetched from external sources and processed, it is stored in one of two backends. These become the primary data sources for the frontend at runtime — the frontend never talks to FastF1 or the F1 API directly.

Local Filesystem

Default storage backend. JSON files are written to DATA_DIR (default: ./data)²⁶. Data is stored as uncompressed JSON.

Cloudflare R2 (S3-compatible)

Optional remote storage backend, activated by setting STORAGE_MODE=r2²⁷. Uses boto3 with a custom Cloudflare endpoint. Data is stored as gzipped JSON. Requires R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, and R2_SECRET_ACCESS_KEY environment variables²⁸.

The storage.py abstraction layer²⁹ provides a unified API (put_json, get_json, exists, list_keys) that delegates to the configured backend.

Storage Schema

seasons/
  {year}/
    schedule.json              ← event schedule

sessions/
  {year}/
    {round}/
      {session_type}/
        info.json              ← session/driver metadata
        track.json             ← circuit geometry
        laps.json              ← lap-by-lap data
        results.json           ← final results
        replay.json            ← frame-by-frame replay data
        telemetry/
          {DRIVER_ABBR}.json   ← per-driver telemetry

pit_loss.json                  ← precomputed pit loss times

Summary: Data Source Inventory

#	Source	URL / Endpoint	Type	Purpose	Required?
1	FastF1 (wraps F1 Timing API + Ergast)	`api.formula1.com`, `livetiming.formula1.com`	REST / HTTP	Historical session data, telemetry, schedules	Yes (core)
2	F1 SignalR Stream	`wss://livetiming.formula1.com/signalrcore`	WebSocket (SignalR)	Real-time live timing during sessions	For live feature
3	OpenRouter API (Gemini Flash)	`https://openrouter.ai/api/v1/chat/completions`	REST / HTTP	AI vision for photo-based broadcast sync	Optional
4	Local filesystem / Cloudflare R2	Local disk or `{account}.r2.cloudflarestorage.com`	File I/O / S3	Persistent storage for processed data	Yes (one of)

Confidence Assessment

High confidence: All four data sources are clearly documented in the code with explicit URLs, import statements, and API calls. The FastF1 dependency is declared in requirements.txt and used extensively throughout f1_data.py. The SignalR endpoint is hardcoded. The OpenRouter integration is fully visible in sync.py. The storage backends are well-abstracted in storage.py and r2_storage.py.
No ambiguity: There are no hidden or undocumented data sources. The frontend consumes only the backend's REST API and WebSocket endpoints — it has no independent external data fetches.

Footnotes

backend/requirements.txt:3 — fastf1>=3.8.1 ↩
README.md:255 — "FastF1 is the original inspiration and data source for this project" ↩
backend/services/f1_data.py:1-14 — imports and FastF1 cache setup ↩
backend/services/f1_data.py:200-206 — session.load(telemetry=True, laps=True, weather=True, messages=True) ↩
backend/services/f1_data.py:17-28 — FastF1 cache directory configuration ↩
backend/services/f1_data.py:31-32 — _session_cache: dict[str, fastf1.core.Session] = {} ↩
backend/services/process.py:131-178 — ensure_session_data() on-demand processing ↩
backend/precompute.py — CLI bulk pre-compute script ↩
backend/services/auto_precompute.py:1-7 — auto-precompute background task documentation ↩
backend/services/live_signalr.py:36-38 — SignalR URL constants ↩
backend/services/live_signalr.py:42-56 — _TOPICS list ↩
backend/services/live_signalr.py:78-98 — LiveSignalRClient class ↩
backend/services/live_signalr.py:188-242 — _negotiate() method ↩
backend/services/live_signalr.py:400-409 — .z topic decompression ↩
backend/services/live_signalr.py:413-450 — feed message handling ↩
backend/services/live_signalr.py:58-60 — reconnect backoff constants ↩
backend/services/live_state.py:1-7 — LiveStateManager documentation ↩
backend/services/live_test_replayer.py:1-12 — replayer documentation ↩
backend/routers/sync.py:23-24 — OPENROUTER_URL and VISION_MODEL constants ↩
backend/routers/sync.py:60-72 — _convert_to_jpeg() image processing ↩
backend/routers/sync.py:26-57 — EXTRACT_PROMPT for Gemini vision ↩
backend/routers/sync.py:146-235 — _match_frame() matching algorithm ↩
README.md:249-251 — photo sync feature description ↩
backend/compute_pit_loss.py:1-12 — pit loss computation documentation ↩
backend/routers/live.py:58-67 — pit loss data loaded for live sessions ↩
backend/services/storage.py:31-32 — _data_dir() local storage path ↩
backend/services/storage.py:23-24 — _mode() checks STORAGE_MODE env var ↩
backend/services/storage.py:70-76 — R2 credential requirements ↩
backend/services/storage.py:1-9 — storage abstraction layer documentation ↩

ianphil/what-data-sources-does-this-get-it-s-data-from.md

Select an option

No results found

Select an option

No results found

F1ReplayTiming — Data Sources

Executive Summary

Architecture / Data Flow Overview

Data Source 1: FastF1 Library (Historical/Replay Data)

Overview

What Data It Provides

How FastF1 Is Used

Triggering Data Fetch

Data Source 2: F1 SignalR Real-Time Stream (Live Timing)

Overview

SignalR Topics Subscribed

Connection Architecture

State Management

Test Replayer

Data Source 3: OpenRouter / Gemini Flash Vision API (Photo Sync)

Overview

How It Works

Configuration

Data Source 4: Pre-Computed Pit Loss Data

Overview

Storage Backends (Intermediate Data Sources)

Local Filesystem

Cloudflare R2 (S3-compatible)

Storage Schema

Summary: Data Source Inventory

Confidence Assessment

Footnotes