@enachb
Last active March 9, 2026 22:49
WebRTC Historical Video Playback — Design Document

Serve historical video from video-service to browsers via WebRTC. Pure Go implementation using pion/webrtc. No CGO, no external processes.

Date: 2026-03-09
Status: Draft — reviewed by Gemini, updated with findings


1. Problem Statement

The video-service stores historical H.264 video as SVS1 segments in S3. Today, retrieving video requires either:

  • gRPC GetSegmentFrames + client-side reconstruction (no browser support)
  • GetImage for single JPEG frames (spawns FFmpeg per request, doesn't scale)
  • A download CLI that pipes to FFmpeg to produce MP4 files (offline only)

Browsers cannot consume any of these. The UI expects an HTTP video endpoint that doesn't exist. There is no low-latency seeking, no live-to-historical unification path, and every frame extraction spawns an FFmpeg process.

Goal: Stream stored H.264 NALUs directly to browsers via WebRTC — zero transcoding, sub-second seek, hardware-accelerated decoding on the client.

Codec Scope

The ingestion pipeline also supports H.265 (HEVC), but most browsers do not support HEVC over WebRTC. This design targets H.264 only. On session creation, the server inspects the first segment's initialization data. If the stream contains HEVC NALUs (NAL type >= 32), the server returns 422 Unprocessable Entity with {"error": "HEVC streams not supported over WebRTC"}. HEVC transcoding is a future extension, not part of this design.
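The "NAL type >= 32" rule refers to the HEVC interpretation of the NAL unit header. A naive first-byte check needs care, because an H.264 header with nal_ref_idc set (e.g. 0x67 for an SPS) also reads as >= 32 under the HEVC bit layout. A minimal sketch of the check, applied to the parameter-set NALUs in the initialization data (the helper name is hypothetical):

```go
package main

import "fmt"

// looksLikeHEVCParamSet inspects a bare NALU header byte. H.264 NAL types
// occupy the low 5 bits (max 31); HEVC types are the 6 bits after the
// forbidden_zero_bit, with parameter sets at VPS=32, SPS=33, PPS=34.
// Checking for those specific types avoids false positives: 0x67 (H.264
// SPS) reads as type 51 under the HEVC interpretation, not 32-34.
func looksLikeHEVCParamSet(naluHeader byte) bool {
	t := (naluHeader >> 1) & 0x3F
	return t == 32 || t == 33 || t == 34 // VPS, SPS, PPS
}

func main() {
	fmt.Println(looksLikeHEVCParamSet(0x40)) // HEVC VPS -> true
	fmt.Println(looksLikeHEVCParamSet(0x67)) // H.264 SPS -> false
}
```

In practice the design delegates this to the mediacommon h265 package; the sketch only illustrates why the check is done on parameter-set NALUs rather than arbitrary frame data.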


2. Design Overview

sequenceDiagram
    participant B as Browser
    participant S as video-service
    participant S3 as S3 / GCS

    B->>S: POST /webrtc/session {stream_id, start_ts, mode, sdp_offer}
    S->>S: Create PeerConnection + H.264 track + data channel
    S-->>B: 201 {session_id, sdp_answer}

    B->>S: PATCH /webrtc/session/{id}/ice {candidates}
    S-->>B: 204

    Note over B,S: ICE connectivity established

    S->>S3: Fetch segment containing start_ts
    S3-->>S: SVS1 blob
    S->>S: Parse index, extract NALUs from keyframe <= start_ts
    S->>B: RTP packets (H.264 NALUs via WebRTC track)
    S->>B: Data channel: {"evt":"playing","timestamp_ms":...}

    B->>S: Data channel: {"cmd":"seek","timestamp_ms":...}
    S->>S3: Fetch segment for new timestamp
    S->>B: RTP packets from new position
    S->>B: Data channel: {"evt":"playing","timestamp_ms":...}

    B->>S: Data channel: {"cmd":"pause"}
    S->>B: Data channel: {"evt":"paused"}

    Note over B,S: On disconnect or DELETE
    B->>S: DELETE /webrtc/session/{id}
    S->>S: Cleanup PeerConnection, goroutines

3. Playback Modes

Two modes govern how segments are played and how time gaps are handled.

3.1 Clip Mode (mode: "clip")

Play a bounded time window. The server streams frames from start_ts_ms to end_ts_ms (or end of the containing segment if end_ts_ms is omitted).

Behavior:

  • Playback begins at the keyframe <= start_ts_ms within the first matching segment.
  • Frames are sent in order until end_ts_ms is reached or the segment ends.
  • If multiple segments fall within the time range, they are played sequentially.
  • Gaps between segments are skipped silently — the next segment starts immediately after the previous one ends. A gap event is emitted so the browser can show elapsed real time, but playback does not pause.
  • When the last frame within the range is sent, the server emits ended.

Use case: Reviewing a specific motion event or time window.

3.2 Continuous Mode (mode: "continuous")

Play forward from start_ts_ms through all available segments indefinitely (or until end_ts_ms if provided).

Behavior:

  • Playback begins at the keyframe <= start_ts_ms.
  • When a segment ends, the server queries the DB for the next segment (ts_start_ms > current_segment.ts_end_ms, ordered ascending, limit 1).
  • If there is a gap: The server emits a gap event with the gap duration and the next segment's start time, then pauses the RTP clock for a configurable hold period (default: 0 — no hold, jump immediately). After the hold, playback resumes from the next segment.
  • If there is no next segment: The server emits waiting and polls for new segments every 2 seconds. When one appears, playback resumes.
  • Near-live latency caveat: The ingestion pipeline flushes chunks to S3 every ~60 seconds (on keyframe boundaries). This means continuous-mode tailing has an inherent ~60-second delay behind the live camera feed. True low-latency live streaming requires a separate mode: "live" path that subscribes to NATS directly (see Future Extensions).
  • When end_ts_ms is reached, the server emits ended.
  • If no end_ts_ms is set, playback continues until the browser disconnects or sends stop.

Use case: Watching a full recording session, or tailing a live recording.

3.3 Time Range Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| stream_id | Yes | — | UUID of the video stream |
| start_ts_ms | Yes | — | Playback start (epoch ms) |
| end_ts_ms | No | None | Playback end (epoch ms). Omit for open-ended. |
| mode | No | continuous | "clip" or "continuous" |

4. Signaling API (HTTP)

Added to the video-service's HTTP listener alongside any future REST endpoints.

4.1 Create Session

POST /webrtc/session
Content-Type: application/json

{
  "stream_id":    "uuid-string",
  "start_ts_ms":  1709942400000,
  "end_ts_ms":    1709942460000,   // optional
  "mode":         "clip",           // "clip" | "continuous", default "continuous"
  "sdp_offer":    "v=0\r\n..."
}

Response (201 Created):

{
  "session_id":  "uuid-string",
  "sdp_answer":  "v=0\r\n..."
}

Errors:

  • 400 — Invalid SDP, missing stream_id, start_ts after end_ts
  • 403 — RLS: principal lacks access to this stream
  • 404 — stream_id not found or no segments in range
  • 422 — Stream contains HEVC (unsupported codec for WebRTC)
  • 429 — Max concurrent sessions reached

Authentication: Same x-sky-principal header used by existing gRPC endpoints. The signaling handler extracts the principal and enforces RLS on segment queries.

4.2 Trickle ICE

PATCH /webrtc/session/{session_id}/ice
Content-Type: application/json

{
  "candidate": "candidate:842163049 1 udp ...",
  "sdp_mid":   "0",
  "sdp_mline_index": 0
}

Response: 204 No Content

4.3 Teardown

DELETE /webrtc/session/{session_id}

Response: 204 No Content

Idempotent. Also triggered automatically by ICE disconnection or idle timeout.


5. Media Pipeline

5.1 Zero-Transcode Path

The stored H.264 NALUs are sent directly to the browser's hardware decoder. No FFmpeg, no re-encoding, no CGO.

S3 blob (SVS1)
  → SegmentReader: parse index, read [uint32 len][VideoFrameSpec protobuf] entries
  → FrameExtractor: unmarshal pb.VideoFrameSpec → raw H.264 frame bytes
  → NALUExtractor: strip Annex-B start codes → bare NALUs
  → pion H264Packetizer: NALUs → RTP packets
  → pion TrackLocalStaticRTP → SRTP → DTLS → ICE → Browser
  → Browser RTCPeerConnection → MediaStream → <video> element
  → Browser hardware H.264 decoder

SVS1 frame format note: SVS1 frames are NOT raw H.264 bytes. Each frame is a length-prefixed marshaled pb.VideoFrameSpec protobuf message. The SegmentReader must read the 4-byte uint32 length prefix, read that many bytes, unmarshal the VideoFrameSpec, then extract the Data field which contains the actual H.264 Annex-B frame bytes.

5.2 SegmentReader

Wraps existing S3 fetch + SVS1 parsing logic. Yields frames as an iterator:

type FrameData struct {
    TimestampMs uint64
    NALUs       [][]byte  // Bare NALUs (no start codes)
    IsKeyFrame  bool
}

type SegmentReader interface {
    // ReadFrom returns frames from the keyframe <= startTs to endTs.
    // If endTs is 0, reads to end of segment.
    ReadFrom(ctx context.Context, segmentPath string, startTs, endTs uint64) iter.Seq2[FrameData, error]
}

Annex-B stripping: Use h264.AnnexB.Unmarshal() from github.com/bluenviron/mediacommon/v2/pkg/codecs/h264 (already a transitive dependency via gortsplib). No hand-rolled byte parsing. The same library provides IsRandomAccess() for keyframe detection, SPS.Unmarshal() for profile extraction, and NAL type constants. HEVC detection uses the companion h265 package.
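For intuition only (the design explicitly mandates the mediacommon library, which also handles edge cases this sketch ignores), start-code stripping amounts to splitting on 3-byte start codes and trimming the extra zero left by 4-byte start codes:

```go
package main

import (
	"bytes"
	"fmt"
)

// splitAnnexB is a toy illustration of Annex-B start code stripping.
// The design uses h264.AnnexB.Unmarshal from mediacommon instead of
// hand-rolled parsing; this sketch only shows the concept.
func splitAnnexB(b []byte) [][]byte {
	var nalus [][]byte
	for _, p := range bytes.Split(b, []byte{0x00, 0x00, 0x01}) {
		// A 4-byte start code (00 00 00 01) leaves a trailing zero on the
		// preceding piece; drop it.
		p = bytes.TrimSuffix(p, []byte{0x00})
		if len(p) > 0 {
			nalus = append(nalus, p)
		}
	}
	return nalus
}

func main() {
	stream := []byte{0, 0, 0, 1, 0x67, 0xAA, 0, 0, 1, 0x68, 0xBB}
	for _, n := range splitAnnexB(stream) {
		fmt.Printf("NALU type %d, %d bytes\n", n[0]&0x1F, len(n))
	}
	// NALU type 7, 2 bytes
	// NALU type 8, 2 bytes
}
```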

5.3 Pacer

Controls the rate at which frames are written to the WebRTC track.

type Pacer struct {
    track     *webrtc.TrackLocalStaticRTP
    speed     float64        // 1.0 = realtime, 2.0 = 2x, etc.
    rtpTS     uint32         // Current RTP timestamp (90kHz clock)
    ticker    *time.Timer
    pauseCh   chan struct{}
    resumeCh  chan struct{}
}

Timing logic:

  • Compute inter-frame delay: (frame[n+1].TimestampMs - frame[n].TimestampMs)
  • Adjust for speed: delay = delay / speed
  • Sleep for adjusted delay between frames
  • RTP timestamp increments: delta_ms * 90 (90kHz clock)

SPS/PPS injection:

  • Sent before the first frame of every segment.
  • Re-sent on every seek (decoder needs re-initialization).
  • Read from the initialization_segment field of the pb.SegmentIndex stored inside the S3 blob header (NOT from an Ent schema field — VideoSegment does not have this field). The server must fetch the blob header (first ~8KB) from S3 to parse the SegmentIndex protobuf and extract initialization_segment.
  • Missing SPS/PPS fallback: If the segment's SegmentIndex has no initialization_segment, the server scans the segment's first keyframe for NAL types 7 (SPS) and 8 (PPS). If still not found, it walks backwards through previous segments in the same stream (max 10 segments to cap S3 calls) until SPS/PPS are located. If none exist anywhere, the session emits error and tears down.

SDP codec negotiation:

  • The server sets profile-level-id in the H.264 RTP codec parameters to match the stored stream's actual profile, extracted from bytes 1-3 of the SPS NAL. If unknown, it defaults to Constrained Baseline (42e01f), which all browsers support.
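Extracting the profile-level-id is a direct read of those three SPS bytes (the function name is illustrative):

```go
package main

import "fmt"

// profileLevelID renders the fmtp profile-level-id from an H.264 SPS NALU:
// bytes 1-3 are profile_idc, the constraint flags byte, and level_idc.
// Falls back to Constrained Baseline when the SPS is unusable.
func profileLevelID(sps []byte) string {
	if len(sps) < 4 || sps[0]&0x1F != 7 { // NAL type 7 = SPS
		return "42e01f" // Constrained Baseline fallback
	}
	return fmt.Sprintf("%02x%02x%02x", sps[1], sps[2], sps[3])
}

func main() {
	fmt.Println(profileLevelID([]byte{0x67, 0x64, 0x00, 0x28})) // 640028 (High profile)
	fmt.Println(profileLevelID(nil))                            // 42e01f (fallback)
}
```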

RTP timestamp overflow:

  • The 90kHz RTP clock wraps uint32 every ~13.25 hours. The pacer tracks cumulative playback time independently and lets the uint32 wrap naturally. pion/webrtc and browser jitter buffers handle uint32 wrapping correctly as long as consecutive timestamps are monotonically increasing modulo 2^32.

5.4 Segment Prefetching

[Playing segment N] ──── 80% through ────→ [Prefetch segment N+1 from S3]
                                                      │
[Segment N ends] ─────────────────────────→ [Start from prefetched buffer]

A background goroutine fetches the next segment when playback reaches 80% of the current segment's duration. The fetched blob is held in memory (segments are ~60s of H.264, typically 2-10 MB). If the prefetch fails, it retries. If the next segment doesn't exist yet (continuous mode), it polls.
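The trigger condition reduces to a progress check against the configurable threshold (names are illustrative):

```go
package main

import "fmt"

// shouldPrefetch reports whether playback has crossed the prefetch
// threshold (WEBRTC_PREFETCH_THRESHOLD, default 0.8) within the current
// segment's time span.
func shouldPrefetch(posMs, segStartMs, segEndMs uint64, threshold float64) bool {
	if segEndMs <= segStartMs || posMs < segStartMs {
		return false // degenerate span or position before segment
	}
	progress := float64(posMs-segStartMs) / float64(segEndMs-segStartMs)
	return progress >= threshold
}

func main() {
	// A 60s segment starting at t=0: with the 0.8 default, prefetch fires at 48s.
	fmt.Println(shouldPrefetch(47000, 0, 60000, 0.8)) // false
	fmt.Println(shouldPrefetch(48000, 0, 60000, 0.8)) // true
}
```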


6. Data Channel Protocol

A data channel named "control" carries JSON messages for playback control.

6.1 Browser → Server Commands

{"cmd": "seek",    "timestamp_ms": 1709942430000}
{"cmd": "pause"}
{"cmd": "resume"}
{"cmd": "speed",   "rate": 2.0}
{"cmd": "stop"}

| Command | Behavior |
|---|---|
| seek | Pacer stops. Any in-flight S3 fetch is cancelled via context. Server finds segment + keyframe for timestamp. SPS/PPS re-sent. Pacer resumes from new position. Rapid seeks are coalesced — only the last seek within a 100ms window is executed. |
| pause | Pacer suspends. Track stays open. No RTP sent. |
| resume | Pacer resumes from where it paused. |
| speed | Pacer adjusts inter-frame delay. Accepted range: 0.25 to 16.0. |
| stop | Server emits ended, tears down session. |
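Decoding these commands on the server side is a small JSON unmarshal plus the size guard from Section 10 (type and function names are illustrative):

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// controlCommand mirrors the browser -> server messages above. Fields not
// present in a given command are simply left at their zero values.
type controlCommand struct {
	Cmd         string  `json:"cmd"`
	TimestampMs uint64  `json:"timestamp_ms,omitempty"`
	Rate        float64 `json:"rate,omitempty"`
}

// Messages over 4KB are dropped per Section 10.
const maxControlMsgBytes = 4 * 1024

func decodeCommand(raw []byte) (controlCommand, error) {
	if len(raw) > maxControlMsgBytes {
		return controlCommand{}, errors.New("control message exceeds 4KB, dropped")
	}
	var c controlCommand
	if err := json.Unmarshal(raw, &c); err != nil {
		return controlCommand{}, err
	}
	return c, nil
}

func main() {
	c, _ := decodeCommand([]byte(`{"cmd":"speed","rate":2.0}`))
	fmt.Println(c.Cmd, c.Rate) // speed 2
}
```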

6.2 Server → Browser Events

{"evt": "playing",   "timestamp_ms": 1709942400000}
{"evt": "paused",    "timestamp_ms": 1709942415000}
{"evt": "buffering"}
{"evt": "seeking",   "timestamp_ms": 1709942430000}
{"evt": "seeked",    "timestamp_ms": 1709942428000}
{"evt": "gap",       "from_ms": 1709942460000, "to_ms": 1709942520000, "duration_ms": 60000}
{"evt": "waiting"}
{"evt": "ended"}
{"evt": "error",     "message": "segment not found"}
{"evt": "position",  "timestamp_ms": 1709942430000}

| Event | When |
|---|---|
| playing | Playback started or resumed |
| paused | Pacer suspended |
| buffering | Fetching segment from S3, track temporarily idle |
| seeking | Seek command received, looking up segment |
| seeked | Seek complete, includes actual position (keyframe-aligned) |
| gap | Time gap detected between segments. In clip mode: skipped. In continuous mode: skipped after hold period. |
| waiting | Continuous mode: no next segment yet, polling. |
| ended | Reached end_ts, end of last segment (clip mode), or stop command. |
| error | Unrecoverable error, session will be torn down. |
| position | Periodic timestamp update. Sent every 1 second of playback. |

7. Session Management

7.1 Session Struct

type Session struct {
    ID            string
    StreamID      uuid.UUID
    StartTsMs     uint64
    EndTsMs       uint64          // 0 = open-ended
    Mode          PlaybackMode    // Clip | Continuous

    PC            *webrtc.PeerConnection
    VideoTrack    *webrtc.TrackLocalStaticRTP
    DataChannel   *webrtc.DataChannel
    Pacer         *Pacer
    SegmentReader SegmentReader

    // Lifecycle
    ctx           context.Context
    cancel        context.CancelFunc
    createdAt     time.Time
    lastActivity  time.Time
}

7.2 SessionManager

type SessionManager struct {
    sessions     sync.Map              // session_id → *Session
    maxSessions  int                   // default: 50
    idleTimeout  time.Duration         // default: 5 minutes
    iceConfig    ICEConfig
}

func (sm *SessionManager) Create(req CreateSessionRequest) (*Session, error)
func (sm *SessionManager) Get(id string) (*Session, error)
func (sm *SessionManager) AddICECandidate(id string, candidate webrtc.ICECandidateInit) error
func (sm *SessionManager) Delete(id string) error

7.3 Cleanup

Sessions are cleaned up when:

  1. Browser calls DELETE /webrtc/session/{id}
  2. Browser sends stop on data channel
  3. ICE connection state goes to disconnected or failed — detected via PeerConnection.OnConnectionStateChange callback for immediate teardown (don't rely solely on the reaper)
  4. No data channel activity for idleTimeout (default 5 min)
  5. Server shutdown (graceful drain: close all PeerConnections)

A reaper goroutine runs every 30 seconds, scanning for idle sessions.

Memory budget: With maxSessions=50 and 2 blobs per session (current + prefetched, ~10MB each), worst case is ~1GB. The SessionManager should track total buffered bytes and reject new sessions if a global memory limit (env: WEBRTC_MAX_BUFFER_MB, default 512) would be exceeded.


8. ICE Configuration

type ICEConfig struct {
    STUNServers []string      // env: WEBRTC_STUN_SERVERS (comma-separated)
    TURNServers []TURNServer  // env: WEBRTC_TURN_SERVERS (JSON array)
}

type TURNServer struct {
    URL        string `json:"url"`        // "turn:turn.example.com:3478"
    Username   string `json:"username"`
    Credential string `json:"credential"`
}

Defaults:

  • STUN: stun:stun.l.google.com:19302
  • TURN: None (optional, configure for restrictive NATs)

Future: Embed pion/turn (pure Go) for self-hosted TURN.


9. Playback State Machine

stateDiagram-v2
    [*] --> Created: POST /webrtc/session
    Created --> Connecting: ICE negotiation
    Connecting --> Buffering: ICE connected, fetching first segment
    Buffering --> Playing: Frames ready, pacer started
    Playing --> Paused: pause command
    Paused --> Playing: resume command
    Playing --> Seeking: seek command
    Seeking --> Buffering: Loading segment for new position
    Playing --> Gap: Segment ended, gap detected
    Gap --> Buffering: Next segment available
    Gap --> Waiting: No next segment (continuous mode)
    Waiting --> Buffering: New segment appeared
    Playing --> Ended: end_ts reached or last segment done
    Ended --> Seeking: seek command (restart)
    Ended --> [*]: DELETE or timeout

    Playing --> [*]: disconnect / error / stop
    Paused --> [*]: disconnect / error / stop / timeout
    Waiting --> [*]: disconnect / error / stop / timeout

10. Error Handling

| Scenario | Behavior |
|---|---|
| S3 fetch fails | Retry 3x with 1s backoff. If still failing, emit error event + teardown. |
| Segment has no keyframe before start_ts | Fall back to segment start. Emit seeked with actual position. |
| No segments found for stream_id + time range | Return 404 on session creation. |
| ICE fails to connect within 30s | Server tears down session. |
| Data channel message parse error | Log + ignore. Max message size: 4KB. Messages exceeding this are dropped. |
| Segment has no SPS/PPS | Scan first keyframe NALUs, then walk backwards through prior segments. If none found, emit error. |
| HEVC stream requested | Return 422 on session creation. |
| start_ts before first segment | Snap to first segment start. Emit seeked with actual position. |
| start_ts after last segment | Return 404 (clip mode) or snap to last segment and emit waiting (continuous mode). |
| Segment written during read | Safe — S3 objects are immutable once written. In-progress chunks are not yet in S3. |
| Rapid seek commands | Debounced: 100ms window, only last seek executes. In-flight S3 fetches cancelled via context. |
| Browser disappears (no ICE keepalive) | ICE state → disconnected. Cleanup after 10s grace period. |
| Max sessions reached | Return 429 on session creation. |
| Playback speed out of range | Clamp to [0.25, 16.0], emit speed event with actual rate. |

11. Configuration

| Environment Variable | Default | Description |
|---|---|---|
| WEBRTC_ENABLED | false | Enable WebRTC endpoints |
| WEBRTC_HTTP_PORT | 8083 | HTTP port for signaling |
| WEBRTC_MAX_SESSIONS | 50 | Max concurrent sessions |
| WEBRTC_IDLE_TIMEOUT | 5m | Session idle timeout |
| WEBRTC_STUN_SERVERS | stun:stun.l.google.com:19302 | Comma-separated STUN URIs |
| WEBRTC_TURN_SERVERS | "" | JSON array of TURN configs |
| WEBRTC_PREFETCH_THRESHOLD | 0.8 | Prefetch next segment at this % |
| WEBRTC_GAP_HOLD_MS | 0 | Pause duration at gaps (continuous mode) |
| WEBRTC_POLL_INTERVAL | 2s | Poll interval for new segments (continuous mode) |

12. Dependencies

| Package | Version | Purpose | CGO |
|---|---|---|---|
| github.com/pion/webrtc/v4 | latest | WebRTC peer connections, tracks, data channels | No |
| github.com/pion/rtp | v1.8.x | Already in go.mod | No |

pion/webrtc transitively pulls in pion/ice, pion/dtls, pion/srtp, pion/interceptor, pion/sdp — all pure Go.

No new CGO dependencies. No external processes.


13. File Layout

cloud/video-service/
├── internal/
│   ├── webrtc/
│   │   ├── session.go            # Session struct, PeerConnection lifecycle
│   │   ├── session_manager.go    # Create/get/delete, enforce limits, reaper
│   │   ├── signaling.go          # HTTP handlers (POST/PATCH/DELETE)
│   │   ├── pacer.go              # Frame timing, speed control, RTP writes
│   │   ├── segment_reader.go     # SVS1 → NALU iterator (wraps existing S3/parse)
│   │   ├── nalu.go               # Annex-B start code stripping
│   │   ├── data_channel.go       # Control protocol encode/decode/dispatch
│   │   ├── playback.go           # Playback loop: segment iteration, gap handling
│   │   ├── config.go             # ICE + session config from env vars
│   │   └── metrics.go            # Prometheus: active sessions, bytes sent, etc.
│   ├── handler/
│   │   └── query_handler.go      # Existing (unchanged)
│   ├── server/
│   │   └── ingestion.go          # Existing (unchanged)
│   └── ...
├── cmd/
│   └── server/
│       └── main.go               # Add HTTP listener + webrtc.RegisterHandlers()
└── ...

14. RLS Integration

The signaling handler extracts x-sky-principal from HTTP headers (same as gRPC metadata). On session creation:

  1. Parse principal from x-sky-principal header using the existing security package. The header may be JSON or Base64-encoded JSON — reuse the auto-detection logic from security.UnaryServerInterceptor.
  2. Attach to context via security.WithPrincipal(ctx, &principal) so that all Ent queries pass through the existing RLSInterceptor.
  3. Query VideoSegment with RLS interceptor to verify the principal can access the requested stream_id + time range.
  4. If no accessible segments, return 403.
  5. Store principal in session for subsequent segment fetches (seek).

All segment queries during playback (seek, prefetch, continuous-mode polling) go through the same RLS-filtered Ent queries. The session holds a reference to the *ent.Client from cmd/server/main.go and the existing *minio.Client and s3Bucket for S3 access — no new connections are created.


15. Future Extensions

  • Live streaming: Same WebRTC track, fed by NATS subscriber instead of S3. Session creation with mode: "live" subscribes to the camera's NATS topic.
  • Embedded TURN: pion/turn server in the same binary.
  • Adaptive bitrate: If multiple quality levels are stored, switch tracks based on RTCP receiver reports.
  • Audio: Add a second track for audio when audio streams are stored.
  • Recording bookmarks: Browser sends timestamp markers via data channel for annotation.

16. Resolved Design Decisions

| Question | Decision | Rationale |
|---|---|---|
| HTTP port | Separate port (8083) | Avoids cmux complexity. gRPC stays on 8082, HTTP signaling on 8083. Both share the same Ent/S3 clients. |
| Segment cache | Yes, per-session LRU | Current segment blob is held in memory for seeks within the same segment. Evicted on segment transition. Max 2 blobs per session (current + prefetched). |
| CORS | Configurable via WEBRTC_CORS_ORIGINS env var | Defaults to * in dev, must be set explicitly in production. Applied via standard Go CORS middleware on the HTTP mux. |
| Goroutine lifecycle | Session-scoped context | All goroutines (pacer, prefetcher, data channel reader, reaper) derive from session.ctx. Calling session.cancel() tears down everything. |

17. Gemini Review Summary

Round 1 (2026-03-09, design review)

| Finding | Severity | Resolution |
|---|---|---|
| HEVC streams unsupported by browsers | CRITICAL | Added codec detection + 422 rejection (Section 1) |
| Missing SPS/PPS in segments | CRITICAL | Added fallback: scan keyframes, then walk prior segments (Section 5.3) |
| ~60s latency for near-live tailing | IMPORTANT | Documented as inherent limitation, deferred to mode: "live" (Section 3.2) |
| RTP timestamp overflow at 13h | IMPORTANT | Addressed: uint32 wraps naturally, pion handles it (Section 5.3) |
| SDP profile-level-id mismatch | IMPORTANT | Added: extract from SPS, default to Constrained Baseline (Section 5.3) |
| Rapid seek interleaving | IMPORTANT | Added: 100ms debounce + context cancellation (Sections 6.1, 10) |
| RLS context propagation | IMPORTANT | Explicit security.WithContext call documented (Section 14) |
| S3 client reuse | MINOR | Explicit: reuse existing minio.Client + s3Bucket (Section 14) |
| Data channel max message size | MINOR | Added: 4KB limit (Section 10) |

Round 2 (2026-03-09, TDD plan + codebase cross-check)

| Finding | Severity | Resolution |
|---|---|---|
| SVS1 frames are Protobuf-wrapped, not raw H.264 | CRITICAL | Updated Sections 5.1 + 5.2: SegmentReader must unmarshal pb.VideoFrameSpec |
| initialization_segment not an Ent field | CRITICAL | Updated Section 5.3: read from pb.SegmentIndex in S3 blob header |
| SPS/PPS fallback can trigger many S3 calls | IMPORTANT | Added max 10 segment walk-back cap (Section 5.3) |
| Memory pressure at 50 concurrent sessions | IMPORTANT | Added WEBRTC_MAX_BUFFER_MB global memory limit (Section 7.3) |
| ICE disconnect needs immediate cleanup | IMPORTANT | Added OnConnectionStateChange callback (Section 7.3) |
| security.WithContext should be WithPrincipal | MINOR | Fixed in Section 14 |
| x-sky-principal may be Base64-encoded | MINOR | Noted in Section 14 |