Serve historical video from video-service to browsers via WebRTC. Pure Go implementation using pion/webrtc. No CGO, no external processes.
Date: 2026-03-09
Status: Draft — reviewed by Gemini, updated with findings
The video-service stores historical H.264 video as SVS1 segments in S3. Today, retrieving video requires either:
- gRPC `GetSegmentFrames` + client-side reconstruction (no browser support)
- gRPC `GetImage` for single JPEG frames (spawns FFmpeg per request, doesn't scale)
- A download CLI that pipes to FFmpeg to produce MP4 files (offline only)
Browsers cannot consume any of these. The UI expects an HTTP video endpoint that doesn't exist. There is no low-latency seeking, no live-to-historical unification path, and every frame extraction spawns an FFmpeg process.
Goal: Stream stored H.264 NALUs directly to browsers via WebRTC — zero transcoding, sub-second seek, hardware-accelerated decoding on the client.
The ingestion pipeline also supports H.265 (HEVC), but most browsers do not
support HEVC over WebRTC. This design targets H.264 only. On session
creation, the server inspects the first segment's initialization data. If the
stream contains HEVC NALUs (NAL type >= 32), the server returns 422 Unprocessable Entity with {"error": "HEVC streams not supported over WebRTC"}.
HEVC transcoding is a future extension, not part of this design.
```mermaid
sequenceDiagram
    participant B as Browser
    participant S as video-service
    participant S3 as S3 / GCS
    B->>S: POST /webrtc/session {stream_id, start_ts, mode, sdp_offer}
    S->>S: Create PeerConnection + H.264 track + data channel
    S-->>B: 201 {session_id, sdp_answer}
    B->>S: PATCH /webrtc/session/{id}/ice {candidates}
    S-->>B: 204
    Note over B,S: ICE connectivity established
    S->>S3: Fetch segment containing start_ts
    S3-->>S: SVS1 blob
    S->>S: Parse index, extract NALUs from keyframe <= start_ts
    S->>B: RTP packets (H.264 NALUs via WebRTC track)
    S->>B: Data channel: {"evt":"playing","timestamp_ms":...}
    B->>S: Data channel: {"cmd":"seek","timestamp_ms":...}
    S->>S3: Fetch segment for new timestamp
    S->>B: RTP packets from new position
    S->>B: Data channel: {"evt":"playing","timestamp_ms":...}
    B->>S: Data channel: {"cmd":"pause"}
    S->>B: Data channel: {"evt":"paused"}
    Note over B,S: On disconnect or DELETE
    B->>S: DELETE /webrtc/session/{id}
    S->>S: Cleanup PeerConnection, goroutines
```
Two modes govern how segments are played and how time gaps are handled.
Play a bounded time window. The server streams frames from start_ts_ms to
end_ts_ms (or end of the containing segment if end_ts_ms is omitted).
Behavior:
- Playback begins at the keyframe <= `start_ts_ms` within the first matching segment.
- Frames are sent in order until `end_ts_ms` is reached or the segment ends.
- If multiple segments fall within the time range, they are played sequentially.
- Gaps between segments are skipped silently — the next segment starts immediately after the previous one ends. A `gap` event is emitted so the browser can show elapsed real time, but playback does not pause.
- When the last frame within the range is sent, the server emits `ended`.
Use case: Reviewing a specific motion event or time window.
Play forward from start_ts_ms through all available segments indefinitely
(or until end_ts_ms if provided).
Behavior:
- Playback begins at the keyframe <= `start_ts_ms`.
- When a segment ends, the server queries the DB for the next segment (`ts_start_ms > current_segment.ts_end_ms`, ordered ascending, limit 1).
- If there is a gap: The server emits a `gap` event with the gap duration and the next segment's start time, then pauses the RTP clock for a configurable hold period (default: 0 — no hold, jump immediately). After the hold, playback resumes from the next segment.
- If there is no next segment: The server emits `waiting` and polls for new segments every 2 seconds. When one appears, playback resumes.
- Near-live latency caveat: The ingestion pipeline flushes chunks to S3 every ~60 seconds (on keyframe boundaries). This means continuous-mode tailing has an inherent ~60-second delay behind the live camera feed. True low-latency live streaming requires a separate `mode: "live"` path that subscribes to NATS directly (see Future Extensions).
- When `end_ts_ms` is reached, the server emits `ended`.
- If no `end_ts_ms` is set, playback continues until the browser disconnects or sends `stop`.
Use case: Watching a full recording session, or tailing a live recording.
| Parameter | Required | Default | Description |
|---|---|---|---|
| `stream_id` | Yes | — | UUID of the video stream |
| `start_ts_ms` | Yes | — | Playback start (epoch ms) |
| `end_ts_ms` | No | None | Playback end (epoch ms). Omit for open-ended. |
| `mode` | No | `continuous` | `"clip"` or `"continuous"` |
Added to the video-service's HTTP listener alongside any future REST endpoints.
POST /webrtc/session
Content-Type: application/json
```json
{
  "stream_id": "uuid-string",
  "start_ts_ms": 1709942400000,
  "end_ts_ms": 1709942460000,   // optional
  "mode": "clip",               // "clip" | "continuous", default "continuous"
  "sdp_offer": "v=0\r\n..."
}
```
Response (201 Created):

```json
{
  "session_id": "uuid-string",
  "sdp_answer": "v=0\r\n..."
}
```

Errors:
- `400` — Invalid SDP, missing `stream_id`, or `start_ts` after `end_ts`
- `404` — `stream_id` not found or no segments in range
- `422` — Stream contains HEVC (unsupported codec for WebRTC)
- `429` — Max concurrent sessions reached
- `403` — RLS: principal lacks access to this stream
Authentication: Same x-sky-principal header used by existing gRPC endpoints.
The signaling handler extracts the principal and enforces RLS on segment queries.
PATCH /webrtc/session/{session_id}/ice
Content-Type: application/json
```json
{
  "candidate": "candidate:842163049 1 udp ...",
  "sdp_mid": "0",
  "sdp_mline_index": 0
}
```
Response: 204 No Content
DELETE /webrtc/session/{session_id}
Response: 204 No Content
Idempotent. Also triggered automatically by ICE disconnection or idle timeout.
The stored H.264 NALUs are sent directly to the browser's hardware decoder. No FFmpeg, no re-encoding, no CGO.
```text
S3 blob (SVS1)
  → SegmentReader: parse index, read [uint32 len][VideoFrameSpec protobuf] entries
  → FrameExtractor: unmarshal pb.VideoFrameSpec → raw H.264 frame bytes
  → NALUExtractor: strip Annex-B start codes → bare NALUs
  → pion H264Packetizer: NALUs → RTP packets
  → pion TrackLocalStaticRTP → SRTP → DTLS → ICE → Browser
  → Browser RTCPeerConnection → MediaStream → <video> element
  → Browser hardware H.264 decoder
```
SVS1 frame format note: SVS1 frames are NOT raw H.264 bytes. Each frame
is a length-prefixed marshaled pb.VideoFrameSpec protobuf message. The
SegmentReader must read the 4-byte uint32 length prefix, read that many
bytes, unmarshal the VideoFrameSpec, then extract the Data field which
contains the actual H.264 Annex-B frame bytes.
Wraps existing S3 fetch + SVS1 parsing logic. Yields frames as an iterator:
```go
type FrameData struct {
	TimestampMs uint64
	NALUs       [][]byte // Bare NALUs (no start codes)
	IsKeyFrame  bool
}

type SegmentReader interface {
	// ReadFrom returns frames from the keyframe <= startTs to endTs.
	// If endTs is 0, reads to end of segment.
	ReadFrom(ctx context.Context, segmentPath string, startTs, endTs uint64) iter.Seq2[FrameData, error]
}
```

Annex-B stripping: Use `h264.AnnexB.Unmarshal()` from `github.com/bluenviron/mediacommon/v2/pkg/codecs/h264` (already a transitive dependency via gortsplib). No hand-rolled byte parsing. The same library provides `IsRandomAccess()` for keyframe detection, `SPS.Unmarshal()` for profile extraction, and NAL type constants. HEVC detection uses the companion `h265` package.
Controls the rate at which frames are written to the WebRTC track.
```go
type Pacer struct {
	track    *webrtc.TrackLocalStaticRTP
	speed    float64 // 1.0 = realtime, 2.0 = 2x, etc.
	rtpTS    uint32  // Current RTP timestamp (90kHz clock)
	ticker   *time.Timer
	pauseCh  chan struct{}
	resumeCh chan struct{}
}
```

Timing logic:
- Compute inter-frame delay: `frame[n+1].TimestampMs - frame[n].TimestampMs`
- Adjust for speed: `delay = delay / speed`
- Sleep for the adjusted delay between frames
- RTP timestamp increment: `delta_ms * 90` (90kHz clock)
SPS/PPS injection:
- Sent before the first frame of every segment.
- Re-sent on every seek (decoder needs re-initialization).
- Read from the `initialization_segment` field of the `pb.SegmentIndex` stored inside the S3 blob header (NOT from an Ent schema field — `VideoSegment` does not have this field). The server must fetch the blob header (first ~8KB) from S3 to parse the `SegmentIndex` protobuf and extract `initialization_segment`.
- Missing SPS/PPS fallback: If the segment's `SegmentIndex` has no `initialization_segment`, the server scans the segment's first keyframe for NAL types 7 (SPS) and 8 (PPS). If still not found, it walks backwards through previous segments in the same stream (max 10 segments, to cap S3 calls) until SPS/PPS are located. If none exist anywhere, the session emits `error` and tears down.
SDP codec negotiation:
- The server sets `profile-level-id` in the H.264 RTP codec parameters to match the stored stream's actual profile, extracted from bytes 1-3 of the SPS NAL. If unknown, it defaults to Constrained Baseline (`42e01f`), which all browsers support.
RTP timestamp overflow:
- The 90kHz RTP clock wraps `uint32` every ~13.25 hours. The pacer tracks cumulative playback time independently and lets the `uint32` wrap naturally. pion/webrtc and browser jitter buffers handle the wrap correctly as long as consecutive timestamps are monotonically increasing modulo 2^32.
```text
[Playing segment N] ──── 80% through ────→ [Prefetch segment N+1 from S3]
                                                    │
[Segment N ends] ─────────────────────────→ [Start from prefetched buffer]
```
A background goroutine fetches the next segment when playback reaches 80% of the current segment's duration. The fetched blob is held in memory (segments are ~60s of H.264, typically 2-10 MB). If the prefetch fails, it retries. If the next segment doesn't exist yet (continuous mode), it polls.
A data channel named "control" carries JSON messages for playback control.
```json
{"cmd": "seek", "timestamp_ms": 1709942430000}
{"cmd": "pause"}
{"cmd": "resume"}
{"cmd": "speed", "rate": 2.0}
{"cmd": "stop"}
```

| Command | Behavior |
|---|---|
| `seek` | Pacer stops. Any in-flight S3 fetch is cancelled via context. Server finds segment + keyframe for timestamp. SPS/PPS re-sent. Pacer resumes from new position. Rapid seeks are coalesced — only the last seek within a 100ms window is executed. |
| `pause` | Pacer suspends. Track stays open. No RTP sent. |
| `resume` | Pacer resumes from where it paused. |
| `speed` | Pacer adjusts inter-frame delay. Accepted range: 0.25 to 16.0. |
| `stop` | Server emits `ended`, tears down session. |
```json
{"evt": "playing", "timestamp_ms": 1709942400000}
{"evt": "paused", "timestamp_ms": 1709942415000}
{"evt": "buffering"}
{"evt": "seeking", "timestamp_ms": 1709942430000}
{"evt": "seeked", "timestamp_ms": 1709942428000}
{"evt": "gap", "from_ms": 1709942460000, "to_ms": 1709942520000, "duration_ms": 60000}
{"evt": "waiting"}
{"evt": "ended"}
{"evt": "error", "message": "segment not found"}
{"evt": "position", "timestamp_ms": 1709942430000}
```

| Event | When |
|---|---|
| `playing` | Playback started or resumed |
| `paused` | Pacer suspended |
| `buffering` | Fetching segment from S3, track temporarily idle |
| `seeking` | Seek command received, looking up segment |
| `seeked` | Seek complete, includes actual position (keyframe-aligned) |
| `gap` | Time gap detected between segments. In clip mode: skipped. In continuous mode: skipped after hold period. |
| `waiting` | Continuous mode: no next segment yet, polling. |
| `ended` | Reached `end_ts`, end of last segment (clip mode), or `stop` command. |
| `error` | Unrecoverable error, session will be torn down. |
| `position` | Periodic timestamp update. Sent every 1 second of playback. |
```go
type Session struct {
	ID            string
	StreamID      uuid.UUID
	StartTsMs     uint64
	EndTsMs       uint64       // 0 = open-ended
	Mode          PlaybackMode // Clip | Continuous
	PC            *webrtc.PeerConnection
	VideoTrack    *webrtc.TrackLocalStaticRTP
	DataChannel   *webrtc.DataChannel
	Pacer         *Pacer
	SegmentReader SegmentReader

	// Lifecycle
	ctx          context.Context
	cancel       context.CancelFunc
	createdAt    time.Time
	lastActivity time.Time
}

type SessionManager struct {
	sessions    sync.Map      // session_id → *Session
	maxSessions int           // default: 50
	idleTimeout time.Duration // default: 5 minutes
	iceConfig   ICEConfig
}

func (sm *SessionManager) Create(req CreateSessionRequest) (*Session, error)
func (sm *SessionManager) Get(id string) (*Session, error)
func (sm *SessionManager) AddICECandidate(id string, candidate webrtc.ICECandidateInit) error
func (sm *SessionManager) Delete(id string) error
```

Sessions are cleaned up when:
- Browser calls `DELETE /webrtc/session/{id}`
- Browser sends `stop` on the data channel
- ICE connection state goes to `disconnected` or `failed` — detected via the `PeerConnection.OnConnectionStateChange` callback for immediate teardown (don't rely solely on the reaper)
- No data channel activity for `idleTimeout` (default 5 min)
- Server shutdown (graceful drain: close all PeerConnections)
A reaper goroutine runs every 30 seconds, scanning for idle sessions.
Memory budget: With maxSessions=50 and 2 blobs per session (current +
prefetched, ~10MB each), worst case is ~1GB. The SessionManager should track
total buffered bytes and reject new sessions if a global memory limit (env:
WEBRTC_MAX_BUFFER_MB, default 512) would be exceeded.
```go
type ICEConfig struct {
	STUNServers []string     // env: WEBRTC_STUN_SERVERS (comma-separated)
	TURNServers []TURNServer // env: WEBRTC_TURN_SERVERS (JSON array)
}

type TURNServer struct {
	URL        string `json:"url"` // "turn:turn.example.com:3478"
	Username   string `json:"username"`
	Credential string `json:"credential"`
}
```

Defaults:
- STUN: `stun:stun.l.google.com:19302`
- TURN: None (optional, configure for restrictive NATs)
Future: Embed pion/turn (pure Go) for self-hosted TURN.
```mermaid
stateDiagram-v2
    [*] --> Created: POST /webrtc/session
    Created --> Connecting: ICE negotiation
    Connecting --> Buffering: ICE connected, fetching first segment
    Buffering --> Playing: Frames ready, pacer started
    Playing --> Paused: pause command
    Paused --> Playing: resume command
    Playing --> Seeking: seek command
    Seeking --> Buffering: Loading segment for new position
    Playing --> Gap: Segment ended, gap detected
    Gap --> Buffering: Next segment available
    Gap --> Waiting: No next segment (continuous mode)
    Waiting --> Buffering: New segment appeared
    Playing --> Ended: end_ts reached or last segment done
    Ended --> Seeking: seek command (restart)
    Ended --> [*]: DELETE or timeout
    Playing --> [*]: disconnect / error / stop
    Paused --> [*]: disconnect / error / stop / timeout
    Waiting --> [*]: disconnect / error / stop / timeout
```
| Scenario | Behavior |
|---|---|
| S3 fetch fails | Retry 3x with 1s backoff. If still failing, emit error event + teardown. |
| Segment has no keyframe before start_ts | Fall back to segment start. Emit seeked with actual position. |
| No segments found for stream_id + time range | Return 404 on session creation. |
| ICE fails to connect within 30s | Server tears down session. |
| Data channel message parse error | Log + ignore. Max message size: 4KB. Messages exceeding this are dropped. |
| Segment has no SPS/PPS | Scan first keyframe NALUs, then walk backwards through prior segments. If none found, emit error. |
| HEVC stream requested | Return 422 on session creation. |
| start_ts before first segment | Snap to first segment start. Emit seeked with actual position. |
| start_ts after last segment | Return 404 (clip mode) or snap to last segment and emit waiting (continuous mode). |
| Segment written during read | Safe — S3 objects are immutable once written. In-progress chunks are not yet in S3. |
| Rapid seek commands | Debounced: 100ms window, only last seek executes. In-flight S3 fetches cancelled via context. |
| Browser disappears (no ICE keepalive) | ICE state → disconnected. Cleanup after 10s grace period. |
| Max sessions reached | Return 429 on session creation. |
| Playback speed out of range | Clamp to [0.25, 16.0], emit speed event with actual rate. |
| Environment Variable | Default | Description |
|---|---|---|
| `WEBRTC_ENABLED` | `false` | Enable WebRTC endpoints |
| `WEBRTC_HTTP_PORT` | `8083` | HTTP port for signaling |
| `WEBRTC_MAX_SESSIONS` | `50` | Max concurrent sessions |
| `WEBRTC_IDLE_TIMEOUT` | `5m` | Session idle timeout |
| `WEBRTC_STUN_SERVERS` | `stun:stun.l.google.com:19302` | Comma-separated STUN URIs |
| `WEBRTC_TURN_SERVERS` | `""` | JSON array of TURN configs |
| `WEBRTC_PREFETCH_THRESHOLD` | `0.8` | Prefetch next segment at this % of segment duration |
| `WEBRTC_GAP_HOLD_MS` | `0` | Pause duration at gaps (continuous mode) |
| `WEBRTC_POLL_INTERVAL` | `2s` | Poll interval for new segments (continuous mode) |
| Package | Version | Purpose | CGO |
|---|---|---|---|
| `github.com/pion/webrtc/v4` | latest | WebRTC peer connections, tracks, data channels | No |
| `github.com/pion/rtp` | v1.8.x | RTP packet types (already in go.mod) | No |
pion/webrtc transitively pulls in pion/ice, pion/dtls, pion/srtp,
pion/interceptor, pion/sdp — all pure Go.
No new CGO dependencies. No external processes.
```text
cloud/video-service/
├── internal/
│   ├── webrtc/
│   │   ├── session.go         # Session struct, PeerConnection lifecycle
│   │   ├── session_manager.go # Create/get/delete, enforce limits, reaper
│   │   ├── signaling.go       # HTTP handlers (POST/PATCH/DELETE)
│   │   ├── pacer.go           # Frame timing, speed control, RTP writes
│   │   ├── segment_reader.go  # SVS1 → NALU iterator (wraps existing S3/parse)
│   │   ├── nalu.go            # Annex-B start code stripping
│   │   ├── data_channel.go    # Control protocol encode/decode/dispatch
│   │   ├── playback.go        # Playback loop: segment iteration, gap handling
│   │   ├── config.go          # ICE + session config from env vars
│   │   └── metrics.go         # Prometheus: active sessions, bytes sent, etc.
│   ├── handler/
│   │   └── query_handler.go   # Existing (unchanged)
│   ├── server/
│   │   └── ingestion.go       # Existing (unchanged)
│   └── ...
├── cmd/
│   └── server/
│       └── main.go            # Add HTTP listener + webrtc.RegisterHandlers()
└── ...
```
The signaling handler extracts x-sky-principal from HTTP headers (same as
gRPC metadata). On session creation:
- Parse the principal from the `x-sky-principal` header using the existing `security` package. The header may be JSON or Base64-encoded JSON — reuse the auto-detection logic from `security.UnaryServerInterceptor`.
- Attach it to the context via `security.WithPrincipal(ctx, &principal)` so that all Ent queries pass through the existing `RLSInterceptor`.
- Query `VideoSegment` with the RLS interceptor to verify the principal can access the requested `stream_id` + time range.
- If there are no accessible segments, return `403`.
- Store the principal in the session for subsequent segment fetches (seek).
All segment queries during playback (seek, prefetch, continuous-mode polling)
go through the same RLS-filtered Ent queries. The session holds a reference
to the *ent.Client from cmd/server/main.go and the existing *minio.Client
and s3Bucket for S3 access — no new connections are created.
- Live streaming: Same WebRTC track, fed by a NATS subscriber instead of S3. Session creation with `mode: "live"` subscribes to the camera's NATS topic.
- Embedded TURN: `pion/turn` server in the same binary.
- Adaptive bitrate: If multiple quality levels are stored, switch tracks based on RTCP receiver reports.
- Audio: Add a second track for audio when audio streams are stored.
- Recording bookmarks: Browser sends timestamp markers via data channel for annotation.
| Question | Decision | Rationale |
|---|---|---|
| HTTP port | Separate port (8083) | Avoids cmux complexity. gRPC stays on 8082, HTTP signaling on 8083. Both share the same Ent/S3 clients. |
| Segment cache | Yes, per-session LRU | Current segment blob is held in memory for seeks within the same segment. Evicted on segment transition. Max 2 blobs per session (current + prefetched). |
| CORS | Configurable via `WEBRTC_CORS_ORIGINS` env var | Defaults to `*` in dev, must be set explicitly in production. Applied via standard Go CORS middleware on the HTTP mux. |
| Goroutine lifecycle | Session-scoped context | All goroutines (pacer, prefetcher, data channel reader, reaper) derive from session.ctx. Calling session.cancel() tears down everything. |
| Finding | Severity | Resolution |
|---|---|---|
| HEVC streams unsupported by browsers | CRITICAL | Added codec detection + 422 rejection (Section 1) |
| Missing SPS/PPS in segments | CRITICAL | Added fallback: scan keyframes, then walk prior segments (Section 5.3) |
| ~60s latency for near-live tailing | IMPORTANT | Documented as inherent limitation, deferred to mode: "live" (Section 3.2) |
| RTP timestamp overflow at 13h | IMPORTANT | Addressed: uint32 wraps naturally, pion handles it (Section 5.3) |
| SDP profile-level-id mismatch | IMPORTANT | Added: extract from SPS, default to Constrained Baseline (Section 5.3) |
| Rapid seek interleaving | IMPORTANT | Added: 100ms debounce + context cancellation (Section 6.1, 10) |
| RLS context propagation | IMPORTANT | Explicit security.WithContext call documented (Section 14) |
| S3 client reuse | MINOR | Explicit: reuse existing minio.Client + s3Bucket (Section 14) |
| Data channel max message size | MINOR | Added: 4KB limit (Section 10) |
| Finding | Severity | Resolution |
|---|---|---|
| SVS1 frames are Protobuf-wrapped, not raw H.264 | CRITICAL | Updated Section 5.1 + 5.2: SegmentReader must unmarshal pb.VideoFrameSpec |
| `initialization_segment` not an Ent field | CRITICAL | Updated Section 5.3: read from `pb.SegmentIndex` in S3 blob header |
| SPS/PPS fallback can trigger many S3 calls | IMPORTANT | Added max 10 segment walk-back cap (Section 5.3) |
| Memory pressure at 50 concurrent sessions | IMPORTANT | Added WEBRTC_MAX_BUFFER_MB global memory limit (Section 7.3) |
| ICE disconnect needs immediate cleanup | IMPORTANT | Added OnConnectionStateChange callback (Section 7.3) |
| `security.WithContext` should be `WithPrincipal` | MINOR | Fixed in Section 14 |
| x-sky-principal may be Base64-encoded | MINOR | Noted in Section 14 |