Version: 1.0
Draft Date: 2026-03-25
Author: Felix Sun
Status: Draft
Knowledge workers spend 15-25 hours per week in meetings, presentations, and video calls, yet retain only a fraction of what was discussed. Existing solutions (Otter.ai, Granola, Limitless) require sending audio to the cloud, raising privacy concerns for organizations handling sensitive data. There is no native macOS app that records screen + audio, transcribes in real-time, and generates AI summaries — all locally on the user's machine.
Who experiences this: Engineers, PMs, designers, executives — anyone who attends meetings, watches technical presentations, or reviews recorded content and needs to recall specific details later.
Cost of not solving: Lost context, duplicated discussions, missed action items, and the growing unease of sending private meeting audio to third-party cloud services.
- Privacy-first capture: All transcription and AI processing runs locally on Apple Silicon — no audio or video ever leaves the machine unless the user explicitly exports
- Zero-friction recording: Start capturing a meeting or video in under 2 seconds from the menu bar — no app switching, no configuration
- Real-time comprehension: Live subtitles with <2s latency so users can follow along with foreign-language content or in noisy environments
- Total recall: Full-text search across all recordings so users can find "that thing someone said about the API redesign" in seconds
- Actionable output: AI-generated summaries with key moments, decisions, and action items — not just a wall of transcript text
- Cloud transcription service — We are not building a cloud API or SaaS. All processing is local. Cloud sync (e.g., iCloud) may come later but is not v1.
- Real-time collaboration — No shared transcripts, live co-editing, or team features in v1. This is a single-user tool first.
- Mobile app (iOS) — macOS only. iPhone/iPad lacks the GPU horsepower for local whisper inference at acceptable latency.
- Speaker diarization — Identifying who said what is desirable but technically hard with local-only processing. Deferred to v2.
- Video editing: We capture video for playback context, not for editing. No trimming, filters, or effects. (The Editor tab, deferred to P2, composes clips from segments but is not a full video editor.)
- As a knowledge worker, I want to start recording my screen and audio with one click from the menu bar so that I can capture meetings without disrupting my workflow
- As a knowledge worker, I want to see real-time subtitles inside the app window so that I can follow conversations in noisy environments or in languages I'm less fluent in
- As a knowledge worker, I want to browse my past recordings with synced video and transcript so that I can review what was discussed in yesterday's standup
- As a knowledge worker, I want AI-generated summaries of my recordings so that I can get the key points without rewatching an entire hour-long meeting
- As a knowledge worker, I want to search across all my transcripts by keyword so that I can find the exact moment someone discussed a specific topic
- As a knowledge worker, I want to export a transcript as SRT/TXT/Markdown so that I can share meeting notes with my team
- As a knowledge worker, I want to select which window to record so that I only capture the Zoom call, not my entire desktop
- As a privacy-conscious user, I want all transcription to happen locally on my Mac so that my meeting audio never leaves my machine
- As a privacy-conscious user, I want a clear indicator when recording is active so that I always know when Percev is capturing
- As a privacy-conscious user, I want to delete recordings permanently so that sensitive content can be fully removed
- As a power user, I want to use Claude Code in the embedded terminal to ask questions, summarize, or run custom workflows on my transcripts — without leaving the app
- As a power user, I want to select multiple segments across recordings, compose them in the Editor with insights cards, and export a self-contained video clip to share with colleagues
- As a power user, I want to choose between whisper model sizes (small/medium/large) so that I can trade off between speed and accuracy based on my hardware
P0-0: First Launch Setup
- On first launch, a setup wizard guides the user through:
- Recording consent disclaimer (one-time, see Q5)
- Whisper model download — default: Small (~466MB). User can choose a different size. Download runs with a progress bar showing percentage, speed, and estimated time remaining
- Screen Recording permission — prompt to grant macOS Screen Recording access
- Microphone permission — optional, prompt with explanation
- If user skips or cancels download, the app reminds them from the menu bar with a badge
- Acceptance criteria:
- Setup wizard appears on first launch only
- Whisper model download shows progress bar with %, speed (MB/s), and ETA
- User can change whisper model size during setup
- Downloads are resumable if interrupted (don't restart from zero)
- App transitions to menu bar after setup completes
- If downloads are skipped, menu bar shows a badge prompting the user to complete setup
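The progress and resume criteria above come down to a little arithmetic plus an HTTP `Range` request. Below is a minimal sketch, not Percev's implementation. `DownloadProgress` and `resumeRangeHeader` are hypothetical names, and the sketch assumes the model host honors byte-range requests. When the server supports it, `URLSessionDownloadTask.cancel(byProducingResumeData:)` is the built-in alternative.

```swift
import Foundation

/// Hypothetical snapshot of one model download attempt, used to drive the
/// progress bar (%), speed (MB/s), and ETA.
struct DownloadProgress {
    let bytesReceived: Int64   // total bytes on disk, including earlier attempts
    let totalBytes: Int64      // expected model size
    let bytesAtStart: Int64    // bytes already on disk when this attempt began
    let elapsed: TimeInterval  // seconds since this attempt began

    /// Completed fraction in 0...1 (multiply by 100 for the percentage label).
    var fraction: Double {
        totalBytes > 0 ? Double(bytesReceived) / Double(totalBytes) : 0
    }

    /// Speed of the current attempt only, so resumed bytes don't inflate it.
    /// Uses decimal megabytes (1 MB = 1,000,000 bytes).
    var megabytesPerSecond: Double {
        guard elapsed > 0 else { return 0 }
        return Double(bytesReceived - bytesAtStart) / elapsed / 1_000_000
    }

    /// Seconds remaining at the current speed; nil until a speed is known.
    var eta: TimeInterval? {
        let bytesPerSecond = megabytesPerSecond * 1_000_000
        guard bytesPerSecond > 0 else { return nil }
        return Double(totalBytes - bytesReceived) / bytesPerSecond
    }
}

/// Value for the HTTP `Range` header when resuming a partial file,
/// e.g. "bytes=1024-" asks for everything from byte 1024 onward.
func resumeRangeHeader(existingBytes: Int64) -> String? {
    existingBytes > 0 ? "bytes=\(existingBytes)-" : nil
}
```

Example: a 466 MB download that resumed at 100 MB and reached 300 MB after 20 s reports 10 MB/s and an ETA of about 16.6 s. Measuring speed per attempt keeps the ETA honest right after a resume.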
P0-1: Menu Bar Presence
- Percev lives in the macOS menu bar as a lightweight status icon and also appears in the Dock
- Click menu bar icon to reveal dropdown: Start/Stop recording (with ⌘⇧R shortcut indicator), Open Percev, Join Discord, Quit
- Click Dock icon to bring the main app window to front
- Recording indicator: when actively recording, the menu bar icon changes to a red dot with the elapsed recording time (e.g., `● 12:34`), rendered as a non-template `NSImage` so the red color stays visible
- Acceptance criteria:
- App launches with both a menu bar icon and a Dock icon
- Menu dropdown shows Start/Stop recording with keyboard shortcut, Open Percev, Join Discord, and Quit
- Global keyboard shortcut to start/stop recording (default: ⌘⇧R, configurable)
- Menu bar icon shows red dot + elapsed time during active recording
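The menu bar title in these criteria is a formatted duration shown next to the red dot. A minimal sketch follows. It assumes the display switches to `H:MM:SS` after one hour, since the spec only shows the `● 12:34` form. On macOS the string would be set on the `NSStatusItem` button alongside the non-template red-dot image, which is not shown here. `menuBarTitle(elapsedSeconds:)` is a hypothetical name.

```swift
/// Zero-pads a clock component to two digits ("5" -> "05").
func twoDigits(_ n: Int) -> String {
    n < 10 ? "0\(n)" : "\(n)"
}

/// Menu bar title during recording: "● MM:SS", or "● H:MM:SS" past one hour (assumption).
func menuBarTitle(elapsedSeconds: Int) -> String {
    let total = max(0, elapsedSeconds)          // guard against clock skew
    let hours = total / 3600
    let minutes = (total % 3600) / 60
    let seconds = total % 60
    let clock = hours > 0
        ? "\(hours):\(twoDigits(minutes)):\(twoDigits(seconds))"
        : "\(twoDigits(minutes)):\(twoDigits(seconds))"
    return "● " + clock
}
```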
P0-2: Window Picker
- Window picker is only shown when screen recording is enabled; if the user starts a recording with video off, no picker is shown and recording begins immediately
- The picker can also be triggered mid-session from the menu bar dropdown or the main window by enabling video on an active recording
- Smart detection: auto-suggests meeting apps (Zoom, Teams, Lark, Google Meet, Slack Huddle) if they're running
- Shows window thumbnails with app name and title for easy identification
- Async thumbnail loading: window list appears immediately with placeholder icons; thumbnails load in the background and replace placeholders as they become available — picker is never blocked waiting for thumbnails
- Acceptance criteria:
- When video is off, recording starts immediately with no picker shown
- Enabling video from menu bar or main window while idle triggers the window picker before starting
- Enabling video during an active recording triggers the picker to select a window and begins screen capture without interrupting audio/transcript
- Picker opens instantly — window list (app name + title) is shown before any thumbnails are fetched
- Each window card shows a grey placeholder graphic until its thumbnail is ready
- Thumbnails load asynchronously and fade in as they resolve, without shifting layout
- Picker shows all visible windows with thumbnails
- Meeting app windows are surfaced at the top
- User can switch target window during an active recording
- "Full screen" option captures the entire display
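The "meeting apps at the top" rule can be a stable partition over the window list, which keeps the picker's existing order within each group. In this sketch, `WindowCandidate` stands in for the data Percev would read from ScreenCaptureKit's window list. The app names and title patterns are illustrative assumptions, not verified identifiers.

```swift
import Foundation

/// Stand-in for one entry in the picker; the real list would come from ScreenCaptureKit.
struct WindowCandidate {
    let appName: String
    let title: String
}

/// Illustrative matching rules (assumptions, not verified app identifiers).
func isMeetingWindow(_ window: WindowCandidate) -> Bool {
    switch window.appName {
    case "zoom.us", "Microsoft Teams", "Lark":
        return true
    case "Slack":
        // Only huddle windows, not every Slack window.
        return window.title.contains("Huddle")
    default:
        // Google Meet usually runs in a browser tab; match on the tab title.
        return window.title.hasPrefix("Meet - ")
    }
}

/// Stable partition: meeting windows first, original order kept within each group.
func orderForPicker(_ windows: [WindowCandidate]) -> [WindowCandidate] {
    windows.filter(isMeetingWindow) + windows.filter { !isMeetingWindow($0) }
}
```

The partition uses two `filter` passes rather than `sort` because it is guaranteed stable and the window list is small.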
P0-3: Screen + Audio Recording
- Captures the targeted window as video (H.265/HEVC, hardware-encoded via VideoToolbox) using ScreenCaptureKit
- Video is optional (default: off); the toolbar has a video on/off toggle button. Audio and transcript are always captured. When video is off, NO `video.mp4` is created (file existence = hasVideo).
- Video toggle + permission flow: when the user toggles video ON, check Screen Recording permission. If it has not been granted, prompt for it via the macOS system dialog. If permission is denied, show a message and keep video off.
- Window Picker — shown when video is enabled and recording starts (see P0-2). If video is off, recording starts immediately with no picker.
- Camera recording is a separate opt-in: when enabled, the front camera is recorded to a separate video file (`camera.mp4`) in the recording directory, independent of screen recording
- Simultaneously captures system audio (what the user hears) via ScreenCaptureKit
- Optionally captures microphone input (user's voice)
- No virtual audio devices needed — ScreenCaptureKit captures system audio natively regardless of output device
- No `metadata.json`: all recording metadata is derived from the file system:
- Date: parsed from the directory name (`yyyy-MM-dd-HHmmss-title`)
- Duration: read from `audio.wav` via `AVAsset`
- hasVideo: `video.mp4` exists and is >10KB
- hasCamera: `camera.mp4` exists and is >10KB
- hasMicrophone: `mic.wav` exists and is >10KB
- Window title: stored in a `.title` text file (one line); falls back to the directory name
- Acceptance criteria:
- HEVC video recording with VFR (only encode when screen changes) with <5% CPU overhead on M1 or newer
- Toolbar has video on/off toggle button (persisted to settings)
- Toggling video ON checks screen recording permission, prompts if needed
- When video is OFF, NO video.mp4 file is created
- When video is ON, window picker is shown before recording starts
- Camera recording is opt-in and saves to a separate `camera.mp4` file in the same recording directory
- Camera and screen recording can be enabled/disabled independently
- System audio captured regardless of output device (speakers, headphones, Bluetooth, AirPods)
- Microphone capture is opt-in with clear permission handling
- Recording continues even if the target window is minimized or occluded
- Only ONE ScreenCaptureKit audio stream at a time (enforce single-instance)
- Clicking "Start Recording" auto-selects the active recording in the sidebar
- Recording directory appears in library immediately (no metadata.json dependency)
- All metadata derived from file system: date from dir name, duration from audio, hasVideo from file existence
- If app is force-quit during recording, the recording appears in library on next launch
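Without a `metadata.json`, listing a recording means re-deriving its fields from names and files. Below is a minimal sketch of two of those rules. It assumes directory names use local time and treats "10KB" as 10 × 1024 bytes, since the spec does not say which. `parseRecordingDirectoryName` and `hasMeaningfulFile` are hypothetical helpers. Duration (read via `AVAsset`) is omitted because it needs AVFoundation.

```swift
import Foundation

/// Fields recoverable from the directory name alone.
struct RecordingInfo {
    let date: Date
    let title: String
}

/// Parses "yyyy-MM-dd-HHmmss-title". The first 17 characters are the timestamp;
/// everything after the following hyphen is the title (which may contain hyphens).
func parseRecordingDirectoryName(_ name: String, timeZone: TimeZone = .current) -> RecordingInfo? {
    let stampLength = 17  // "yyyy-MM-dd-HHmmss".count
    guard name.count >= stampLength else { return nil }
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")  // fixed format, ignore user locale
    formatter.timeZone = timeZone
    formatter.dateFormat = "yyyy-MM-dd-HHmmss"
    guard let date = formatter.date(from: String(name.prefix(stampLength))) else { return nil }
    let rest = name.dropFirst(stampLength)
    let title = rest.hasPrefix("-") ? String(rest.dropFirst()) : ""
    return RecordingInfo(date: date, title: title.isEmpty ? name : title)
}

/// hasVideo / hasCamera / hasMicrophone rule: the file exists and is larger than 10KB.
/// Assumes 10KB means 10 * 1024 bytes.
func hasMeaningfulFile(at url: URL, minimumBytes: Int = 10 * 1024) -> Bool {
    guard let attributes = try? FileManager.default.attributesOfItem(atPath: url.path),
          let size = attributes[.size] as? NSNumber else { return false }
    return size.intValue > minimumBytes
}
```

Because nothing is cached, a recording interrupted by a force-quit is rediscovered on the next scan just like any other directory.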
P0-4: Key Frame Extraction
- During or after recording, automatically detect and save key frames when significant visual changes occur (slide transitions, screen switches, new content)
- Detection: compare consecutive frame histograms; a large histogram difference means new content. No AI needed.
- Key frames saved as JPEGs in a `keyframes/` subdirectory with timestamp filenames
- These frames serve two purposes:
- Visual timeline in the playback view (scrub through key moments)
- Claude Code can read them (multimodal) — enabling "Identify Key Moments" quick-action button
- Acceptance criteria:
- Key frames extracted automatically on recording completion (or during recording)
- Frame detection uses histogram comparison — lightweight, no ML required
- Key frames saved as `keyframes/HH-MM-SS.jpg` in the recording directory
- Duplicate/near-identical frames are deduplicated (threshold configurable)
- Key frames appear as a visual timeline strip in the playback view
- Clicking a key frame thumbnail seeks the video to that timestamp
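The histogram rule above can be sketched without any image libraries by treating a frame as an array of 8-bit brightness values. This is an illustration under stated assumptions, not the shipped detector. `KeyFrameDetector` is a hypothetical type, grayscale (rather than per-channel) histograms are an assumption, and the 32 bins and 0.25 threshold are arbitrary. One deliberate deviation from a strictly consecutive comparison: each frame is compared against the last *saved* key frame. That choice also provides the near-duplicate deduplication the criteria ask for.

```swift
/// Hypothetical key-frame detector over 8-bit grayscale (luma) frames.
/// A real pipeline would downsample ScreenCaptureKit frames first; here a frame is [UInt8].
struct KeyFrameDetector {
    let bins: Int
    let threshold: Double  // 0...1: fraction of histogram mass that must change
    private var lastKeyHistogram: [Double]? = nil

    init(bins: Int = 32, threshold: Double = 0.25) {
        self.bins = bins
        self.threshold = threshold
    }

    /// Normalized brightness histogram (bins sum to 1).
    static func histogram(_ luma: [UInt8], bins: Int) -> [Double] {
        var counts = [Double](repeating: 0, count: bins)
        for value in luma { counts[Int(value) * bins / 256] += 1 }
        let total = Double(max(luma.count, 1))
        return counts.map { $0 / total }
    }

    /// True if `luma` should be saved as a key frame. The distance is half the L1
    /// distance between normalized histograms, so it ranges from 0 (identical) to 1.
    mutating func isKeyFrame(_ luma: [UInt8]) -> Bool {
        let current = Self.histogram(luma, bins: bins)
        guard let previous = lastKeyHistogram else {
            lastKeyHistogram = current  // first frame is always a key frame
            return true
        }
        let distance = zip(current, previous).reduce(0.0) { $0 + abs($1.0 - $1.1) } / 2
        guard distance >= threshold else { return false }
        lastKeyHistogram = current
        return true
    }
}

/// "keyframes/HH-MM-SS.jpg" file name for a timestamp in seconds.
func keyFrameFileName(seconds: Int) -> String {
    func pad(_ n: Int) -> String { n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad(seconds / 3600))-\(pad(seconds % 3600 / 60))-\(pad(seconds % 60)).jpg"
}
```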
P0-5: Real-Time Transcription
- Two-tier architecture using whisper.cpp C API with Metal GPU acceleration:
- Tier 1 (Partial): Every 2 seconds, transcribe the latest 2s audio chunk. Each chunk becomes its own append-only partial segment — previously rendered partial text is never modified, only new chunks are appended. Displayed as a single flowing block (consecutive partial chunks joined with spaces, single timestamp above the first partial).
- Tier 2 (Final): Every 20 seconds, re-transcribe ALL accumulated audio (up to 30s) for complete, punctuated sentences. Cut at sentence boundaries, carry remainder to next pass. All partial segments are removed and replaced by the final.
- Language: always auto-detected from audio; no settings, no locale guessing. Whisper auto-detects when `language = nil` (do NOT set `detect_language = true`: it only detects the language and produces 0 segments). No `initial_prompt`: it causes hallucination/repetition.
- Chinese text normalization: whisper outputs mixed Simplified/Traditional Chinese. Default: convert to Simplified via macOS `CFStringTransform` with the ICU transform `Hant-Hans` (Traditional to Simplified; `Hans-Hant` converts the other way unless the reverse flag is set), for full Unicode coverage. The user can change this in the toolbar menu (Off / Simplified / Traditional). Normalization is applied at display time (in `TranscriptTextView`), so both live and playback transcripts are normalized regardless of how they were stored.
- Acceptance criteria:
- Partial text appears within 2 seconds of speech
- Each 2s partial chunk is a separate append-only segment — previously rendered text is never modified
- Consecutive partial chunks display as one flowing block with spaces between chunks
- Final sentences appear within 20-25 seconds with proper punctuation
- Final segments are 20s+ of audio, cut at sentence boundaries, remainder carried over
- Incomplete sentences carry over to the next transcription window (no cut-off mid-sentence)
- Silence is handled gracefully (no phantom text, no timing drift)
- No `initial_prompt` on any tier (prevents hallucination and repetition)
- Total audio context never exceeds 30s (whisper hard limit)
- Language auto-detected from audio — Chinese audio produces Chinese text, not English
- Chinese text normalization defaults to Simplified, uses CFStringTransform for full coverage
- All transcript text is selectable and copyable — user selections on finalized text are preserved during live partial updates
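Two invariants carry most of the weight in these criteria. First, partial chunks are only ever appended. Second, a final pass commits up to the last sentence boundary while keeping carried-over audio inside the 30s window (20s of new audio plus at most 10s carried). A minimal sketch of that bookkeeping follows. The names (`Segment`, `finalizePass`, `LiveTranscript`) and the set of sentence-ending characters are assumptions, and the whisper.cpp calls themselves are omitted. The spec asks both for no mid-sentence cuts and for a 30s cap without saying which wins. This sketch assumes the cap wins: if carrying the unfinished tail would exceed 10s, it commits everything.

```swift
import Foundation

/// One whisper segment, in seconds from the start of the recording (hypothetical
/// shape; converted from whisper.cpp's per-segment text and timestamps).
struct Segment: Equatable {
    let text: String
    let start: Double
    let end: Double
}

let whisperContextLimit = 30.0                          // seconds whisper can see at once
let finalInterval = 20.0                                // new audio per Tier 2 pass
let maxCarryover = whisperContextLimit - finalInterval  // 10 s budget for carried audio
let sentenceEnders: Set<Character> = [".", "?", "!", "。", "？", "！"]

/// Splits one Tier 2 pass into segments to commit as finals, and the time from
/// which audio is carried into the next pass.
func finalizePass(_ segments: [Segment], windowEnd: Double) -> (commit: [Segment], carryFrom: Double) {
    let lastBoundary = segments.lastIndex { segment in
        guard let last = segment.text.trimmingCharacters(in: .whitespaces).last else { return false }
        return sentenceEnders.contains(last)
    }
    // Carry the unfinished tail only if it fits the 30 s budget of the next pass.
    if let i = lastBoundary, windowEnd - segments[i].end <= maxCarryover {
        return (Array(segments[...i]), segments[i].end)
    }
    // No usable boundary (or silence): commit everything, carry nothing.
    return (segments, windowEnd)
}

/// Live view model: partials are append-only; a final pass replaces all of them.
struct LiveTranscript {
    private(set) var finals: [Segment] = []
    private(set) var partials: [Segment] = []

    mutating func appendPartial(_ segment: Segment) {
        partials.append(segment)  // earlier partials are never edited
    }

    mutating func applyFinal(_ committed: [Segment]) {
        finals += committed
        partials.removeAll()
    }

    /// Consecutive partial chunks shown as one flowing block, joined with spaces.
    var partialBlockText: String {
        partials.map { $0.text.trimmingCharacters(in: .whitespaces) }.joined(separator: " ")
    }
}
```

When a pass returns no segments (silence), the window is committed as empty and nothing is carried. This keeps phantom text and timing drift out of later passes.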
P0-6: Real-Time Subtitle Panel
- Subtitle panel embedded inside the app window (not a floating overlay), rendered using STTextView (TextKit 2 editor component) in read-only mode for true cross-segment text selection
- All segments are rendered as a single `NSAttributedString` document, so users can drag-select across any combination of finalized and partial text
- Selection preservation: uses an `NSTextStorage` common-prefix diff for incremental updates, appending or replacing only the changed characters at the end. User selections on earlier text are never disturbed during live updates.
- Shows all finalized sentences for the current recording session + current partial text
- New sentences appear at the bottom; panel auto-scrolls to follow the latest text
- Layout: During active recording, header shows compact single line (red dot + title). Transcript panel fills the entire detail area. Editorial typography: timestamps above text blocks (70% font size, monospaced digits), generous proportional spacing (40% line height, 80% block spacing). Text soft-wraps with no horizontal scrollbar.
- Font customization: Toolbar "Aa" menu with "Choose Font..." opens macOS native NSFontPanel. Custom font name and point size are persisted to settings and applied to ALL text (timestamps, finals, partials). "Reset to System Font" reverts to default. Font changes trigger full attribute refresh across the entire document.
- Chinese normalization applied at display time in TranscriptTextView, configurable in toolbar menu (Off / Simplified / Traditional).
- Acceptance criteria:
- All sentences for the session are retained and scrollable
- Panel auto-scrolls to the latest sentence as new text arrives
- Scrolling back pauses auto-scroll; reaching the bottom resumes it
- Text is readable against both light and dark backgrounds
- Partial text shows as flowing block (consecutive chunks joined, single timestamp)
- User can toggle panel visibility with a keyboard shortcut
- User can choose any system font and size via macOS font panel
- Font changes apply to ALL text — timestamps, finals, and partials
- All transcript text is selectable and copyable across all segments
- User text selection is preserved during live 2-second partial updates
- During active recording, header is compact (red dot + title, one line)
- During active recording, transcript panel fills available space (no video area)
- Chinese normalization (Off/Simplified/Traditional) configurable from toolbar menu
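The selection-preservation rule is a common-prefix diff: find where the old and new document strings first differ, then replace only the tail from that point. A minimal sketch follows, assuming the whole transcript is available as one plain string before attributes are applied. `TailEdit`, `tailEdit(from:to:)`, and `selectionSurvives` are hypothetical names. Offsets are computed in UTF-16 because `NSRange` and `NSTextStorage` use those units.

```swift
/// Minimal edit turning `old` into `new` while keeping their common prefix.
/// Offsets are UTF-16 code units (NSString/NSRange semantics), so the edit could be
/// applied with NSTextStorage.replaceCharacters(in:with:).
struct TailEdit: Equatable {
    let location: Int        // UTF-16 offset where the changed tail begins
    let length: Int          // UTF-16 length of the old tail being replaced
    let replacement: String  // new tail text
}

func tailEdit(from old: String, to new: String) -> TailEdit? {
    let a = Array(old.utf16), b = Array(new.utf16)
    var prefix = 0
    while prefix < a.count, prefix < b.count, a[prefix] == b[prefix] { prefix += 1 }
    // Never split a surrogate pair: if we stopped right after a lead surrogate, back up.
    if prefix > 0, UTF16.isLeadSurrogate(a[prefix - 1]) { prefix -= 1 }
    if prefix == a.count, prefix == b.count { return nil }  // nothing changed
    return TailEdit(location: prefix,
                    length: a.count - prefix,
                    replacement: String(decoding: b[prefix...], as: UTF16.self))
}

/// A selection ending at or before the edit location is untouched by the edit.
func selectionSurvives(start: Int, length: Int, edit: TailEdit) -> Bool {
    start + length <= edit.location
}
```

A live update usually changes only the trailing partial block, so `location` lands near the end of the document. Any selection in earlier finalized text therefore keeps its range.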
P0-7: Recording Library
- All data is stored as plain files in the user-configurable home directory (default: `~/Percev/`)
- No database: the file system is the source of truth. The library view scans the root directory and derives each recording's metadata from the files in its subdirectory (directory name, audio duration, file existence; see P0-3)
- Home directory structure:
~/Percev/                            # user-configurable in settings
├── CLAUDE.md                        # auto-generated, explains data format for Claude Code
├── 2026-03-25-143025-standup/       # {date}-{HHmmss}-{title} for uniqueness
│   ├── transcript.jsonl             # timestamped transcript lines
│   ├── audio.wav                    # system audio (16kHz mono WAV)
│   ├── mic.wav                      # microphone audio (16kHz mono WAV, optional)
│   ├── video.mp4                    # screen recording (optional)
│   ├── camera.mp4                   # camera recording (optional)
│   ├── .title                       # window title (one line)
│   ├── thumbnail.jpg                # first non-blank frame
│   └── keyframes/                   # auto-extracted key frames
│       ├── 00-02-15.jpg
│       ├── 00-08-42.jpg
│       └── ...
├── 2026-03-25-160530-design-review/
│   ├── ...
└── .percev/                         # hidden folder for app internals
    ├── settings.json                # app preferences
    └── models/
        ├── whisper-small.bin        # whisper model
        └── ...
<!-- Updated: 2026-03-30 — Directory format includes HHmmss for uniqueness. Mic audio is separate file. Removed embedding model and search index from .percev/ (deferred to P2). -->
- Library view shows recordings sorted by date, with title, duration, and thumbnail
- Acceptance criteria:
- Active recording appears in library immediately when recording starts (with "Recording in Progress" indicator)
- Recordings appear in library immediately after stopping
- Thumbnail generated from first non-blank frame
- Auto-title from first meaningful transcript text or window title
- Delete recording permanently removes the entire recording directory
- Storage usage displayed in settings
- Home directory is configurable in settings (moving existing data is handled automatically)
- Library rebuilds correctly from file system (no hidden database state)
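The `{date}-{HHmmss}-{title}` directory naming can be produced as below. The slug rules and the `recording` fallback are assumptions, since the spec only shows lowercase, hyphenated examples. `slug` and `recordingDirectoryName` are hypothetical helpers. The seconds field prevents collisions between recordings started in the same minute, but two recordings started in the same second would still need a suffix.

```swift
import Foundation

/// Hypothetical slug rule: lowercase; each run of non-letter/non-digit characters
/// becomes one hyphen. Letters include CJK, so Chinese titles survive intact.
func slug(_ title: String) -> String {
    var result = ""
    var pendingHyphen = false
    for character in title.lowercased() {
        if character.isLetter || character.isNumber {
            if pendingHyphen && !result.isEmpty { result.append("-") }
            result.append(character)
            pendingHyphen = false
        } else {
            pendingHyphen = true
        }
    }
    return result.isEmpty ? "recording" : result  // fallback name is an assumption
}

/// "{date}-{HHmmss}-{title}", e.g. "2026-03-25-143025-standup".
func recordingDirectoryName(date: Date, title: String, timeZone: TimeZone = .current) -> String {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")  // fixed digits regardless of user locale
    formatter.timeZone = timeZone
    formatter.dateFormat = "yyyy-MM-dd-HHmmss"
    return formatter.string(from: date) + "-" + slug(title)
}
```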
P0-8: Playback with Synced Transcript
- Video in a separate floating window, managed by the `VideoWindowManager` singleton. The video player is NOT embedded in the main pane; it opens as a floating `NSWindow` beside the main window, which gives the transcript maximum vertical space.
- VideoWindowManager behavior:
- `show()`: reuses a single persistent `NSWindow` and swaps the `AVPlayer` when switching recordings. The window never closes/reopens on a recording switch; only its content is swapped.
- `hide()`: closes the window when switching to an audio-only recording. Does NOT change the `isEnabled` preference.
- `toggle()`: user action; saves the frame and toggles the `isEnabled` preference.
- X button: saves the frame and sets `isEnabled = false`.
- `isEnabled` is a global preference that persists across recording switches. If the user closes the video window, it stays closed for subsequent recordings until they reopen it.
- Window frame (position + size) is saved via `NSWindow.saveFrame(usingName:)` on user actions only and restored on the next open. Smart positioning beside the main window applies only on the first-ever open.
- `contentAspectRatio` is locked to the video's natural aspect ratio (loaded asynchronously from track metadata).
- The video window shows inline AVPlayer controls, kept in bidirectional sync with the transcript controls via KVO on `player.timeControlStatus`.
- Audio-only recordings: no video window and no video toggle button. hasVideo is derived from the file system (`video.mp4` exists and is >10KB), consistent with P0-3.
- Clicking a transcript timestamp jumps playback to that position (custom `seekTimeKey` attribute + `NSClickGestureRecognizer`, not `.link`, to avoid the default blue link styling).
- The currently spoken line is highlighted during playback (accent background color).
- Playback controls: compact bar with skip ±10s, play/pause, a speed menu (pill-shaped `1x` button), and a video toggle icon.
- Delete confirmation: the trash button shows an alert dialog before deleting.
- Acceptance criteria:
- Transcript scrolls automatically to follow playback position
- Click any timestamp to seek playback to that moment
- Playback speed: 0.5x, 1x, 1.5x, 2x (compact menu button)
- Keyboard shortcuts for play/pause (Space), skip ±10s (←/→)
- Video opens in a separate floating window, positioned beside main window
- Video window persists across recording switches (global preference)
- Video window hidden for audio-only recordings, reopens for next video recording
- Video window frame (size + position) remembered across sessions
- Video window aspect ratio locked to prevent black bars
- Video and transcript controls bidirectionally synced
- Delete recording requires confirmation dialog
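Highlighting the current line and auto-scrolling both require mapping a playback time to a transcript line many times per second. The sketch below assumes line start times are kept in a sorted array (transcript order), which makes each lookup a binary search. In the app this would typically be driven from `AVPlayer.addPeriodicTimeObserver(forInterval:queue:using:)`, which is not shown. The helper names are hypothetical.

```swift
/// Index of the line being spoken at `time`: the last line whose start time is
/// <= time. `starts` must be sorted ascending. Returns nil before the first line.
func currentLineIndex(starts: [Double], time: Double) -> Int? {
    var low = 0, high = starts.count  // binary search for the first start > time
    while low < high {
        let mid = (low + high) / 2
        if starts[mid] <= time { low = mid + 1 } else { high = mid }
    }
    return low == 0 ? nil : low - 1
}

/// Target for the ±10 s skip buttons and ←/→ keys, clamped to the recording.
func skipTarget(current: Double, delta: Double, duration: Double) -> Double {
    min(max(current + delta, 0), duration)
}
```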
P0-9: Embedded Terminal with Claude Code
- Percev does NOT include built-in AI features (no summaries, no chat, no LLM)
- Instead, an embedded terminal panel lives side-by-side with the transcript and video player
- Users bring their own Claude subscription and run Claude Code directly inside Percev
- Each recording is stored as a well-structured directory directly under the Percev home directory:
~/Percev/2026-03-25-143025-standup/
├── transcript.jsonl   # timestamped transcript lines
├── audio.wav          # system audio (16kHz mono WAV)
├── mic.wav            # microphone audio (optional, 16kHz mono WAV)
├── video.mp4          # screen recording (optional)
├── camera.mp4         # camera recording (optional)
└── .title             # window title (one line); other metadata is derived from the files (see P0-3)
- Auto-launch Claude Code on recording selection: when a recording is selected, Percev starts a new terminal session in the recording's directory and launches `claude` automatically. If the user switches to a different recording, the existing Claude session is killed and a new one starts in the new directory. Users type directly in the SwiftTerm terminal (no separate input field).
- Implementation: SwiftTerm (open-source terminal emulator) with `LocalProcessTerminalView`
- Timestamp linking: parse Claude Code output for timestamps (e.g., `[00:12:34]`) and make them clickable to jump the video player to that moment
- Layout: synced transcript on the left, embedded Claude Code terminal on the right (resizable HSplitView). Video plays in its own floating window (see P0-8) rather than inside this split. This layout gives equal prominence to content review (left) and AI interaction (right).
- Quick-Action Buttons — toolbar buttons above the terminal that send pre-built prompts to Claude Code with one click:
- 📝 Summarize — generate a structured summary (key topics, decisions, action items)
- ✅ Action Items — extract action items with assignees and deadlines
- ❓ Ask — opens a text input for a custom question, sends it to Claude Code with the transcript as context
- 📧 Follow-up Email — draft a follow-up email from the meeting
- 🌐 Translate — translate the transcript to a selected language
- 🖼️ Key Moments — send key frame images + transcript to Claude Code to identify and describe the most important visual moments (slide content, diagrams, code shown on screen)
- 🔄 Compare — select another recording and compare what changed between meetings
- Users can also type freely in the terminal for any custom workflow
- Button implementation: each button constructs a `claude` CLI command with the appropriate prompt and pipes it to the embedded terminal (e.g., `claude "Summarize the meeting transcript in transcript.jsonl. Include key topics, decisions, and action items."`)
- Users can customize or add their own quick-action buttons in settings (custom prompt templates)
- Acceptance criteria:
- Quick-action buttons are visible in a toolbar above the embedded terminal
- Each button sends a pre-built prompt to Claude Code — no typing required
- "Ask" button opens a text input field for custom questions
- Users can add/edit/reorder custom quick-action buttons in settings
- Buttons are disabled when Claude Code is not installed or no recording is selected
- Selecting a recording auto-starts a terminal session and launches Claude Code in the recording directory
- Switching recordings kills the existing Claude session and starts a new one in the new directory
- Timestamps in terminal output (e.g., `[00:12:34]`) are clickable and seek the video player
- Split view is resizable; terminal can be toggled visible/hidden (keyboard shortcut)
- Recording directory is human-readable and Claude Code-friendly
- JSONL transcript format includes timestamps, text, and language per line
- A CLAUDE.md file is auto-generated in the recordings root directory explaining the data format
- Recordings directory path is configurable in settings
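Two of these criteria are small parsing jobs: turning `[00:12:34]` in terminal output into a seek target, and reading `transcript.jsonl`. Below is a sketch of both with hypothetical names. The regex, the JSONL field names, and the `TranscriptLine` shape are assumptions, since the spec only says each line includes timestamps, text, and language. Match positions are returned as `NSRange` values in UTF-16 units, the same units Foundation's string APIs use.

```swift
import Foundation

/// A clickable timestamp found in Claude Code output.
struct TimestampLink {
    let range: NSRange  // UTF-16 range in the output, for styling/click handling
    let seconds: Int    // seek target
}

/// Matches [HH:MM:SS] (hours may be one or two digits).
let timestampPattern = try! NSRegularExpression(pattern: #"\[(\d{1,2}):([0-5]\d):([0-5]\d)\]"#)

func timestampLinks(in text: String) -> [TimestampLink] {
    let whole = NSRange(text.startIndex..., in: text)
    return timestampPattern.matches(in: text, options: [], range: whole)
        .compactMap { (match: NSTextCheckingResult) -> TimestampLink? in
            func field(_ index: Int) -> Int? {
                guard let r = Range(match.range(at: index), in: text) else { return nil }
                return Int(text[r])
            }
            guard let h = field(1), let m = field(2), let s = field(3) else { return nil }
            return TimestampLink(range: match.range, seconds: h * 3600 + m * 60 + s)
        }
}

/// Hypothetical transcript.jsonl line. The spec says each line carries timestamps,
/// text, and language; these exact field names are assumptions.
struct TranscriptLine: Codable {
    let start: Double
    let end: Double
    let text: String
    let language: String
}

/// Decodes one JSON object per non-empty line.
func decodeTranscript(_ jsonl: String) throws -> [TranscriptLine] {
    let decoder = JSONDecoder()
    return try jsonl.split(whereSeparator: \.isNewline).map { line in
        try decoder.decode(TranscriptLine.self, from: Data(line.utf8))
    }
}
```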
P0-10: Whisper Model Selection
- Settings: choose whisper model size
- Small (~466MB, fastest, good for real-time)
- Medium (~1.5GB, balanced, better accuracy)
- Large-v3 (~3.1GB, best accuracy, slower)
- In-app model download with progress
- Acceptance criteria:
- Model download shows progress and estimated time
- User warned if selected model may cause >2s partial latency on their hardware
- Model switch takes effect on next recording (not mid-recording)
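The latency warning needs a per-machine measurement rather than a fixed table, because inference speed varies across Apple Silicon generations. The sketch below assumes Percev times one 2s chunk with the candidate model in a short benchmark (not shown). `WhisperModel` and `shouldWarnAboutLatency` are hypothetical names. The file-name pattern extends the `whisper-small.bin` example, and the 0.8 headroom factor is an assumption.

```swift
/// Model choices from this spec. The file-name pattern extends the
/// "whisper-small.bin" example and is an assumption for the other sizes.
enum WhisperModel: String, CaseIterable {
    case small
    case medium
    case largeV3 = "large-v3"

    /// Approximate download size in MB, as listed in the model options above.
    var approximateSizeMB: Int {
        switch self {
        case .small: return 466
        case .medium: return 1_500
        case .largeV3: return 3_100
        }
    }

    var fileName: String { "whisper-\(rawValue).bin" }
}

/// Partials transcribe a new 2 s chunk every 2 s. If one chunk takes longer than
/// that, chunks queue up and latency grows without bound. `headroom` leaves GPU
/// time for the Tier 2 pass (the 0.8 value is an assumption).
func shouldWarnAboutLatency(measuredSecondsPerChunk: Double,
                            chunkSeconds: Double = 2.0,
                            headroom: Double = 0.8) -> Bool {
    measuredSecondsPerChunk > chunkSeconds * headroom
}
```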
See spec-p2.md for deferred features including:
- P2: Editor, Semantic Search, Speaker Diarization, iCloud Sync
No in-app telemetry — consistent with our privacy-first positioning. Metrics are gathered from external signals and community feedback only.
| Metric | Target | Stretch | Measurement |
|---|---|---|---|
| Downloads (first month) | 5,000 | 15,000 | LemonSqueezy analytics |
| Paid conversions (first month) | 250 | 750 | LemonSqueezy sales data |
| Refund rate | <5% | <2% | LemonSqueezy |
| Revenue (first 3 months) | $5,000 | $15,000 | LemonSqueezy |
| Metric | Target | Stretch | Measurement |
|---|---|---|---|
| GitHub stars (if open-source) | 1,000 | 5,000 | GitHub |
| Community members (Discord) | 500 | 2,000 | Discord |
| Bug reports resolved | >80% within 1 week | >90% within 1 week | GitHub issues |
| Social media mentions | 50/month | 200/month | Manual tracking |
┌──────────────────────────────────────────────────┐
│ Percev.app │
│ │
│ ┌──────────────┐ ┌───────────────────────────┐ │
│ │ Menu Bar UI │ │ Recording Engine │ │
│ │ (SwiftUI) │ │ │ │
│ │ │ │ ScreenCaptureKit │ │
│ │ • Start/Stop │ │ ├─ Video (HEVC) │ │
│ │ • Window Pick│ │ ├─ System Audio (PCM) │ │
│ │ • Status │ │ └─ Mic Audio (PCM) │ │
│ └──────────────┘ └───────────┬───────────────┘ │
│ │ │
│ ┌──────────────┐ ┌──────────▼────────────────┐ │
│ │ Subtitle │ │ Transcription Engine │ │
│ │ Panel │ │ │ │
│ │ (SwiftUI, │ │ whisper.cpp C API │ │
│ │ in-window) │◄─│ ├─ 2s partials (live) │ │
│ │ │ │ └─ 20s finals (sentences)│ │
│ └──────────────┘ └───────────┬───────────────┘ │
│ │ │
│ ┌──────────────┐ ┌──────────▼────────────────┐ │
│ │ Library & │ │ Storage (Plain Files) │ │
│ │ Playback │ │ ~/Percev/ (configurable) │ │
│ │ (SwiftUI) │◄─│ │ │
│ │ │ │ <recording-name>/ │ │
│ │ • Search (P2)│ │ ├─ transcript.jsonl │ │
│ │ (USearch + │ │ ├─ audio.wav │ │
│ │ MiniLM) │ │ ├─ video.mp4 │ │
│ │ │ │ └─ .title │ │
│ │ │ │ │ │
│ │ │ │ .percev/search.usearch (P2)│ │
│ └──────────────┘ └───────────┬───────────────┘ │
│ │ │
│ ┌──────────▼────────────────┐ │
│ │ Embedded Terminal (P0) │ │
│ │ (SwiftTerm / PTY) │ │
│ │ │ │
│ │ Claude Code runs in-app │ │
│ │ ├─ Auto-cd to recording │ │
│ │ ├─ Reads JSONL transcripts │ │
│ │ ├─ Clickable timestamps │ │
│ │ └─ Any custom AI workflow │ │
│ └───────────────────────────┘ │
└──────────────────────────────────────────────────┘
- Single ScreenCaptureKit stream — macOS allows only one audio capture stream at a time. Multiple Percev instances or competing apps will conflict.
- Whisper 30s context limit — The transcription engine must ensure final interval (20s) + sentence carryover (up to 10s) never exceeds 30s.
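- Apple Silicon required: whisper.cpp inference runs on the Metal GPU backend, and the performance targets in this spec assume Apple Silicon (M1 or newer). Intel Macs are not supported.
- Memory pressure: the whisper medium model uses ~2.6GB of unified memory (Apple Silicon has no separate VRAM). Combined with video encoding, total memory overhead should be profiled across M1 (8GB) through M4 (up to 128GB) configurations.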
- Screen Recording permission — macOS requires explicit user consent. App must handle the permission flow gracefully with clear instructions.
| Content | Size per hour | Notes |
|---|---|---|
| Video (H.265/HEVC, VFR, 1080p) | ~100MB | VFR + constrained VBR, 200-300 Kbps avg |
| Audio (WAV, 16kHz mono) | ~115MB | Or ~15MB as AAC |
| Transcript (JSONL) | ~200KB | Text is tiny |
| Metadata (JSON) | ~1KB | Tiny |
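| Total per hour (video on) | ~215MB | ~115MB if audio is stored as AAC; video off: ~115MB (WAV) or ~15MB (AAC) |
At 2 hours of recording per day (~60 hours/month): ~13GB/month with video on or ~7GB/month with video off when audio is stored as WAV; with AAC audio, ~7GB/month (video on) or ~1GB/month (video off).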
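The storage figures can be checked from the encoding parameters. This worked arithmetic assumes 16-bit PCM samples (the spec gives 16kHz mono but not the bit depth), decimal megabytes, a 30-day month, the table's ~100MB/h for video, and negligible transcript and metadata sizes:

$$
\begin{aligned}
\text{audio (WAV)} &= 16{,}000\ \tfrac{\text{samples}}{\text{s}} \times 2\ \tfrac{\text{bytes}}{\text{sample}} \times 3600\ \text{s} = 115.2\ \text{MB/h} \\
\text{video (HEVC)} &= \frac{(200\text{ to }300) \times 10^{3}\ \text{bit/s} \times 3600\ \text{s}}{8\ \text{bit/byte}} = 90\text{ to }135\ \text{MB/h} \\
\text{hours per month} &= 2\ \text{h/day} \times 30\ \text{days} = 60\ \text{h} \\
\text{video on, WAV} &\approx 60 \times (100 + 115)\ \text{MB} \approx 12.9\ \text{GB} \\
\text{video on, AAC} &\approx 60 \times (100 + 15)\ \text{MB} \approx 6.9\ \text{GB} \\
\text{video off, WAV} &\approx 60 \times 115\ \text{MB} \approx 6.9\ \text{GB} \\
\text{video off, AAC} &\approx 60 \times 15\ \text{MB} = 0.9\ \text{GB}
\end{aligned}
$$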
| Property | Value | Notes |
|---|---|---|
| Codec | H.265/HEVC via VideoToolbox | Hardware-accelerated on all Apple Silicon |
| Profile | Main, Auto Level | Sufficient for 8-bit screen content |
| Average bitrate | 200–300 Kbps | Constrained VBR, sufficient for mostly-static screens |
| Peak bitrate | 1.0–1.5 Mbps | Burst for transitions/scrolling |
| Keyframe interval | 5 seconds | Time-based, not frame-count (since VFR) |
| B-frames | Disabled | Reduces latency, negligible benefit for screen content |
| Frame rate | VFR (variable) | Only encode when screen content changes; ScreenCaptureKit delivers frames on-change natively. Typical avg ~2-5 fps for meetings, up to 15fps during scrolling/video |
| Max frame rate | 15 fps cap | Prevents excessive encoding during fast motion |
| Real-time mode | On | Prioritizes low latency over compression |
| Pixel format | Full-range YUV (NV12) | Better text fidelity than video-range |
| Container | .mp4 (MPEG-4) | Matches the `video.mp4` / `camera.mp4` filenames used throughout; HEVC in MP4 plays natively in AVFoundation |
| # | Question | Owner | Blocking? |
|---|---|---|---|
| 1 | | Engineering | No |
| 2 | | Product | No |
| 3 | | Product | No |
| 4 | | Engineering | No |
| 5 | | Legal | No |
| 6 | | Product | No |
| 7 | | Engineering | No |
| Product | Model | Strengths | Weaknesses vs. Percev |
|---|---|---|---|
| Otter.ai | Cloud SaaS, $17/mo | Polished UX, speaker ID, integrations | All audio sent to cloud, subscription cost, meetings-only |
| Granola | Cloud, $10/mo | Clean meeting notes, AI summaries | Meetings-only, cloud-dependent, no video |
| Limitless (ex-Rewind) | Local + Cloud | Full screen recording, search | Requires pendant ($99) for best experience, cloud AI |
| Loom | Cloud, freemium | Async video sharing, screen recording | No transcription focus, cloud storage, no real-time subtitles |
| macOS Dictation | Local | Built-in, free | No recording, no search, no summaries, single-language |
Percev's differentiation:
- Fully local — no cloud, no subscription for core features, no privacy concerns
- Any content — not just meetings: YouTube, lectures, podcasts, presentations
- Real-time subtitles: live transcription with ~2s latency, shown in an in-app subtitle panel
- Native macOS — SwiftUI, Metal GPU, ScreenCaptureKit — feels like an Apple app
- Open AI layer — no built-in AI lock-in. Recordings are plain files (JSONL + WAV + MP4) that Claude Code or any tool can read. Users bring their own AI and can build any workflow on top of their data