@alexknowshtml
Last active February 11, 2026 01:50
Provenance Tracker: Mine / Machine / Ours — spec for human+AI code attribution

Byline: Mine / Machine / Ours

Spec — 2026-02-10

Problem

Git blame tells you who last edited a line, but in human+AI collaborative coding, we need finer-grained attribution: what the human wrote, what the agent generated untouched, and what they shaped together.

Name

Byline — every piece of work gets a byline. The byline tells you whose hands shaped it.

Categories

Label    Definition
Mine     Content the human typed into chat verbatim, or manually edited in their editor between agent turns
Machine  Content generated by agent tool calls (Write/Edit) that was never modified before commit
Ours     Content that started as agent output but was reshaped by the human, OR went through multiple human↔agent revision cycles

Architecture

┌──────────────────┐    ┌───────────────────┐    ┌─────────────────┐
│  Claude Code     │    │  Byline           │    │  Sidecar Files  │
│  Session         │───▶│  Analyzer         │───▶│  (.byline/)     │
│  Transcripts     │    │  (post-commit)    │    │                 │
│  (.jsonl)        │    │                   │    │                 │
└──────────────────┘    └───────────────────┘    └─────────────────┘
                                  │
                                  ▼
                           ┌──────────────┐
                           │  git diff    │
                           │  (committed) │
                           └──────────────┘

Decisions Made

  • Granularity: Character/word-level (not line-level). Each line can have mixed provenance — e.g., agent wrote the function but human renamed a parameter.
  • Storage: Sidecar .byline/ directory with JSON files per commit. No git notes.
  • File types: Track everything. Use .byline-ignore for noise (lock files, build output, etc.).
  • Retroactive analysis: Build it. Lower fidelity for historical commits (no snapshot data), but transcript-based Mine/Machine/Ours classification still works.
  • Distribution: Claude Code plugin. Standalone repo at ~/repos/byline/. Install/uninstall via plugin system. Not embedded in any project — works with any Claude Code repo.
  • Repository: https://github.com/alexknowshtml/byline (standalone)

Data Sources (Already Available)

Session Transcripts (~/.claude/projects/{project}/{session-id}.jsonl):

  • Write tool calls: input.content = exact file content written
  • Edit tool calls: input.old_string + input.new_string = before/after
  • Read tool calls: tool_result.content = file state at time of read
  • User messages: message.content[].text = what the human typed
  • Timestamps: ISO 8601 on every entry
  • Sequencing: uuid + parentUuid for ordering
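A minimal sketch of extracting these events from one transcript follows. It assumes each JSONL line carries uuid, parentUuid, timestamp, and a message.content array whose tool_use blocks hold name, id, and input.file_path, and whose tool_result blocks point back via tool_use_id; that layout, and the names parseTranscript and FileEvent, are assumptions for illustration rather than a documented schema. It orders events by timestamp instead of walking the parentUuid chain, and it ignores tool_result payloads that are not plain strings. Real Read results may also include line-number prefixes that would need stripping before any comparison.

```typescript
// Assumed transcript shapes: field names come from the spec, layout is a guess.
interface ContentBlock {
  type: string;
  id?: string;
  name?: string;
  input?: Record<string, unknown>;
  tool_use_id?: string;
  content?: unknown;
  text?: string;
}

interface TranscriptEntry {
  uuid: string;
  parentUuid: string | null;
  timestamp: string; // ISO 8601
  message?: { role?: string; content?: string | ContentBlock[] };
}

type FileEvent =
  | { kind: "Write"; toolCallId: string; filePath: string; timestamp: string; content: string }
  | { kind: "Edit"; toolCallId: string; filePath: string; timestamp: string; oldString: string; newString: string }
  | { kind: "Read"; toolCallId: string; filePath: string; timestamp: string; content?: string };

type ReadEvent = Extract<FileEvent, { kind: "Read" }>;

function parseTranscript(jsonl: string): FileEvent[] {
  const events: FileEvent[] = [];
  const readsById = new Map<string, ReadEvent>(); // so tool_result can fill in Read content
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    const entry = JSON.parse(line) as TranscriptEntry;
    const content = entry.message?.content;
    if (!Array.isArray(content)) continue; // plain-string messages carry no tool data
    for (const block of content) {
      if (block.type === "tool_use" && block.id && block.input) {
        const input = block.input;
        const base = { toolCallId: block.id, filePath: String(input.file_path ?? ""), timestamp: entry.timestamp };
        if (block.name === "Write") {
          events.push({ kind: "Write", ...base, content: String(input.content ?? "") });
        } else if (block.name === "Edit") {
          events.push({
            kind: "Edit",
            ...base,
            oldString: String(input.old_string ?? ""),
            newString: String(input.new_string ?? ""),
          });
        } else if (block.name === "Read") {
          const read: ReadEvent = { kind: "Read", ...base };
          readsById.set(block.id, read);
          events.push(read);
        }
      } else if (block.type === "tool_result" && block.tool_use_id) {
        const read = readsById.get(block.tool_use_id);
        if (read && typeof block.content === "string") read.content = block.content;
      }
    }
  }
  // Simplification: order by timestamp (sort is stable for ties).
  return events.sort((a, b) => Date.parse(a.timestamp) - Date.parse(b.timestamp));
}
```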

Classification Algorithm

For each file in a commit diff:

Step 1: Find Relevant Sessions

SELECT session_id, file_path, timestamp
FROM session_tool_calls
WHERE file_path = '{changed_file}'
AND timestamp BETWEEN {commit_window_start} AND {commit_window_end}

If multiple sessions touched the same file → flag for merge logic (see Multi-Session Handling).
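If the tool-call log is held in memory rather than in a session_tool_calls table, Step 1 and its merge flag reduce to a filter plus a group-by. This sketch assumes a hypothetical ToolCallRecord shape (sessionId, filePath, ISO 8601 timestamp) and treats both window bounds as inclusive to mirror SQL BETWEEN; findRelevantSessions and needsMergeLogic are illustrative names.

```typescript
interface ToolCallRecord {
  sessionId: string;
  filePath: string;
  timestamp: string; // ISO 8601
}

// In-memory equivalent of the Step 1 query, grouped by session.
function findRelevantSessions(
  records: ToolCallRecord[],
  changedFile: string,
  windowStart: Date,
  windowEnd: Date,
): Map<string, ToolCallRecord[]> {
  const bySession = new Map<string, ToolCallRecord[]>();
  for (const r of records) {
    const t = Date.parse(r.timestamp);
    // BETWEEN is inclusive on both ends.
    if (r.filePath !== changedFile || t < windowStart.getTime() || t > windowEnd.getTime()) continue;
    const list = bySession.get(r.sessionId) ?? [];
    list.push(r);
    bySession.set(r.sessionId, list);
  }
  return bySession;
}

// More than one session touching the file means multi-session merge logic applies.
function needsMergeLogic(sessions: Map<string, ToolCallRecord[]>): boolean {
  return sessions.size > 1;
}
```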

Step 2: Build File Timeline

Walk the session transcript and extract ordered events for the file:

T1: Write(file, content_v1)         → Agent wrote content_v1
T2: Read(file) → sees content_v1    → No change (still Machine)
T3: Read(file) → sees content_v2    → Human edited between T2-T3 (content_v2 - content_v1 = Mine)
T4: Edit(file, old, new)            → Agent modified (new = Machine, unless old was Mine → Ours)
T5: [commit]                        → Final state
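The timeline walk can be sketched as a fold over the file's events that keeps a best-known copy of the file. The event shapes and the buildTimeline name are hypothetical; Edit is modeled as a single replacement of old_string (no replace-all), and a successful Edit whose old_string is missing from the known state is treated as evidence of an out-of-band change.

```typescript
type TimelineEvent =
  | { kind: "Write"; content: string }
  | { kind: "Edit"; oldString: string; newString: string }
  | { kind: "Read"; content: string };

interface TimelineStep {
  index: number;
  kind: TimelineEvent["kind"];
  manualEditDetected: boolean; // file changed outside the agent's tool calls
  state: string | undefined;   // best-known file content after this event
}

// initial: file content before the session, if known (undefined = unknown).
function buildTimeline(events: TimelineEvent[], initial?: string): TimelineStep[] {
  const steps: TimelineStep[] = [];
  let state = initial;
  for (let index = 0; index < events.length; index++) {
    const ev = events[index];
    let manualEditDetected = false;
    if (ev.kind === "Write") {
      state = ev.content; // a Write fully determines the file
    } else if (ev.kind === "Edit") {
      if (state !== undefined) {
        // The Edit tool needs old_string to exist; if our copy lacks it,
        // the file changed since the agent last saw it.
        if (!state.includes(ev.oldString)) manualEditDetected = true;
        const replacement = ev.newString;
        // Function form keeps "$&"-style sequences in new_string literal.
        state = state.replace(ev.oldString, () => replacement);
      }
    } else {
      // Read: any mismatch with the expected state is a human edit (T3 above).
      manualEditDetected = state !== undefined && ev.content !== state;
      state = ev.content;
    }
    steps.push({ index, kind: ev.kind, manualEditDetected, state });
  }
  return steps;
}
```

Run on a T1 through T4 sequence like the one above, only the second Read (T3) is flagged.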

Step 3: Classify Each Segment (Character-Level)

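For each changed region in the commit diff, use word-level diffing (e.g., Google's diff-match-patch, which diffs characters by default but can produce word-level diffs by first mapping each word to a single character) to classify individual tokens: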

  1. Machine: Tokens that match a Write/Edit tool output AND were never subsequently modified
  2. Mine: Tokens that appear in a Read result but don't match any prior Write/Edit output — inferred as manual edits
  3. Ours: Tokens that were written by a tool call but show differences at the next Read or at commit time
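To keep the example dependency-free, the sketch below swaps diff-match-patch for a plain LCS diff over word and whitespace tokens. The hunk rule is one interpretation of the three bullets, not part of the spec: committed tokens that survive from the agent's output are Machine, tokens inserted where agent tokens were removed are Ours (agent output reshaped), and pure insertions are Mine. classify, diffTokens, pushSpan, and Span are hypothetical names; offsets are character positions in the committed text.

```typescript
type Category = "mine" | "machine" | "ours";

interface Span {
  start: number; // character offset in the committed text
  end: number;   // exclusive
  category: Category;
}

type Op = { type: "equal" | "insert" | "delete"; token: string };

// Words and whitespace runs become separate tokens, so offsets add up exactly.
function tokenize(text: string): string[] {
  return text.split(/(\s+)/).filter((t) => t.length > 0);
}

// Plain LCS diff over tokens (stand-in for diff-match-patch).
function diffTokens(a: string[], b: string[]): Op[] {
  const n = a.length;
  const m = b.length;
  // dp[i][j] = length of the LCS of a[i..] and b[j..]
  const dp: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  const ops: Op[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      ops.push({ type: "equal", token: b[j] });
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      ops.push({ type: "delete", token: a[i++] });
    } else {
      ops.push({ type: "insert", token: b[j++] });
    }
  }
  while (i < n) ops.push({ type: "delete", token: a[i++] });
  while (j < m) ops.push({ type: "insert", token: b[j++] });
  return ops;
}

// Append a span, merging with the previous one when contiguous and same category.
function pushSpan(spans: Span[], start: number, length: number, category: Category): void {
  const last = spans[spans.length - 1];
  if (last && last.category === category && last.end === start) last.end = start + length;
  else spans.push({ start, end: start + length, category });
}

// agentOutput: the file as the agent last wrote it; committed: the file at commit.
function classify(agentOutput: string, committed: string): Span[] {
  const ops = diffTokens(tokenize(agentOutput), tokenize(committed));
  const spans: Span[] = [];
  let offset = 0;
  let k = 0;
  while (k < ops.length) {
    if (ops[k].type === "equal") {
      pushSpan(spans, offset, ops[k].token.length, "machine"); // untouched agent output
      offset += ops[k].token.length;
      k++;
      continue;
    }
    // Collect one hunk of consecutive insert/delete ops.
    let replacedAgentText = false;
    const inserted: string[] = [];
    while (k < ops.length && ops[k].type !== "equal") {
      if (ops[k].type === "delete") replacedAgentText = true;
      else inserted.push(ops[k].token);
      k++;
    }
    // Text that replaced agent tokens is reshaped agent output (Ours);
    // text inserted where nothing was removed is treated as the human's (Mine).
    const category: Category = replacedAgentText ? "ours" : "mine";
    for (const token of inserted) {
      pushSpan(spans, offset, token.length, category);
      offset += token.length;
    }
  }
  return spans;
}
```

For example, classify("const foo = 1", "const bar = 1") marks "bar" as Ours and the rest as Machine. The LCS table costs O(n*m) in tokens, which is fine per changed region but would call for diff-match-patch (or another Myers-style diff) on whole large files.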

Step 4: Handle Gaps

  • If the human edits a file and Claude never reads it again before commit → diff last known tool-call state against committed version → differences are "Mine"
  • If content was typed verbatim in a user message and placed via Write → check if the Write content matches the user message text → if yes, "Mine" (human dictated it)
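The dictation rule can be checked with whitespace-insensitive containment, so a prompt like "write this: <text>" still matches when the Write content is just the text. The 20-character floor and the names isDictated and normalizeText are assumptions, meant to keep short snippets from matching ordinary prose by coincidence.

```typescript
// Collapse runs of whitespace so reflowed or re-indented text still compares equal.
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Hypothetical check for the "human dictated it" rule: true when the written
// content appears (whitespace-insensitively) inside some user message.
function isDictated(writeContent: string, userMessages: string[], minLength = 20): boolean {
  const written = normalizeText(writeContent);
  // Assumed floor: very short writes would match ordinary prose by accident.
  if (written.length < minLength) return false;
  return userMessages.some((message) => normalizeText(message).includes(written));
}
```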

Multi-Session Handling

When two sessions touch the same file in the same commit window:

  1. Non-overlapping ranges: Each session owns its character ranges independently
  2. Overlapping ranges: Default to "Ours" (multiple agents + human = collaborative)
  3. One wrote, one only read: Writing session owns attribution

Detection query:

SELECT file_path, COUNT(DISTINCT session_id) AS session_count
FROM session_tool_calls
WHERE file_path IN ({files_in_diff})
AND timestamp BETWEEN {commit_window_start} AND {commit_window_end}
GROUP BY file_path
HAVING COUNT(DISTINCT session_id) > 1
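The three resolution rules can be applied with a sweep over range boundaries. This sketch assumes each session's contribution has already been reduced to character ranges in the committed file (the hypothetical SessionClaim type); read-only sessions are dropped first (rule 3), a slice covered by one writing session keeps that owner (rule 1), and a slice covered by several is marked "shared" and forced to Ours (rule 2). resolveSessions is an illustrative name.

```typescript
interface SessionClaim {
  sessionId: string;
  wrote: boolean;                  // false if the session only read the file
  ranges: Array<[number, number]>; // [start, end) character ranges it produced
}

interface ResolvedRange {
  start: number;
  end: number;
  owner: string;      // a session id, or "shared" when writers overlap
  forceOurs: boolean; // rule 2: overlapping writers default to Ours
}

function resolveSessions(claims: SessionClaim[]): ResolvedRange[] {
  // Rule 3: a session that only read the file never owns attribution.
  const writers = claims.filter((c) => c.wrote);
  const cuts = new Set<number>();
  for (const c of writers) {
    for (const [s, e] of c.ranges) {
      cuts.add(s);
      cuts.add(e);
    }
  }
  const points = Array.from(cuts).sort((a, b) => a - b);
  const out: ResolvedRange[] = [];
  for (let i = 0; i + 1 < points.length; i++) {
    const start = points[i];
    const end = points[i + 1];
    const owners = writers
      .filter((c) => c.ranges.some(([s, e]) => s <= start && end <= e))
      .map((c) => c.sessionId);
    if (owners.length === 0) continue; // no writer covers this slice
    // Rule 1: a single writer keeps its range; rule 2: overlap becomes shared/Ours.
    const resolved: ResolvedRange =
      owners.length === 1
        ? { start, end, owner: owners[0], forceOurs: false }
        : { start, end, owner: "shared", forceOurs: true };
    const last = out[out.length - 1];
    if (last && last.owner === resolved.owner && last.end === start) last.end = end;
    else out.push(resolved);
  }
  return out;
}
```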

Plugin Architecture (Claude Code)

Distributed as a Claude Code plugin. Install/uninstall via standard plugin commands.

~/repos/byline/
  .claude-plugin/
    plugin.json              # Plugin manifest
  hooks/
    hooks.json               # Hook definitions using ${CLAUDE_PLUGIN_ROOT}
    post-tool-use.ts         # PostToolUse hook script
    user-prompt-submit.ts    # File snapshot hook
    session-start.sh         # Buffer initialization
    post-commit.sh           # Git post-commit trigger
    install-git-hook.sh      # Auto-install git hook on SessionStart
  skills/
    byline/SKILL.md          # /byline slash command
    byline-status/SKILL.md   # /byline-status diagnostic
  src/
    types.ts                 # Core types
    transcript-parser.ts     # Extract Write/Edit/Read from JSONL
    file-timeline.ts         # Build chronological edit history per file
    word-differ.ts           # Word-level diff + classification
    classifier.ts            # Orchestrator: commit SHA → attribution JSON
    buffer.ts                # Real-time operation buffer
    storage.ts               # .byline/ directory management
    retroactive.ts           # Historical commit analysis
    query.ts                 # Query engine over .byline/ files
    report.ts                # Report generation
  bin/
    byline.ts                # CLI entry point
  package.json
  tsconfig.json
  README.md
  .byline-ignore.default     # Default ignore patterns

Hook lifecycle:

  • PostToolUse → logs every Write/Edit/Read with timestamps and content to .byline/session-log.jsonl
  • UserPromptSubmit → snapshots tracked files to detect manual edits between agent turns
  • Stop → finalizes session data, runs classification engine, writes sidecar JSON
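A PostToolUse hook script mostly reshapes its input into a log line. The payload fields used here (session_id, hook_event_name, tool_name, tool_input) follow Claude Code's hook input as documented at the time of writing, but treat the exact shape as an assumption to verify; SessionLogEntry and toLogEntry are hypothetical names. Read content is omitted because where it appears in the tool response is not specified here.

```typescript
// Assumed subset of the JSON Claude Code passes to a PostToolUse hook on stdin.
interface PostToolUsePayload {
  session_id: string;
  hook_event_name: string;
  tool_name: string;
  tool_input: Record<string, unknown>;
  tool_response?: unknown;
}

interface SessionLogEntry {
  ts: string;
  sessionId: string;
  tool: "Write" | "Edit" | "Read";
  filePath: string;
  content?: string;   // Write
  oldString?: string; // Edit
  newString?: string; // Edit
}

const TRACKED_TOOLS = new Set(["Write", "Edit", "Read"]);

// Returns null for events Byline ignores, so the hook can exit early.
function toLogEntry(payload: PostToolUsePayload, now: Date = new Date()): SessionLogEntry | null {
  if (payload.hook_event_name !== "PostToolUse" || !TRACKED_TOOLS.has(payload.tool_name)) return null;
  const input = payload.tool_input;
  const entry: SessionLogEntry = {
    ts: now.toISOString(),
    sessionId: payload.session_id,
    tool: payload.tool_name as SessionLogEntry["tool"],
    filePath: String(input.file_path ?? ""),
  };
  if (typeof input.content === "string") entry.content = input.content;
  if (typeof input.old_string === "string") entry.oldString = input.old_string;
  if (typeof input.new_string === "string") entry.newString = input.new_string;
  return entry;
}
```

The hook itself would read the payload from stdin, call toLogEntry, and append JSON.stringify(entry) plus a newline to .byline/session-log.jsonl, exiting quietly on null.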

Slash commands:

  • /byline blame <file> — character-level attribution for a file
  • /byline stats [range] — Mine/Machine/Ours breakdown across commits
  • /byline show <sha> — byline data for a specific commit
  • /byline heatmap — file-level overview of who shaped what
  • /byline retro [range] — run retroactive analysis on historical commits
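/byline stats has to combine commits of different sizes, so one reasonable approach is to sum the raw counts and derive percentages once, rather than averaging per-commit percentages (which over-weights small commits). The CommitTotals shape mirrors the totals object in the commit JSON; aggregateStats is a hypothetical name, and because it uses Math.round the three percentages can sum to 99 or 101.

```typescript
type Category = "mine" | "machine" | "ours";
type Counts = Record<Category, number>;

interface CommitTotals {
  commit: string;
  totals: Counts; // raw counts from .byline/commits/{short-sha}.json
}

// Sum raw counts across commits, then derive percentages once.
function aggregateStats(commits: CommitTotals[]): { totals: Counts; percent: Counts } {
  const totals: Counts = { mine: 0, machine: 0, ours: 0 };
  for (const c of commits) {
    totals.mine += c.totals.mine;
    totals.machine += c.totals.machine;
    totals.ours += c.totals.ours;
  }
  const sum = totals.mine + totals.machine + totals.ours;
  // Guard against an empty range; rounding may leave the sum at 99 or 101.
  const pct = (n: number): number => (sum === 0 ? 0 : Math.round((100 * n) / sum));
  return {
    totals,
    percent: { mine: pct(totals.mine), machine: pct(totals.machine), ours: pct(totals.ours) },
  };
}
```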

Output Format

.byline/commits/{short-sha}.json:

{
  "commit": "abc12345",
  "timestamp": "2026-02-10T14:30:00-05:00",
  "session_ids": ["uuid-1"],
  "files": {
    "src/app.tsx": {
      "summary": { "mine": 12, "machine": 45, "ours": 8 },
      "segments": [
        {
          "line": 10,
          "col_start": 0,
          "col_end": 45,
          "category": "machine",
          "source": "Write tool call at T1",
          "tool_call_id": "toolu_abc123"
        },
        {
          "line": 10,
          "col_start": 45,
          "col_end": 52,
          "category": "mine",
          "source": "Manual edit detected between T2 and T3"
        },
        {
          "line": 15,
          "col_start": 0,
          "col_end": 80,
          "category": "ours",
          "source": "Edit tool at T4, modified by human before commit"
        }
      ]
    }
  },
  "totals": {
    "mine": 12,
    "machine": 45,
    "ours": 8,
    "percent": { "mine": 18, "machine": 69, "ours": 13 }
  }
}
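The summary and totals blocks can be derived from the segments list. The spec does not state the unit, so this sketch assumes counts are characters (col_end minus col_start, end exclusive); summarize and buildFiles are hypothetical names. Under that assumption the three segments shown would yield 7 Mine, 45 Machine, 80 Ours, so the example's summary figures should be read as illustrative.

```typescript
type Category = "mine" | "machine" | "ours";
type Counts = Record<Category, number>;

// Mirrors one entry of "segments" in the commit JSON.
interface Segment {
  line: number;
  col_start: number;
  col_end: number; // assumed exclusive
  category: Category;
  source: string;
  tool_call_id?: string;
}

interface FileAttribution {
  summary: Counts;
  segments: Segment[];
}

// Assumption: counts are characters; the spec does not name the unit.
function summarize(segments: Segment[]): Counts {
  const counts: Counts = { mine: 0, machine: 0, ours: 0 };
  for (const s of segments) counts[s.category] += s.col_end - s.col_start;
  return counts;
}

// Builds the "files" map plus raw "totals" for one commit record.
function buildFiles(segmentsByFile: Record<string, Segment[]>): {
  files: Record<string, FileAttribution>;
  totals: Counts;
} {
  const files: Record<string, FileAttribution> = {};
  const totals: Counts = { mine: 0, machine: 0, ours: 0 };
  for (const [path, segments] of Object.entries(segmentsByFile)) {
    const summary = summarize(segments);
    files[path] = { summary, segments };
    for (const k of ["mine", "machine", "ours"] as const) totals[k] += summary[k];
  }
  return { files, totals };
}
```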

Retroactive Analysis

The retroactive analyzer (src/retroactive.ts) walks git history and matches commits to session transcripts:

  1. For each historical commit, find sessions active during the commit window
  2. Walk transcript tool calls for files in the diff
  3. Classify using the same Mine/Machine/Ours algorithm
  4. Fidelity notes:
    • No UserPromptSubmit snapshot data → manual edits between tool calls detected only when Claude re-reads the file
    • Gap between last tool call and commit filled by diffing last known state vs. committed version
    • Lower confidence flag on segments where detection relied on inference rather than direct observation
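Session-to-commit matching needs a concrete commit window, which the spec leaves open. This sketch assumes a commit's window runs from the previous commit (exclusive) to the commit itself (inclusive), with the first commit's window unbounded below, and that each session is reduced to its first and last transcript timestamps; Commit, SessionSpan, and matchSessions are hypothetical names.

```typescript
interface Commit {
  sha: string;
  time: Date;
}

// A session reduced to its first and last transcript timestamps.
interface SessionSpan {
  sessionId: string;
  start: Date;
  end: Date;
}

interface CommitMatch {
  sha: string;
  sessionIds: string[];
}

// Assumed window: (previous commit time, this commit time]; the first
// commit's window has no lower bound.
function matchSessions(commits: Commit[], sessions: SessionSpan[]): CommitMatch[] {
  const sorted = [...commits].sort((a, b) => a.time.getTime() - b.time.getTime());
  return sorted.map((commit, i) => {
    const windowStart = i === 0 ? -Infinity : sorted[i - 1].time.getTime();
    const windowEnd = commit.time.getTime();
    // A session is relevant if its active span overlaps the window at all.
    const sessionIds = sessions
      .filter((s) => s.start.getTime() <= windowEnd && s.end.getTime() > windowStart)
      .map((s) => s.sessionId);
    return { sha: commit.sha, sessionIds };
  });
}
```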

.byline-ignore Defaults

package-lock.json
yarn.lock
bun.lockb
*.min.js
*.min.css
dist/
build/
node_modules/
.git/
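Matching paths against these defaults needs only a small subset of .gitignore syntax: bare names and globs match the final path component, a trailing slash matches a directory at any depth, and a pattern containing an inner slash is anchored at the repo root. compileIgnore and globToRegex are hypothetical helpers; negation (!), ** and bracket classes are not handled.

```typescript
// Convert a glob with "*" and "?" (neither crossing "/") to a RegExp source.
function globToRegex(glob: string): string {
  return glob
    .split("")
    .map((ch) => (ch === "*" ? "[^/]*" : ch === "?" ? "[^/]" : ch.replace(/[.+^${}()|[\]\\]/g, "\\$&")))
    .join("");
}

// Returns a predicate over repo-relative paths using "/" separators.
function compileIgnore(text: string): (path: string) => boolean {
  const rules: RegExp[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim(); // also strips "\r" from CRLF files
    if (line === "" || line.startsWith("#")) continue;
    if (line.endsWith("/")) {
      // Directory pattern: that directory name at any depth.
      rules.push(new RegExp(`(^|/)${globToRegex(line.slice(0, -1))}/`));
    } else if (line.includes("/")) {
      // Inner slash: anchored at the repo root.
      rules.push(new RegExp(`^${globToRegex(line.replace(/^\//, ""))}$`));
    } else {
      // Bare name or glob: match the final path component.
      rules.push(new RegExp(`(^|/)${globToRegex(line)}$`));
    }
  }
  return (path) => rules.some((re) => re.test(path));
}
```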

Visualization Targets

Commit-level:

  • Byline summary per commit ("18% Mine, 69% Machine, 13% Ours")
  • Character-level blame with mixed attribution per line

Project-level:

  • Authorship over time (stacked area chart)
  • File-level heatmap (Mine/Machine/Ours concentration)
  • Session authorship profiles

Per-file:

  • VS Code gutter with byline attribution
  • PR review with per-hunk attribution tags

Meta/narrative:

  • Collaboration story for feature branches
  • Aggregate portfolio stats across repos

Implementation Phases

Phase 1: Transcript Parser

  • Parse JSONL transcripts
  • Extract Write/Edit/Read events per file
  • Build file timelines

Phase 2: Classification Engine

  • Word-level diff using diff-match-patch
  • Mine/Machine/Ours algorithm at character granularity
  • Handle "user message verbatim" detection
  • Handle "diff between tool calls" gap filling

Phase 3: Plugin & Hooks

  • PostToolUse hook for real-time event logging
  • UserPromptSubmit hook for file snapshots
  • Stop hook for session finalization
  • Slash commands for querying byline data

Phase 4: Retroactive Analysis

  • Historical commit walker
  • Session-to-commit matching
  • Confidence scoring for inferred classifications

Phase 5: Visualization

  • CLI query tools (blame, stats, show, heatmap)
  • File heatmap generation
  • Cross-repo aggregation