Ghost_Shell File ID System: Complete Technical Deep Dive

Author: Claude (Sonnet 4.5)
Date: 2025-01-24
Project: Ghost_Shell / Somacosf Workspace
Purpose: Comprehensive documentation of the semantic file catalog system used across the Somacosf ecosystem

Executive Summary
System Architecture Overview
File ID Schema Specification
Header Format Standards
UUID Generation Workflow
Data Schema Deep Dive
Registry System Architecture
Access & Browse Methods
Bubble Tea TUI Design
Database Schema for File Tracking
CLI Interface Specifications
Implementation Examples
Use Cases & Narratives
Integration Guide
Future Enhancements

Executive Summary

What is the Ghost_Shell File ID System?

TL;DR: Ghost_Shell does NOT use traditional UUIDs (UUID4, UUID5). Instead, it implements a semantic file catalog system called SOM (Somacosf) that embeds rich metadata directly into file headers.

Key Characteristics:

Semantic IDs: Human-readable identifiers like SOM-SCR-0014-v1.0.0
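Embedded Metadata: Every file contains a structured header with 9 or more metadata fields (9 in config files, 11 in Python and PowerShell scripts, more in Markdown where agent and execution are nested)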
Workspace-Level Standard: Defined in CLAUDE.md and enforced across all Somacosf projects
No Separate Registry Database (currently): File IDs are distributed in file headers
Future Integration: Designed to work with tosijs-schema for runtime validation

Why Not Traditional UUIDs?

Traditional UUIDs	SOM File IDs
Random (v4) or name-hashed (v5) bits	Semantic structure
No human meaning	Category-based, sequential
Requires external registry	Self-documenting in file
Version-agnostic	Built-in semver tracking
`550e8400-e29b-41d4-a716-446655440000`	`SOM-SCR-0014-v1.0.0`

Decision Rationale:

UUIDs are excellent for distributed systems requiring uniqueness guarantees
SOM IDs prioritize discoverability, traceability, and AI agent coordination
The workspace operates with a single coordinating agent, making sequential IDs viable
Semantic structure enables pattern-based queries and category filtering

System Architecture Overview

High-Level Architecture

graph TB
    subgraph "File System Layer"
        A[Python Files] -->|Header| B[SOM File ID]
        C[Markdown Files] -->|Header| B
        D[Config Files] -->|Header| B
        E[Scripts] -->|Header| B
    end

    subgraph "Metadata Schema"
        B --> F[file_id: SOM-XXX-NNNN-vX.X.X]
        B --> G[Agent ID: AGENT-XXX-NNN]
        B --> H[Tags: Array]
        B --> I[Timestamps]
        B --> J[Execution Info]
    end

    subgraph "Registry System (Planned)"
        F --> K[SQLite Catalog DB]
        G --> L[Agent Registry]
        H --> M[Tag Index]
        K --> N[File Lookup API]
        L --> N
        M --> N
    end

    subgraph "Access Layer"
        N --> O[CLI Commands]
        N --> P[Bubble Tea TUI]
        N --> Q[tosijs Schema Validation]
    end

    subgraph "Integration Layer"
        O --> R[Search by Category]
        P --> S[Browse by Tags]
        Q --> T[Validate File Metadata]
    end

Component Layers

File System Layer: Physical files with embedded headers
Metadata Schema: Structured data within headers (current implementation)
Registry System: Planned centralized database for fast queries
Access Layer: Tools and interfaces to interact with file catalog
Integration Layer: Cross-project features and validations

File ID Schema Specification

Format Structure

SOM-<CATEGORY>-<SEQUENCE>-v<VERSION>

Where:
- SOM        = Somacosf namespace (3 chars, fixed)
- CATEGORY   = File type code (3 chars, uppercase)
- SEQUENCE   = Unique number (4 digits, zero-padded)
- VERSION    = Semantic version (semver: MAJOR.MINOR.PATCH)
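The format above maps directly onto a regular expression. The following is a minimal sketch of splitting an ID into its parts and rebuilding it; the names `FileId`, `parse_file_id`, and `format_file_id` are illustrative and not part of any existing Ghost_Shell module.

```python
import re
from typing import NamedTuple

# Mirrors the documented shape: SOM-<CATEGORY>-<SEQUENCE>-v<VERSION>
FILE_ID_RE = re.compile(r"SOM-([A-Z]{3})-(\d{4})-v(\d+)\.(\d+)\.(\d+)")


class FileId(NamedTuple):
    category: str   # three-letter category code, e.g. "SCR"
    sequence: int   # sequence number without zero padding
    version: tuple  # (major, minor, patch)


def parse_file_id(text: str) -> FileId:
    """Split a SOM file ID into its parts, or raise ValueError."""
    match = FILE_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid SOM file ID: {text!r}")
    category, sequence, major, minor, patch = match.groups()
    return FileId(category, int(sequence), (int(major), int(minor), int(patch)))


def format_file_id(file_id: FileId) -> str:
    """Inverse of parse_file_id: rebuild the canonical string."""
    major, minor, patch = file_id.version
    return f"SOM-{file_id.category}-{file_id.sequence:04d}-v{major}.{minor}.{patch}"
```

Keeping the sequence as an integer makes comparisons and increments straightforward; the zero padding is only reapplied when the ID is formatted.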

Category Codes

Code	Category	Description	Example File Types
CMD	Slash commands	Claude Code slash commands	`.md` files in `.claude/commands/`
SCR	Scripts	Executable Python/PowerShell	`.py`, `.ps1`
DOC	Documentation	Project documentation	`.md` (README, guides)
CFG	Configuration	Config files	`.yaml`, `.json`, `.toml`
REG	Registry files	Registry-related files	Registry schemas, catalogs
TST	Tests	Test suites	`test_*.py`, `*_test.py`
TMP	Templates	Project templates	Boilerplate files
DTA	Data/schemas	Database schemas, data	`.sql`, `.json` schemas
LOG	Logs/diaries	Development logs	`development_diary.md`

Sequence Number Allocation

Strategy: Sequential allocation within category scope

Rules:

Sequences start at 0001 for each category
Numbers are never reused (even after file deletion)
Allocation is manual (no auto-increment system yet)
Gaps in sequences are acceptable (e.g., 0001, 0002, 0005)

Example Progression:

SOM-SCR-0001-v1.0.0  # First script
SOM-SCR-0002-v1.0.0  # Second script
SOM-SCR-0003-v1.0.0  # Third script (later deleted)
SOM-SCR-0004-v1.0.0  # Fourth script
# Next allocation: SOM-SCR-0005 (0003 is NOT reused)
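The allocation rules can be sketched as a small pure function. This is an illustration under one stated assumption: the caller passes every ID ever issued, including those of deleted files. If only surviving files are scanned and the highest-numbered file has been deleted, its number would be handed out again, so some record of retired IDs is needed to honor the no-reuse rule. The function name `next_sequence` is hypothetical.

```python
import re


def next_sequence(category: str, known_ids: list[str]) -> str:
    """Return the next zero-padded sequence number for a category.

    known_ids should include the IDs of deleted files as well, otherwise a
    retired number at the top of the range could be allocated a second time.
    """
    pattern = re.compile(rf"SOM-{re.escape(category)}-(\d{{4}})-v")
    used = [int(m.group(1)) for fid in known_ids if (m := pattern.match(fid))]
    highest = max(used, default=0)  # an empty category starts at 0001
    if highest >= 9999:
        raise OverflowError(f"category {category} has no free 4-digit sequence")
    return f"{highest + 1:04d}"
```

Because the function only looks at the maximum, gaps below it (such as a deleted 0003) are never filled.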

Version Semantics

Following Semantic Versioning 2.0.0 (semver.org):

MAJOR.MINOR.PATCH

MAJOR: Incompatible API changes, breaking refactors
MINOR: New features, backward-compatible additions
PATCH: Bug fixes, documentation updates

Version Bump Examples:

File header metadata added → PATCH bump (v1.0.0 → v1.0.1)
New function added to module → MINOR bump (v1.0.1 → v1.1.0)
Function signature changed → MAJOR bump (v1.1.0 → v2.0.0)

Version in File ID:

The version in the file_id line should match the version field
Both should be updated simultaneously when file is modified
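A minimal sketch of keeping the two version locations in step: the helper below bumps the version field and the file_id suffix in one operation and refreshes the modified date. The function names and the dictionary representation of a header are assumptions made for illustration.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next semver string for a 'major', 'minor', or 'patch' bump."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {part!r}")


def bump_header(fields: dict, part: str, today: str) -> dict:
    """Bump version and file_id together and refresh the modified date."""
    new_version = bump_version(fields["version"], part)
    prefix, _, _ = fields["file_id"].rpartition("-v")  # "SOM-SCR-0014"
    return {
        **fields,
        "file_id": f"{prefix}-v{new_version}",
        "version": new_version,
        "modified": today,
    }
```

Returning a new dictionary rather than mutating the input leaves the created date and all other fields untouched.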

Header Format Standards

Python File Header (Canonical Form)

# ==============================================================================
# file_id: SOM-SCR-0014-v1.0.0
# name: cli.py
# description: Ghost_Shell unified management CLI
# project_id: GHOST-SHELL
# category: script
# tags: [cli, management, admin, opentelemetry]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# agent_id: AGENT-CLAUDE-002
# execution: python -m ghost_shell.cli [command] [args]
# ==============================================================================

Location: Lines 1-13 of every Python file
Format: Hash comments with key: value pairs
Line Count: Exactly 13 lines (including delimiters)
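Since the header is plain `# key: value` text between two delimiter lines, it can be read without any special tooling. The sketch below is one possible reader, not an existing Ghost_Shell function; `parse_hash_header` is a hypothetical name, and the same logic should apply to the PowerShell and YAML headers because they use the same comment character.

```python
def parse_hash_header(text: str) -> dict:
    """Parse '# key: value' lines between the two '# ====' delimiter lines."""
    fields = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ==="):
            if inside:
                break          # closing delimiter: the header is complete
            inside = True      # opening delimiter
            continue
        if inside and stripped.startswith("#"):
            # Split on the first colon only, so values may contain colons.
            key, sep, value = stripped[1:].partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    if "tags" in fields:
        # "[cli, management]" -> ["cli", "management"]
        raw = fields["tags"].strip("[]")
        fields["tags"] = [tag.strip() for tag in raw.split(",") if tag.strip()]
    return fields
```

A file without a header yields an empty dictionary, which a caller can treat as "not cataloged".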

Markdown File Header (Canonical Form)

<!--
===============================================================================
file_id: SOM-DOC-0010-v1.0.0
name: TOSIJS_INTEGRATION.md
description: tosijs-schema integration plan for Ghost_Shell
project_id: GHOST-SHELL
category: documentation
tags: [tosijs, schema, validation, integration, planning]
created: 2025-11-23
modified: 2025-11-23
version: 1.0.0
agent:
  id: AGENT-CLAUDE-002
  name: claude_takeover
  model: claude-sonnet-4-5-20250929
execution:
  type: documentation
  invocation: Read for project understanding
===============================================================================
-->

Location: Lines 1-20 of every Markdown file
Format: HTML comment block with YAML-like structure
Line Count: 19-21 lines (variable based on nested fields)

Note: Markdown headers use nested YAML syntax for agent and execution fields.
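The nested agent and execution fields mean the Markdown header needs slightly more than a flat key/value reader. The following sketch handles exactly the two-level shape shown above using only the standard library; it is not a general YAML parser, and `parse_markdown_header` is a hypothetical name.

```python
def parse_markdown_header(text: str) -> dict:
    """Parse the HTML-comment header, supporting one level of nesting."""
    start = text.find("<!--")
    end = text.find("-->", start)
    if start == -1 or end == -1:
        return {}
    fields: dict = {}
    parent = None  # name of the nested block currently being filled
    for line in text[start + 4:end].splitlines():
        stripped = line.strip()
        if not stripped or set(stripped) == {"="}:
            continue  # skip blank lines and the '=====' delimiters
        key, sep, value = stripped.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if line[0].isspace() and parent is not None:
            fields[parent][key] = value       # indented line: child of parent
        elif value == "":
            parent = key                      # "agent:" opens a nested block
            fields[parent] = {}
        else:
            parent = None                     # ordinary top-level field
            fields[key] = value
    return fields
```

Values are kept as strings, so the tags line comes back as the literal `[a, b, c]` text; a top-level key with an empty value is treated as the start of a nested block.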

YAML/Config File Header (Canonical Form)

# ==============================================================================
# file_id: SOM-CFG-0002-v1.0.0
# name: config.yaml
# description: Ghost_Shell unified configuration
# project_id: GHOST-SHELL
# category: configuration
# tags: [config, yaml, settings]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# ==============================================================================

Location: Lines 1-11
Format: YAML comments (simpler than Markdown, no nested fields)
Line Count: 11 lines

PowerShell Script Header (Canonical Form)

# ==============================================================================
# file_id: SOM-SCR-NNNN-vX.X.X
# name: script.ps1
# description: What this script does
# project_id: PROJECT-TYPE
# category: script
# tags: [tag1, tag2]
# created: YYYY-MM-DD
# modified: YYYY-MM-DD
# version: X.X.X
# agent_id: AGENT-XXX-NNN
# execution: .\script.ps1 [-Param value]
# ==============================================================================

Same structure as the Python header. PowerShell also uses `#` for line comments, so only the name and execution values differ.

UUID Generation Workflow

Current Implementation (Manual)

Workflow Diagram:

sequenceDiagram
    participant Agent as Claude Agent
    participant FS as File System
    participant Dev as Development Diary

    Agent->>Agent: 1. Determine file purpose
    Agent->>Agent: 2. Select category code (SCR, DOC, CFG, etc.)
    Agent->>FS: 3. Search existing files for highest sequence
    Note over FS: grep "file_id: SOM-SCR" *.py | sort
    FS-->>Agent: 4. Return highest sequence (e.g., 0013)
    Agent->>Agent: 5. Increment sequence (0013 → 0014)
    Agent->>Agent: 6. Set version to v1.0.0 (new file)
    Agent->>Agent: 7. Assign agent_id (AGENT-CLAUDE-002)
    Agent->>Agent: 8. Generate file_id: SOM-SCR-0014-v1.0.0
    Agent->>FS: 9. Create file with header
    Agent->>Dev: 10. Log file creation in diary

Manual Allocation Steps

Step-by-Step Process:

Determine Category

# Is this a script? → SCR
# Is this documentation? → DOC
# Is this a test? → TST

Find Highest Sequence in Category

# For scripts:
grep -r "file_id: SOM-SCR" . | sed 's/.*SOM-SCR-\([0-9]*\).*/\1/' | sort -n | tail -1
# Output: 0013

Increment Sequence

last_sequence = 13
new_sequence = last_sequence + 1  # 14
formatted = f"{new_sequence:04d}"  # "0014"

Construct File ID

file_id = f"SOM-SCR-{formatted}-v1.0.0"
# Result: SOM-SCR-0014-v1.0.0

Generate Complete Header

header = f"""# ==============================================================================
# file_id: {file_id}
# name: {filename}
# description: {description}
# project_id: GHOST-SHELL
# category: script
# tags: {tags}
# created: {today}
# modified: {today}
# version: 1.0.0
# agent_id: AGENT-CLAUDE-002
# execution: {execution_command}
# ==============================================================================
"""


Write File

with open(filepath, 'w', encoding='utf-8') as f:
    f.write(header)
    f.write(code_content)

Proposed Automated Workflow (Future)

graph LR
    A[Create New File] --> B{File Type?}
    B -->|.py| C[Scan Python Files]
    B -->|.md| D[Scan Markdown Files]
    B -->|.yaml| E[Scan Config Files]

    C --> F[Get Highest SCR Sequence]
    D --> G[Get Highest DOC Sequence]
    E --> H[Get Highest CFG Sequence]

    F --> I[Increment & Format]
    G --> I
    H --> I

    I --> J[Query Agent Registry]
    J --> K[Get Current Agent ID]

    K --> L[Generate File ID]
    L --> M[Create Header Template]
    M --> N[Insert into File]
    N --> O[Register in Catalog DB]
    O --> P[Log to Diary]

Automation Tool (Proposed):

# Command-line tool for ID generation
ghost-catalog generate --file new_module.py --category script --description "New feature module"

# Output:
# Generated: SOM-SCR-0015-v1.0.0
# Header inserted into new_module.py
# Registered in catalog database
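Since `ghost-catalog generate` is only proposed, the following is a sketch of how its scan-and-allocate step might look, assuming headers sit within the first 25 lines of a file. The names `highest_sequence` and `allocate_file_id`, the 25-line window, and the default glob patterns are all assumptions made for illustration.

```python
import re
from pathlib import Path


def highest_sequence(root: Path, category: str,
                     patterns=("*.py", "*.ps1", "*.md", "*.yaml")) -> int:
    """Scan file headers under root and return the highest sequence in use."""
    id_re = re.compile(rf"file_id:\s+SOM-{re.escape(category)}-(\d{{4}})-v")
    highest = 0
    for pattern in patterns:
        for path in root.rglob(pattern):
            try:
                with path.open(encoding="utf-8", errors="replace") as handle:
                    # Read only the top of the file, where the header lives.
                    head = "".join(line for _, line in zip(range(25), handle))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            match = id_re.search(head)
            if match:
                highest = max(highest, int(match.group(1)))
    return highest


def allocate_file_id(root: Path, category: str) -> str:
    """Build the ID for a new file: next sequence, version 1.0.0."""
    return f"SOM-{category}-{highest_sequence(root, category) + 1:04d}-v1.0.0"
```

This mirrors the manual grep pipeline and shares its limitation: IDs of deleted files are invisible to a file scan, so a catalog database or diary entry would be needed to rule out reuse.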

Data Schema Deep Dive

Complete Metadata Field Specification

Field	Type	Required	Description	Example
file_id	String	Yes	Unique semantic identifier	`SOM-SCR-0014-v1.0.0`
name	String	Yes	Actual filename (must match)	`cli.py`
description	String	Yes	One-line purpose statement	`Ghost_Shell unified management CLI`
project_id	String	Yes	Parent project identifier	`GHOST-SHELL`
category	String	Yes	Category name (lowercase word that corresponds to the 3-char code in file_id)	`script`
tags	Array	Yes	Semantic tags for indexing	`[cli, management, admin]`
created	Date	Yes	ISO 8601 date (YYYY-MM-DD)	`2025-11-23`
modified	Date	Yes	Last modification date	`2025-11-23`
version	String	Yes	Semantic version	`1.0.0`
agent_id	String	Yes	Creating/modifying agent (written as nested agent.id in Markdown headers)	`AGENT-CLAUDE-002`
agent.name	String	Optional	Agent name (MD only)	`claude_takeover`
agent.model	String	Optional	LLM model (MD only)	`claude-sonnet-4-5-20250929`
execution	String	Yes	How to run/use file	`python -m ghost_shell.cli stats`

Field Validation Rules

file_id:

Pattern: ^SOM-[A-Z]{3}-\d{4}-v\d+\.\d+\.\d+$
Category: Must match one of the defined category codes
Sequence: Must be unique within category
Version: Must match version field

name:

Constraint: Must exactly match the filename
Case-sensitive: CLI.py ≠ cli.py
Extension: Must include file extension

description:

Length: 1-100 characters recommended
Style: Imperative mood, concise
Good: Unified management CLI for Ghost_Shell
Bad: This file contains a CLI that manages things

project_id:

Format: [A-Z-]+ (uppercase with hyphens)
Examples: GHOST-SHELL, DMBT, BROWSER-MIXER
Scope: Project-level identifier (not workspace)

category:

Values: One of script, documentation, configuration, test, template, data, log
Consistency: Should align with file_id category code

tags:

Format: Array of lowercase strings
Separator: Commas in array notation [tag1, tag2, tag3]
Style: Use hyphens for multi-word tags: multi-word-tag
Purpose: Enable semantic search and classification

created / modified:

Format: YYYY-MM-DD (ISO 8601 date only, no time)
created: Never changes after initial creation
modified: Updated whenever file content changes

version:

Format: MAJOR.MINOR.PATCH (semver)
Constraints: Must not have leading zeros (e.g., 1.01.0 is invalid)
Sync: Must match version in file_id

agent_id:

Format: AGENT-[A-Z]+-\d{3}
Registry: Should reference a defined agent in agent_registry.md
Examples: AGENT-CLAUDE-002, AGENT-PRIME-001

execution:

Purpose: Provides runnable command or usage instruction
Format: Actual shell command or description
Examples:
- Scripts: python -m ghost_shell.cli stats
- Docs: Read for project understanding
- Configs: Loaded by launcher.py on startup
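Several of these rules are mechanical enough to check in code. The sketch below covers the file_id pattern, version syntax and sync, filename match, project_id and agent_id formats, and the date fields, taking an already parsed flat header as a dictionary. The function names and error wording are illustrative; uniqueness of sequences and registry lookups are outside its scope.

```python
import re
from datetime import date

FILE_ID_RE = re.compile(r"SOM-[A-Z]{3}-\d{4}-v(\d+\.\d+\.\d+)")
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")  # no leading zeros
PROJECT_RE = re.compile(r"[A-Z-]+")
AGENT_RE = re.compile(r"AGENT-[A-Z]+-\d{3}")
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """True for a real calendar date written as YYYY-MM-DD."""
    if DATE_RE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_header(fields: dict, filename: str) -> list:
    """Return a list of rule violations; an empty list means the header passed."""
    errors = []
    id_match = FILE_ID_RE.fullmatch(fields.get("file_id", ""))
    if id_match is None:
        errors.append("file_id: does not match SOM-XXX-NNNN-vX.X.X")
    version = fields.get("version", "")
    if SEMVER_RE.fullmatch(version) is None:
        errors.append("version: not MAJOR.MINOR.PATCH without leading zeros")
    elif id_match is not None and id_match.group(1) != version:
        errors.append("version: does not match the version in file_id")
    if fields.get("name") != filename:
        errors.append("name: must exactly match the filename")
    if PROJECT_RE.fullmatch(fields.get("project_id", "")) is None:
        errors.append("project_id: must be uppercase letters and hyphens")
    # agent_id is only checked when present (config headers omit it).
    if "agent_id" in fields and AGENT_RE.fullmatch(fields["agent_id"]) is None:
        errors.append("agent_id: does not match AGENT-XXX-NNN")
    for key in ("created", "modified"):
        if not is_iso_date(fields.get(key, "")):
            errors.append(f"{key}: not a YYYY-MM-DD date")
    return errors
```

Collecting all violations instead of stopping at the first one suits a batch report across many files.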

Tag Taxonomy

Recommended Tag Categories:

Functional Tags: What the file does
- cli, gui, tui, api, database, networking
Technology Tags: What it uses
- opentelemetry, sqlite, mitmproxy, textual, tosijs
Purpose Tags: Why it exists
- admin, debugging, monitoring, security, privacy
Domain Tags: What area it covers
- proxy, firewall, intelligence, analytics, telemetry
Status Tags: Implementation state
- wip, deprecated, experimental, stable

Example Tagging:

# For ghost_shell/cli.py:
tags: [cli, management, admin, opentelemetry, statistics]

# For ghost_shell/proxy/blocker.py:
tags: [proxy, blocking, privacy, security, mitmproxy]

# For development_diary.md:
tags: [log, diary, development, documentation]
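The tag style rules (array notation, lowercase, hyphens for multi-word tags) can be enforced when a tags line is read. This is a small sketch with a hypothetical `parse_tags` helper; allowing digits inside tags is an assumption, since the rules above do not mention them.

```python
import re

# Lowercase words joined by single hyphens; digits are allowed by assumption.
TAG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def parse_tags(raw: str) -> list:
    """Turn '[cli, management]' into ['cli', 'management'] and check the style."""
    text = raw.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("tags must use array notation: [tag1, tag2]")
    tags = [tag.strip() for tag in text[1:-1].split(",") if tag.strip()]
    bad = [tag for tag in tags if TAG_RE.fullmatch(tag) is None]
    if bad:
        raise ValueError(f"tags must be lowercase and hyphen-separated: {bad}")
    return tags
```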

Registry System Architecture

Current State: Distributed File Headers

Architecture:

graph TB
    subgraph "File System (Current)"
        A[cli.py<br/>SOM-SCR-0014]
        B[launcher.py<br/>SOM-SCR-0015]
        C[telemetry.py<br/>SOM-SCR-0010]
        D[config.yaml<br/>SOM-CFG-0002]
    end

    subgraph "Query Methods"
        E[grep + awk + sort]
        F[repomix output]
        G[Manual file inspection]
    end

    A --> E
    B --> E
    C --> E
    D --> E

    A --> F
    B --> F
    C --> F
    D --> F

    A --> G
    B --> G
    C --> G
    D --> G

    E --> H[File ID List]
    F --> H
    G --> H

Characteristics:

✅ No database dependency: Works immediately
✅ Git-trackable: Headers are version-controlled
✅ Self-documenting: Metadata travels with file
❌ Slow queries: Must scan all files for searches
❌ No indexing: Can't efficiently search by tags
❌ No relationships: Can't link files (dependencies, etc.)

Proposed State: Centralized Catalog Database

Architecture:

graph TB
    subgraph "File System"
        A[Files with Headers]
    end

    subgraph "Catalog Database (SQLite)"
        B[(file_catalog table)]
        C[(agent_registry table)]
        D[(file_tags table)]
        E[(file_dependencies table)]
    end

    subgraph "Sync Layer"
        F[Catalog Scanner]
        G[File Watcher]
    end

    subgraph "Access APIs"
        H[CLI: ghost-catalog]
        I[TUI: Bubble Tea Browser]
        J[Python API: CatalogQuery]
    end

    A --> F
    F --> B
    F --> C
    F --> D

    A --> G
    G --> B

    B --> H
    C --> H
    D --> H

    B --> I
    C --> I
    D --> I

    B --> J
    C --> J
    D --> J

Database Schema (Proposed):

-- Core catalog table
CREATE TABLE file_catalog (
    file_id TEXT PRIMARY KEY,              -- SOM-XXX-NNNN-vX.X.X
    name TEXT NOT NULL,                    -- Filename
    path TEXT NOT NULL,                    -- Relative or absolute path
    description TEXT,
    project_id TEXT,
    category TEXT,                         -- Category name (script, documentation, ...)
    created DATE,
    modified DATE,
    version TEXT,                          -- Semantic version
    agent_id TEXT,
    execution TEXT,
    checksum TEXT,                         -- SHA256 hash for integrity
    last_scanned TIMESTAMP,                -- When catalog was last updated
    FOREIGN KEY (agent_id) REFERENCES agent_registry(id)
);

-- Tag index (many-to-many)
CREATE TABLE file_tags (
    file_id TEXT,
    tag TEXT,
    PRIMARY KEY (file_id, tag),
    FOREIGN KEY (file_id) REFERENCES file_catalog(file_id)
);

-- Agent registry
CREATE TABLE agent_registry (
    id TEXT PRIMARY KEY,                   -- AGENT-XXX-NNN
    name TEXT,
    model TEXT,
    first_seen TIMESTAMP,
    last_active TIMESTAMP
);

-- File dependencies (optional, for advanced tracking)
CREATE TABLE file_dependencies (
    file_id TEXT,
    depends_on_file_id TEXT,
    dependency_type TEXT,                  -- import, config, data, etc.
    PRIMARY KEY (file_id, depends_on_file_id),
    FOREIGN KEY (file_id) REFERENCES file_catalog(file_id),
    FOREIGN KEY (depends_on_file_id) REFERENCES file_catalog(file_id)
);

-- Indexes for fast queries
CREATE INDEX idx_category ON file_catalog(category);
CREATE INDEX idx_project ON file_catalog(project_id);
CREATE INDEX idx_agent ON file_catalog(agent_id);
CREATE INDEX idx_modified ON file_catalog(modified);
CREATE INDEX idx_tags ON file_tags(tag);

Catalog Synchronization

Process:

sequenceDiagram
    participant FS as File System
    participant Scanner as Catalog Scanner
    participant DB as Catalog Database
    participant Log as Sync Log

    Scanner->>FS: 1. Glob all files (**/*.py, **/*.md, etc.)
    FS-->>Scanner: 2. Return file list

    loop For each file
        Scanner->>FS: 3. Read file header (first 20 lines)
        FS-->>Scanner: 4. Return header content
        Scanner->>Scanner: 5. Parse header metadata
        Scanner->>Scanner: 6. Calculate SHA256 checksum
        Scanner->>DB: 7. SELECT checksum WHERE file_id = ?

        alt File exists and unchanged
            DB-->>Scanner: Checksum matches
            Scanner->>Scanner: Skip (no update needed)
        else File new or modified
            DB-->>Scanner: Checksum differs or NULL
            Scanner->>DB: 8. UPSERT file_catalog
            Scanner->>DB: 9. DELETE old tags, INSERT new tags
            Scanner->>Log: 10. Log update
        end
    end

    Scanner->>DB: 11. Mark stale entries (files deleted)
    Scanner->>Log: 12. Generate sync report

Command:

# Scan and update catalog
ghost-catalog sync

# Output:
# Scanned: 50 files
# Updated: 3 files
# New: 1 file
# Unchanged: 46 files
# Stale: 0 files
# Sync completed in 1.2s
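The per-file part of the sync loop (checksum comparison, upsert, tag replacement) can be sketched with the standard `sqlite3` module. The `SCHEMA` string is a trimmed copy of the proposed tables holding only the columns this sketch touches, and `sync_entry` is a hypothetical name. The upsert syntax needs SQLite 3.24 or newer.

```python
import hashlib
import sqlite3

# Trimmed version of the proposed schema: only the columns used below.
SCHEMA = """
CREATE TABLE file_catalog (
    file_id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    path TEXT NOT NULL,
    checksum TEXT,
    last_scanned TIMESTAMP
);
CREATE TABLE file_tags (
    file_id TEXT,
    tag TEXT,
    PRIMARY KEY (file_id, tag)
);
"""


def sync_entry(conn: sqlite3.Connection, meta: dict, content: bytes) -> str:
    """Sync one file into the catalog; returns 'new', 'updated' or 'unchanged'."""
    checksum = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT checksum FROM file_catalog WHERE file_id = ?", (meta["file_id"],)
    ).fetchone()
    if row is not None and row[0] == checksum:
        return "unchanged"  # checksum matches: skip the write entirely
    conn.execute(
        """INSERT INTO file_catalog (file_id, name, path, checksum, last_scanned)
           VALUES (:file_id, :name, :path, :checksum, CURRENT_TIMESTAMP)
           ON CONFLICT(file_id) DO UPDATE SET
               name = excluded.name,
               path = excluded.path,
               checksum = excluded.checksum,
               last_scanned = excluded.last_scanned""",
        {"file_id": meta["file_id"], "name": meta["name"],
         "path": meta["path"], "checksum": checksum},
    )
    # Replace the tag set wholesale: delete old tags, insert the current ones.
    conn.execute("DELETE FROM file_tags WHERE file_id = ?", (meta["file_id"],))
    conn.executemany(
        "INSERT INTO file_tags (file_id, tag) VALUES (?, ?)",
        [(meta["file_id"], tag) for tag in meta.get("tags", [])],
    )
    return "new" if row is None else "updated"
```

One consequence of keying on file_id is worth noting: because the version is part of the ID, a version bump produces a new key, so a full scanner would also need the stale-entry pass from step 11 to retire the old row.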

Access & Browse Methods

Method 1: File Header Inspection (Current)

Bash Commands:

# List all file IDs (-h drops filenames, -o prints only the matched text)
grep -rhoE "file_id: SOM-[A-Z]{3}-[0-9]{4}-v[0-9.]+" . --include="*.py" --include="*.md" | awk '{print $2}'

# List by category
grep -rhoE "file_id: SOM-SCR-[0-9]{4}-v[0-9.]+" . --include="*.py" | awk '{print $2}'

# Find file by ID
grep -r "file_id: SOM-SCR-0014" .

# List all tags (one per line, surrounding spaces trimmed)
grep -rh "tags:" . --include="*.py" --include="*.md" | sed 's/.*tags: //' | tr -d '[]' | tr ',' '\n' | sed 's/^ *//; s/ *$//' | sort -u

# Find files with specific tag
grep -r "tags:.*opentelemetry" . --include="*.py"

PowerShell Commands:

# List all file IDs
Get-ChildItem -Recurse -File -Include *.py,*.md |
    Select-String -Pattern 'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

# Count files by category
Get-ChildItem -Recurse -File -Include *.py |
    Select-String -Pattern 'file_id:\s+SOM-([A-Z]{3})-' |
    ForEach-Object { $_.Matches[0].Groups[1].Value } |
    Group-Object |
    Select-Object Name, Count

# Find newest modified files
Get-ChildItem -Recurse -File -Include *.py,*.md |
    Select-String -Pattern 'modified:\s+(\d{4}-\d{2}-\d{2})' |
    ForEach-Object {
        [PSCustomObject]@{Date=$_.Matches[0].Groups[1].Value; File=$_.Path}
    } |
    Sort-Object -Property Date -Descending |
    Select-Object -First 10

Method 2: Repomix Output Analysis

Generate Repomix:

repomix --output catalog.txt --style xml

Parse Repomix:

import re

def parse_file_ids_from_repomix(repomix_path):
    """Extract all file IDs from repomix output"""
    pattern = r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)'

    with open(repomix_path, 'r') as f:
        content = f.read()

    matches = re.findall(pattern, content)
    return list(set(matches))  # Unique file IDs

# Usage (reads the file written by the repomix command above)
file_ids = parse_file_ids_from_repomix('catalog.txt')
print(f"Found {len(file_ids)} files with catalog headers")

Method 3: CLI Tool (Proposed)

Command Reference:

# List all cataloged files
ghost-catalog list

# Filter by category
ghost-catalog list --category script

# Search by tag
ghost-catalog search --tag opentelemetry

# Get file details
ghost-catalog info SOM-SCR-0014-v1.0.0

# Validate all headers
ghost-catalog validate

# Fix inconsistencies
ghost-catalog validate --fix

# Generate new file with header
ghost-catalog generate --file new_module.py --category script --description "New module"

# Show catalog statistics
ghost-catalog stats

# Export catalog to JSON
ghost-catalog export catalog.json

Example Output:

$ ghost-catalog list --category script

╭─────────────────────────────────────────────────────────────╮
│               Ghost_Shell File Catalog (Scripts)            │
├──────────────────────────┬──────────────────────────────────┤
│ File ID                  │ Description                      │
├──────────────────────────┼──────────────────────────────────┤
│ SOM-SCR-0010-v1.0.0      │ OpenTelemetry setup              │
│ SOM-SCR-0011-v1.0.0      │ Traffic blocking                 │
│ SOM-SCR-0012-v1.1.0      │ Main proxy addon                 │
│ SOM-SCR-0013-v1.0.0      │ Intelligence collector           │
│ SOM-SCR-0014-v1.0.0      │ Management CLI                   │
│ SOM-SCR-0015-v1.1.0      │ Orchestrator                     │
├──────────────────────────┴──────────────────────────────────┤
│ Total: 6 scripts                                            │
╰─────────────────────────────────────────────────────────────╯

Method 4: Database Queries (Future)

-- Find all files modified in last 7 days
SELECT file_id, name, modified
FROM file_catalog
WHERE modified >= date('now', '-7 days')
ORDER BY modified DESC;

-- Count files by category
SELECT category, COUNT(*) as count
FROM file_catalog
GROUP BY category
ORDER BY count DESC;

-- Find files by tag
SELECT fc.file_id, fc.name, fc.description
FROM file_catalog fc
JOIN file_tags ft ON fc.file_id = ft.file_id
WHERE ft.tag = 'opentelemetry';

-- Find files by agent
SELECT file_id, name, version
FROM file_catalog
WHERE agent_id = 'AGENT-CLAUDE-002'
ORDER BY modified DESC;

-- Find pre-1.0 files (MAJOR version < 1)
SELECT file_id, name, version
FROM file_catalog
WHERE CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) < 1;

-- File dependency graph
WITH RECURSIVE deps AS (
    SELECT file_id, depends_on_file_id, 1 as depth
    FROM file_dependencies
    WHERE file_id = 'SOM-SCR-0014-v1.0.0'

    UNION ALL

    SELECT fd.file_id, fd.depends_on_file_id, deps.depth + 1
    FROM file_dependencies fd
    JOIN deps ON fd.file_id = deps.depends_on_file_id
    WHERE depth < 5
)
SELECT DISTINCT fc.file_id, fc.name, deps.depth
FROM deps
JOIN file_catalog fc ON deps.depends_on_file_id = fc.file_id
ORDER BY depth, fc.file_id;
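The architecture diagram names a Python API called CatalogQuery but does not define it. The sketch below shows one way such a wrapper could expose the tag lookup and the recursive dependency query; the method names, return shapes, and the `max_depth` parameter are assumptions, and the CTE is a slightly simplified form of the query above that reports the shallowest depth for each dependency.

```python
import sqlite3


class CatalogQuery:
    """Thin read-only wrapper around the proposed catalog tables (a sketch)."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def by_tag(self, tag: str) -> list:
        """Return (file_id, name) rows for every file carrying the tag."""
        return self.conn.execute(
            """SELECT fc.file_id, fc.name
               FROM file_catalog fc
               JOIN file_tags ft ON fc.file_id = ft.file_id
               WHERE ft.tag = ?
               ORDER BY fc.file_id""",
            (tag,),
        ).fetchall()

    def dependencies(self, file_id: str, max_depth: int = 5) -> list:
        """Return (depends_on_file_id, depth) rows, following links transitively."""
        return self.conn.execute(
            """WITH RECURSIVE deps(depends_on_file_id, depth) AS (
                   SELECT depends_on_file_id, 1
                   FROM file_dependencies
                   WHERE file_id = ?
                   UNION
                   SELECT fd.depends_on_file_id, deps.depth + 1
                   FROM file_dependencies fd
                   JOIN deps ON fd.file_id = deps.depends_on_file_id
                   WHERE deps.depth < ?
               )
               SELECT depends_on_file_id, MIN(depth)
               FROM deps
               GROUP BY depends_on_file_id
               ORDER BY MIN(depth), depends_on_file_id""",
            (file_id, max_depth),
        ).fetchall()
```

The depth limit keeps the recursion finite even if the dependency table contains a cycle.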

Bubble Tea TUI Design

Overview

Bubble Tea (https://github.com/charmbracelet/bubbletea) is a Go framework for building terminal UIs. For Ghost_Shell, we'll design a TUI for browsing the file catalog.

Why Bubble Tea?

Modern, composable TUI framework
Rich text rendering (via Lip Gloss)
Mouse support, interactive tables
Cross-platform (Windows, Linux, macOS)

Implementation Language: Go (Bubble Tea is Go-native)

Architecture

graph TB
    subgraph "Bubble Tea TUI"
        A[Main View] --> B[Navigation Sidebar]
        A --> C[Content Pane]

        B --> D[Category Filter]
        B --> E[Tag Filter]
        B --> F[Search Box]

        C --> G[File List Table]
        C --> H[File Detail View]
        C --> I[Dependency Graph]
    end

    subgraph "Data Layer"
        J[SQLite Catalog DB]
        K[File System]
    end

    G --> J
    H --> K
    I --> J

UI Mockup

╔════════════════════════════════════════════════════════════════════════════╗
║                   Ghost_Shell File Catalog Browser v1.0                    ║
╠══════════════════╦═════════════════════════════════════════════════════════╣
║  FILTERS         ║  FILE LIST (25 files)                                   ║
║                  ║ ┌────────────────────┬────────────────────────────────┐ ║
║ Categories       ║ │ File ID            │ Description                    │ ║
║ [●] Scripts (6)  ║ ├────────────────────┼────────────────────────────────┤ ║
║ [ ] Docs (4)     ║ │ SOM-SCR-0010-v1.0.0│ OpenTelemetry setup            │ ║
║ [ ] Config (2)   ║ │ SOM-SCR-0011-v1.0.0│ Traffic blocking               │ ║
║ [ ] Tests (1)    ║ │►SOM-SCR-0012-v1.1.0│ Main proxy addon               │ ║
║                  ║ │ SOM-SCR-0013-v1.0.0│ Intelligence collector         │ ║
║ Tags             ║ │ SOM-SCR-0014-v1.0.0│ Management CLI                 │ ║
║ [x] opentelemetry║ │ SOM-SCR-0015-v1.1.0│ Orchestrator                   │ ║
║ [ ] proxy        ║ └────────────────────┴────────────────────────────────┘ ║
║ [ ] cli          ║                                                         ║
║                  ║  FILE DETAILS: SOM-SCR-0012-v1.1.0                      ║
║ Search           ║ ┌─────────────────────────────────────────────────────┐ ║
║ [proxy______]    ║ │ Name: core.py                                       │ ║
║                  ║ │ Path: ghost_shell/proxy/core.py                     │ ║
║ Sort By          ║ │ Description: Main proxy addon                       │ ║
║ ● Modified       ║ │ Project: GHOST-SHELL                                │ ║
║ ○ Created        ║ │ Category: script                                    │ ║
║ ○ File ID        ║ │ Tags: [proxy, mitmproxy, core]                      │ ║
║                  ║ │ Version: 1.1.0                                      │ ║
║ Actions          ║ │ Agent: AGENT-CLAUDE-002                             │ ║
║ [o] Open in $ED  ║ │ Created: 2025-11-23  Modified: 2025-11-23          │ ║
║ [c] Copy path    ║ │ Execution: python -s ghost_shell/proxy/core.py      │ ║
║ [d] Dependencies ║ │ Checksum: 3a5f7c... (verified)                      │ ║
║ [g] Git log      ║ └─────────────────────────────────────────────────────┘ ║
║                  ║                                                         ║
╠══════════════════╩═════════════════════════════════════════════════════════╣
║ [?] Help  [/] Search  [Tab] Switch Pane  [q] Quit  [Enter] Select          ║
╚════════════════════════════════════════════════════════════════════════════╝

Component Breakdown

1. Navigation Sidebar (Left, 25% width)

Category Filters: Checkboxes for SCR, DOC, CFG, TST
Tag Filters: Checkboxes for common tags (top 10)
Search Box: Live filter by file_id, name, or description
Sort Options: Radio buttons for sort order
Action Buttons: Quick actions (open, copy, etc.)

2. File List Table (Top-right, 75% width, 60% height)

Columns: File ID, Description (truncated if needed)
Selection: Arrow keys + Enter to select
Highlight: Selected row highlighted
Pagination: If >20 files, show page indicator

3. File Detail View (Bottom-right, 75% width, 40% height)

Metadata Display: All header fields in readable format
Checksum Verification: Shows if file matches catalog
Action Hints: Keybindings for common actions

4. Status Bar (Bottom)

Keybindings: Quick reference for navigation
Status: Current filter/search state

Keybindings

Key	Action	Description
`↑`/`↓`	Navigate list	Move selection up/down
`Enter`	Select file	Show details in detail pane
`Tab`	Switch pane	Toggle between sidebar and file list
`/`	Search	Focus search box
`Esc`	Clear filter	Reset all filters
`o`	Open in editor	Open file in $EDITOR
`c`	Copy path	Copy file path to clipboard
`d`	Show dependencies	Open dependency graph view
`g`	Git log	Show git history for file
`v`	Validate	Re-validate file header
`r`	Refresh	Re-scan catalog database
`?`	Help	Show full help screen
`q`	Quit	Exit application

Implementation (Go + Bubble Tea)

File Structure:

ghost-catalog-tui/
├── main.go                 # Entry point
├── model.go                # Bubble Tea model
├── update.go               # Update logic
├── view.go                 # Rendering logic
├── components/
│   ├── sidebar.go          # Sidebar component
│   ├── filelist.go         # File list table
│   ├── details.go          # Detail pane
│   └── statusbar.go        # Status bar
├── database/
│   └── catalog.go          # SQLite queries
└── styles/
    └── theme.go            # Lip Gloss styles

Sample Code (main.go):

package main

import (
    "database/sql"
    "fmt"
    "log"
    "os"

    tea "github.com/charmbracelet/bubbletea"
    _ "github.com/mattn/go-sqlite3"
)

type model struct {
    db             *sql.DB
    files          []File
    selectedIndex  int
    filterCategory string
    filterTags     []string
    searchQuery    string
    focusedPane    string // "sidebar" or "filelist"
}

type File struct {
    FileID      string
    Name        string
    Description string
    Path        string
    Category    string
    Tags        []string
    Version     string
    Modified    string
}

func initialModel() model {
    db, err := sql.Open("sqlite3", "./data/catalog.db")
    if err != nil {
        log.Fatal(err)
    }

    return model{
        db:          db,
        focusedPane: "filelist",
    }
}

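func (m model) Init() tea.Cmd {
    // loadFiles is not shown here: it is expected to return a tea.Cmd that
    // queries the catalog and delivers the result as a filesLoadedMsg.
    return loadFiles(m.db)
}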

func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
    switch msg := msg.(type) {
    case tea.KeyMsg:
        switch msg.String() {
        case "q", "ctrl+c":
            return m, tea.Quit
        case "up":
            if m.selectedIndex > 0 {
                m.selectedIndex--
            }
        case "down":
            if m.selectedIndex < len(m.files)-1 {
                m.selectedIndex++
            }
        case "enter":
            // Show file details
        case "/":
            // Focus search box
        }
    case filesLoadedMsg:
        m.files = msg.files
    }
    return m, nil
}

func (m model) View() string {
    // Render UI (see view.go implementation)
    return renderUI(m)
}

func main() {
    p := tea.NewProgram(initialModel(), tea.WithAltScreen())
    // Run blocks until the program exits; Start is deprecated in recent Bubble Tea releases.
    if _, err := p.Run(); err != nil {
        fmt.Printf("Error: %v\n", err)
        os.Exit(1)
    }
}

Database Queries (database/catalog.go):

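package database

import (
    "database/sql"
)

// File is the row type returned by the queries below. It repeats the scalar
// fields of the File struct in main.go; a real module would share one definition.
type File struct {
    FileID, Name, Description, Path, Category, Version, Modified string
}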

func QueryFilesByCategoryAndTags(db *sql.DB, category string, tags []string) ([]File, error) {
    query := `
        SELECT DISTINCT fc.file_id, fc.name, fc.description, fc.path, fc.category, fc.version, fc.modified
        FROM file_catalog fc
        LEFT JOIN file_tags ft ON fc.file_id = ft.file_id
        WHERE 1=1
    `

    args := []interface{}{}

    if category != "" {
        query += " AND fc.category = ?"
        args = append(args, category)
    }

    if len(tags) > 0 {
        query += " AND ft.tag IN ("
        for i, tag := range tags {
            if i > 0 {
                query += ","
            }
            query += "?"
            args = append(args, tag)
        }
        query += ")"
    }

    query += " ORDER BY fc.modified DESC"

    rows, err := db.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var files []File
    for rows.Next() {
        var f File
        err := rows.Scan(&f.FileID, &f.Name, &f.Description, &f.Path, &f.Category, &f.Version, &f.Modified)
        if err != nil {
            return nil, err
        }
        files = append(files, f)
    }
    // Surface any error that ended the iteration early.
    if err := rows.Err(); err != nil {
        return nil, err
    }

    return files, nil
}

Database Schema for File Tracking

(See "Registry System Architecture" section above for complete schema)

Summary Tables:

file_catalog: Core metadata for all cataloged files
file_tags: Tag index (many-to-many)
agent_registry: Agent metadata and activity tracking
file_dependencies: Relationships between files

Sample Data:

-- Insert file
INSERT INTO file_catalog VALUES (
    'SOM-SCR-0014-v1.0.0',
    'cli.py',
    'ghost_shell/cli.py',
    'Ghost_Shell unified management CLI',
    'GHOST-SHELL',
    'script',
    '2025-11-23',
    '2025-11-23',
    '1.0.0',
    'AGENT-CLAUDE-002',
    'python -m ghost_shell.cli [command] [args]',
    '3a5f7c9d...',
    '2025-11-24 10:30:00'
);

-- Insert tags
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'cli');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'management');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'admin');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'opentelemetry');

-- Insert agent
INSERT INTO agent_registry VALUES (
    'AGENT-CLAUDE-002',
    'claude_takeover',
    'claude-sonnet-4-5-20250929',
    '2025-11-23 00:00:00',
    '2025-11-24 10:30:00'
);

-- Insert dependency (cli.py depends on db_handler.py)
INSERT INTO file_dependencies VALUES (
    'SOM-SCR-0014-v1.0.0',
    'SOM-SCR-XXXX-vX.X.X',  -- db_handler.py file_id
    'import'
);
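
To try the sample rows without a full catalog, the following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The column names are assumptions inferred from the queries in this document, and only the columns needed for the demo are included; the authoritative schema remains the one in "Registry System Architecture". The helper names `build_demo_db` and `tags_for` are hypothetical.

```python
import sqlite3

# Assumed, trimmed-down schema: column names are inferred from the queries
# in this document, not copied from the authoritative schema.
SCHEMA = """
CREATE TABLE file_catalog (
    file_id  TEXT PRIMARY KEY,
    name     TEXT NOT NULL,
    path     TEXT NOT NULL,
    category TEXT NOT NULL,
    version  TEXT NOT NULL,
    modified TEXT NOT NULL
);
CREATE TABLE file_tags (
    file_id TEXT NOT NULL REFERENCES file_catalog(file_id),
    tag     TEXT NOT NULL,
    PRIMARY KEY (file_id, tag)
);
"""

def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory catalog holding the cli.py sample row and its tags."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO file_catalog VALUES (?, ?, ?, ?, ?, ?)",
        ("SOM-SCR-0014-v1.0.0", "cli.py", "ghost_shell/cli.py",
         "script", "1.0.0", "2025-11-23"),
    )
    conn.executemany(
        "INSERT INTO file_tags VALUES (?, ?)",
        [("SOM-SCR-0014-v1.0.0", tag)
         for tag in ("cli", "management", "admin", "opentelemetry")],
    )
    conn.commit()
    return conn

def tags_for(conn: sqlite3.Connection, file_id: str) -> list:
    """Return the tags attached to a file, sorted alphabetically."""
    rows = conn.execute(
        "SELECT tag FROM file_tags WHERE file_id = ? ORDER BY tag", (file_id,)
    )
    return [tag for (tag,) in rows]
```

The composite primary key on file_tags is a design choice of this sketch: it prevents the same tag from being attached to a file twice.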

CLI Interface Specifications

Command Structure

ghost-catalog <command> [options]

Commands:
  list         List all cataloged files
  search       Search files by criteria
  info         Show detailed information about a file
  validate     Validate file headers
  generate     Generate new file with catalog header
  sync         Synchronize catalog database with file system
  stats        Show catalog statistics
  export       Export catalog to JSON/CSV
  tags         Manage tags
  agents       List agents in registry

Global Options:
  --db PATH    Path to catalog database (default: ./data/catalog.db)
  --verbose    Show detailed output
  --help       Show help

Command Details

ghost-catalog list

ghost-catalog list [options]

Options:
  --category CATEGORY   Filter by category (script, documentation, etc.)
  --tag TAG            Filter by tag (can be specified multiple times)
  --project PROJECT    Filter by project ID
  --agent AGENT        Filter by agent ID
  --sort FIELD         Sort by field (modified, created, file_id)
  --format FORMAT      Output format (table, json, csv)

Examples:
  ghost-catalog list --category script
  ghost-catalog list --tag opentelemetry --tag cli
  ghost-catalog list --project GHOST-SHELL --sort modified
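
If the CLI were written in Python, the global options and the `list` subcommand could be declared with the standard argparse module. This is only a sketch: the function name `build_parser` and the defaults chosen for `--sort` and `--format` are assumptions, since the specification above does not state them.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser covering the global options and the `list` subcommand."""
    parser = argparse.ArgumentParser(prog="ghost-catalog")
    parser.add_argument("--db", default="./data/catalog.db",
                        help="Path to catalog database")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed output")

    subcommands = parser.add_subparsers(dest="command", required=True)
    list_cmd = subcommands.add_parser("list", help="List all cataloged files")
    list_cmd.add_argument("--category")
    # action="append" lets --tag be given several times, as in the examples.
    list_cmd.add_argument("--tag", action="append", default=[], dest="tags")
    list_cmd.add_argument("--project")
    list_cmd.add_argument("--agent")
    # The defaults below are assumptions; the spec lists only the allowed values.
    list_cmd.add_argument("--sort", choices=["modified", "created", "file_id"],
                          default="modified")
    list_cmd.add_argument("--format", choices=["table", "json", "csv"],
                          default="table")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```

With argparse, global options such as `--db` go before the subcommand name (`ghost-catalog --db other.db list --tag cli`).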

ghost-catalog search

ghost-catalog search QUERY [options]

Arguments:
  QUERY                Search query (searches file_id, name, description)

Options:
  --category CATEGORY  Filter by category
  --tag TAG           Filter by tag
  --fuzzy             Enable fuzzy matching

Examples:
  ghost-catalog search "proxy"
  ghost-catalog search "telemetry" --tag opentelemetry
  ghost-catalog search "SOM-SCR-001" --fuzzy
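
One possible reading of `--fuzzy` is approximate word matching. The sketch below uses difflib from the Python standard library as a stand-in; the specification does not say which fuzzy algorithm is intended, and the `search` function and its `cutoff` parameter are hypothetical.

```python
import difflib

FIELDS = ("file_id", "name", "description")

def search(records, query, fuzzy=False, cutoff=0.6):
    """Return records whose file_id, name, or description matches the query.

    Exact mode is a case-insensitive substring test. Fuzzy mode additionally
    accepts any whitespace-separated word that is similar to the query.
    """
    needle = query.lower()
    hits = []
    for record in records:
        text = " ".join(str(record.get(field, "")) for field in FIELDS).lower()
        if needle in text:
            hits.append(record)
        elif fuzzy and difflib.get_close_matches(needle, text.split(),
                                                 n=1, cutoff=cutoff):
            hits.append(record)
    return hits
```

A misspelled query such as "telemetri" finds nothing in exact mode but matches a file named telemetry.py once `fuzzy=True` is passed.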

ghost-catalog info

ghost-catalog info FILE_ID

Arguments:
  FILE_ID             File ID to show details for

Examples:
  ghost-catalog info SOM-SCR-0014-v1.0.0

ghost-catalog validate

ghost-catalog validate [options]

Options:
  --fix               Auto-fix minor issues (sync version, update modified date)
  --file FILE         Validate specific file
  --strict            Enable strict validation (fail on warnings)

Examples:
  ghost-catalog validate
  ghost-catalog validate --fix
  ghost-catalog validate --file ghost_shell/cli.py --strict

ghost-catalog generate

ghost-catalog generate --file FILE --category CATEGORY [options]

Options:
  --file FILE          File to create/update
  --category CATEGORY  Category code (script, documentation, etc.)
  --description DESC   File description
  --tags TAG1,TAG2    Comma-separated tags
  --project PROJECT   Project ID (default: current project)
  --agent AGENT       Agent ID (default: current agent)

Examples:
  ghost-catalog generate --file new_module.py --category script --description "New feature module" --tags cli,admin

ghost-catalog sync

ghost-catalog sync [options]

Options:
  --path PATH         Path to scan (default: current directory)
  --recursive         Scan recursively (default: true)
  --dry-run          Show what would be updated without making changes

Examples:
  ghost-catalog sync
  ghost-catalog sync --path ./ghost_shell --dry-run
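
The `--dry-run` behaviour amounts to computing a change plan without writing it. A minimal sketch, assuming both the scanned headers and the database rows are available as dictionaries keyed by relative path (the function name `plan_sync` and that data shape are assumptions):

```python
def plan_sync(on_disk, in_db):
    """Compare header metadata found on disk with rows already in the catalog.

    Both arguments map a relative path to the metadata parsed from that
    file's header. Nothing is written; the returned plan is what a
    --dry-run would report.
    """
    disk_paths, db_paths = set(on_disk), set(in_db)
    return {
        "insert": sorted(disk_paths - db_paths),
        "delete": sorted(db_paths - disk_paths),
        "update": sorted(path for path in disk_paths & db_paths
                         if on_disk[path] != in_db[path]),
    }
```

Keying by path rather than by file_id is deliberate in this sketch: the file_id embeds the version, so a version bump would otherwise show up as one delete plus one insert instead of an update.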

ghost-catalog stats

ghost-catalog stats

Examples:
  ghost-catalog stats

Output:

╭──────────────────────────────────────────────────────╮
│              Ghost_Shell Catalog Stats               │
├──────────────────────────────────────────────────────┤
│ Total Files:             25                          │
│ Scripts:                 6                           │
│ Documentation:           4                           │
│ Configuration:           2                           │
│ Tests:                   1                           │
│                                                      │
│ Total Tags:              42                          │
│ Most Used Tag:           opentelemetry (8 files)     │
│                                                      │
│ Total Agents:            2                           │
│ Most Active Agent:       AGENT-CLAUDE-002 (20 files) │
│                                                      │
│ Last Sync:               2025-11-24 10:30:00         │
│ Database Size:           245 KB                      │
╰──────────────────────────────────────────────────────╯
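
The figures in this report can be derived from the parsed header metadata alone. The following sketch assumes each record is a dictionary with `category`, `tags`, and `agent_id` keys, as produced by a header parser; the function name `catalog_stats` is hypothetical.

```python
from collections import Counter

def catalog_stats(records):
    """Summarize cataloged files: totals per category, tag usage, agent activity."""
    categories = Counter(r["category"] for r in records)
    tags = Counter(tag for r in records for tag in r.get("tags", []))
    agents = Counter(r["agent_id"] for r in records if r.get("agent_id"))
    return {
        "total_files": len(records),
        "by_category": dict(categories),
        "total_tags": len(tags),
        # most_common(1) returns [(value, count)]; None when nothing was counted
        "most_used_tag": tags.most_common(1)[0] if tags else None,
        "most_active_agent": agents.most_common(1)[0] if agents else None,
    }
```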

ghost-catalog export

ghost-catalog export OUTPUT [options]

Arguments:
  OUTPUT              Output file path

Options:
  --format FORMAT     Export format (json, csv, yaml)
  --category CATEGORY Filter by category
  --tag TAG          Filter by tag

Examples:
  ghost-catalog export catalog.json
  ghost-catalog export catalog.csv --format csv --category script
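
The JSON and CSV formats can be produced with the Python standard library alone, as sketched below. YAML is left out because it needs a third-party package. The function name `export_catalog`, the CSV column list, and the use of `;` to join tags are assumptions made for this example.

```python
import csv
import io
import json

CSV_COLUMNS = ["file_id", "name", "path", "category", "version", "tags"]

def export_catalog(records, fmt="json"):
    """Serialize catalog records to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS,
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        for record in records:
            row = dict(record)
            # Flatten the tag list into one cell; ';' avoids clashing with the comma.
            row["tags"] = ";".join(record.get("tags", []))
            writer.writerow(row)
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")
```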

Implementation Examples

Example 1: Python Script to Extract File IDs

#!/usr/bin/env python3
"""
Extract all file IDs from a codebase using header parsing.
"""

import re
from pathlib import Path
from typing import Dict, List

def extract_file_id_from_content(content: str) -> Dict[str, str]:
    """Parse file header and extract all metadata fields."""
    metadata = {}

    # Patterns for different header styles
    patterns = {
        'file_id': r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)',
        'name': r'name:\s+(.+)',
        'description': r'description:\s+(.+)',
        'category': r'category:\s+(\w+)',
        'version': r'version:\s+([\d.]+)',
        'agent_id': r'agent_id:\s+(AGENT-[A-Z]+-\d{3})',
        'created': r'created:\s+(\d{4}-\d{2}-\d{2})',
        'modified': r'modified:\s+(\d{4}-\d{2}-\d{2})',
    }

    for key, pattern in patterns.items():
        match = re.search(pattern, content)
        if match:
            metadata[key] = match.group(1).strip()

    # Extract tags (array format)
    tags_match = re.search(r'tags:\s+\[([^\]]+)\]', content)
    if tags_match:
        tags_str = tags_match.group(1)
        metadata['tags'] = [tag.strip() for tag in tags_str.split(',')]

    return metadata

def scan_directory(root_path: Path, extensions: List[str]) -> List[Dict]:
    """Scan directory for files with catalog headers."""
    results = []

    for ext in extensions:
        for filepath in root_path.rglob(f'*{ext}'):
            # Skip virtual environments and cache directories (match whole path components)
            if any(part in ('.venv', '__pycache__', 'node_modules') for part in filepath.parts):
                continue

            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    # Read only the first 30 lines (headers should be within this range)
                    header_content = ''.join(line for _, line in zip(range(30), f))

                metadata = extract_file_id_from_content(header_content)

                if metadata.get('file_id'):
                    metadata['path'] = str(filepath.relative_to(root_path))
                    results.append(metadata)
            except Exception as e:
                print(f"Error reading {filepath}: {e}")

    return results

if __name__ == '__main__':
    import json

    # Scan current directory
    root = Path('.')
    extensions = ['.py', '.md', '.yaml', '.yml', '.ps1']

    files = scan_directory(root, extensions)

    print(f"Found {len(files)} cataloged files\n")

    # Group by category
    by_category = {}
    for file in files:
        category = file.get('category', 'unknown')
        if category not in by_category:
            by_category[category] = []
        by_category[category].append(file)

    # Print summary
    for category, items in sorted(by_category.items()):
        print(f"\n{category.upper()} ({len(items)} files):")
        for item in items:
            print(f"  {item['file_id']:25} {item.get('description', 'No description')[:50]}")

    # Export to JSON
    with open('catalog_export.json', 'w', encoding='utf-8') as f:
        json.dump(files, f, indent=2)

    print("\n✓ Catalog exported to catalog_export.json")

Example 2: PowerShell Script to Validate Headers

# validate_headers.ps1
# Validate all file catalog headers for consistency

param(
    [string]$Path = ".",
    [switch]$Fix
)

function Test-FileHeader {
    param([string]$FilePath)

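    # Join the first 20 lines into one string so that -match populates $Matches
    # (combining -Raw with -TotalCount does not reliably limit the read)
    $content = (Get-Content $FilePath -TotalCount 20) -join "`n"
    $errors = @()
    $version = $null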

    # Extract metadata
    if ($content -match 'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)') {
        $fileId = $Matches[1]
    } else {
        $errors += "Missing or invalid file_id"
        return @{ Valid = $false; Errors = $errors }
    }

    if ($content -match 'version:\s+([\d.]+)') {
        $version = $Matches[1]
    } else {
        $errors += "Missing version field"
    }

    # Check file_id version matches version field
    if ($fileId -match 'v([\d.]+)' -and $version) {
        $idVersion = $Matches[1]
        if ($idVersion -ne $version) {
            $errors += "Version mismatch: file_id has v$idVersion but version field has $version"
        }
    }

    # Check filename matches
    $fileName = Split-Path $FilePath -Leaf
    if ($content -match 'name:\s+(.+)') {
        $headerName = $Matches[1].Trim()
        if ($fileName -ne $headerName) {
            $errors += "Filename mismatch: file is '$fileName' but header says '$headerName'"
        }
    }

    # Check required fields
    $requiredFields = @('file_id', 'name', 'description', 'category', 'created', 'modified', 'version')
    foreach ($field in $requiredFields) {
        if ($content -notmatch "$field\s*:") {
            $errors += "Missing required field: $field"
        }
    }

    return @{
        Valid = ($errors.Count -eq 0)
        FileId = $fileId
        Errors = $errors
    }
}

# Scan all Python and Markdown files
$files = Get-ChildItem -Path $Path -Recurse -Include *.py,*.md |
    Where-Object { $_.FullName -notmatch '(\.venv|__pycache__|node_modules)' }

$totalFiles = 0
$validFiles = 0
$invalidFiles = @()

foreach ($file in $files) {
    $totalFiles++
    $result = Test-FileHeader -FilePath $file.FullName

    if ($result.Valid) {
        $validFiles++
        Write-Host "✓" -ForegroundColor Green -NoNewline
        Write-Host " $($result.FileId) - $($file.Name)"
    } else {
        Write-Host "✗" -ForegroundColor Red -NoNewline
        Write-Host " $($file.Name)"
        # Use $err, not $error: $Error is a PowerShell automatic variable
        foreach ($err in $result.Errors) {
            Write-Host "  └─ $err" -ForegroundColor Yellow
        }
        $invalidFiles += $file
    }
}

Write-Host "`nValidation Summary:" -ForegroundColor Cyan
Write-Host "Total Files: $totalFiles"
Write-Host "Valid: $validFiles" -ForegroundColor Green
Write-Host "Invalid: $($invalidFiles.Count)" -ForegroundColor Red

if ($invalidFiles.Count -gt 0 -and $Fix) {
    Write-Host "`nAttempting auto-fix..." -ForegroundColor Yellow
    # Auto-fix logic here (update modified dates, sync versions, etc.)
}

Example 3: SQL Queries for Analytics

-- analytics.sql
-- Advanced queries for catalog insights

-- 1. File activity timeline
SELECT
    date(modified) as day,
    category,
    COUNT(*) as files_modified
FROM file_catalog
WHERE modified >= date('now', '-30 days')
GROUP BY day, category
ORDER BY day DESC, files_modified DESC;

-- 2. Agent productivity
SELECT
    ar.name as agent,
    COUNT(DISTINCT fc.file_id) as total_files,
    COUNT(DISTINCT CASE WHEN fc.modified >= date('now', '-7 days') THEN fc.file_id END) as recent_edits,
    GROUP_CONCAT(DISTINCT fc.category) as categories_worked
FROM file_catalog fc
JOIN agent_registry ar ON fc.agent_id = ar.id
GROUP BY ar.name
ORDER BY total_files DESC;

-- 3. Tag co-occurrence matrix (find related tags)
SELECT
    t1.tag as tag1,
    t2.tag as tag2,
    COUNT(*) as co_occurrences
FROM file_tags t1
JOIN file_tags t2 ON t1.file_id = t2.file_id AND t1.tag < t2.tag
GROUP BY t1.tag, t2.tag
HAVING co_occurrences > 1
ORDER BY co_occurrences DESC
LIMIT 20;

-- 4. Version distribution (major version = digits before the first dot)
SELECT
    CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) as major_version,
    COUNT(*) as file_count
FROM file_catalog
GROUP BY major_version
ORDER BY major_version;

-- 5. Stale files (not modified in 90 days)
SELECT
    file_id,
    name,
    category,
    modified,
    julianday('now') - julianday(modified) as days_since_modified
FROM file_catalog
WHERE days_since_modified > 90
ORDER BY days_since_modified DESC;

-- 6. File dependency depth (how many layers of dependencies)
WITH RECURSIVE dep_tree AS (
    -- Base case: leaf files (they depend on nothing)
    SELECT file_id, 0 as depth
    FROM file_catalog
    WHERE file_id NOT IN (SELECT file_id FROM file_dependencies)

    UNION ALL

    -- Recursive case: files that depend on the previous level
    SELECT fd.file_id, dt.depth + 1
    FROM file_dependencies fd
    JOIN dep_tree dt ON fd.depends_on_file_id = dt.file_id
    WHERE dt.depth < 10  -- guard against circular dependencies
)
SELECT
    fc.file_id,
    fc.name,
    MAX(dt.depth) as max_depth
FROM dep_tree dt
JOIN file_catalog fc ON dt.file_id = fc.file_id
GROUP BY fc.file_id, fc.name
ORDER BY max_depth DESC;

-- 7. Category growth over time
SELECT
    substr(created, 1, 7) as month,
    category,
    COUNT(*) as new_files
FROM file_catalog
GROUP BY month, category
ORDER BY month DESC, new_files DESC;
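
These queries can be exercised against a throwaway database before a real catalog exists. As an illustration, the sketch below runs the tag co-occurrence query (query 3) through Python's sqlite3 module; the two-column file_tags table is an assumed minimal shape, `tag_pairs` is a hypothetical helper, and the extra ORDER BY columns are added only to make the output order deterministic.

```python
import sqlite3

CO_OCCURRENCE_SQL = """
SELECT t1.tag AS tag1, t2.tag AS tag2, COUNT(*) AS co_occurrences
FROM file_tags t1
JOIN file_tags t2 ON t1.file_id = t2.file_id AND t1.tag < t2.tag
GROUP BY t1.tag, t2.tag
HAVING COUNT(*) > 1
ORDER BY co_occurrences DESC, tag1, tag2
"""

def tag_pairs(tagged_files):
    """Run the co-occurrence query over {file_id: [tags]} in a throwaway database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_tags (file_id TEXT, tag TEXT)")
    conn.executemany(
        "INSERT INTO file_tags VALUES (?, ?)",
        [(fid, tag) for fid, tags in tagged_files.items() for tag in tags],
    )
    rows = conn.execute(CO_OCCURRENCE_SQL).fetchall()
    conn.close()
    return rows
```

The `t1.tag < t2.tag` join condition is what keeps each pair from being counted twice (once in each order) and excludes a tag pairing with itself.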

Use Cases & Narratives

Use Case 1: Onboarding a New Agent

Scenario: A new AI agent (AGENT-GPT-001) takes over the Ghost_Shell project and needs to understand the codebase.

Workflow:

sequenceDiagram
    participant Agent as New Agent (GPT-001)
    participant CLI as ghost-catalog CLI
    participant TUI as Bubble Tea Browser
    participant FS as File System

    Agent->>CLI: ghost-catalog stats
    CLI-->>Agent: Show project overview (25 files, 6 scripts, 4 docs)

    Agent->>TUI: Launch TUI browser
    TUI->>FS: Load catalog database
    FS-->>TUI: Return file list
    TUI-->>Agent: Display categorized file tree

    Agent->>TUI: Filter by category: "documentation"
    TUI-->>Agent: Show 4 docs (README, CODEBASE_OVERVIEW, etc.)

    Agent->>TUI: Select "CODEBASE_OVERVIEW.md"
    TUI-->>Agent: Show file details:
    Note over TUI: file_id: SOM-DOC-0003-v1.0.0
    Note over TUI: description: Complete codebase overview
    Note over TUI: tags: [handoff, architecture, overview]

    Agent->>TUI: Press 'o' to open in editor
    TUI->>FS: Open file in $EDITOR

    Agent->>CLI: ghost-catalog search "proxy"
    CLI-->>Agent: Found 3 files with "proxy" in description

    Agent->>CLI: ghost-catalog info SOM-SCR-0012-v1.1.0
    CLI-->>Agent: Show full metadata for proxy/core.py

    Agent->>Agent: Now understands project structure

Outcome: Agent quickly orients itself using catalog metadata instead of reading every file.

Use Case 2: Dependency Analysis

Scenario: Agent needs to understand which files depend on db_handler.py before refactoring it.

Workflow:

# Step 1: Find db_handler file ID
$ ghost-catalog search "db_handler"
SOM-SCR-XXXX-v1.0.0  ghost_shell/data/db_handler.py  Unified database handler

# Step 2: Query dependencies (reverse lookup)
$ sqlite3 data/catalog.db
sqlite> SELECT fc.file_id, fc.name
        FROM file_dependencies fd
        JOIN file_catalog fc ON fd.file_id = fc.file_id
        WHERE fd.depends_on_file_id = 'SOM-SCR-XXXX-v1.0.0';

# Results:
# SOM-SCR-0012-v1.1.0  core.py
# SOM-SCR-0013-v1.0.0  collector.py
# SOM-SCR-0014-v1.0.0  cli.py

# Step 3: View dependency graph in TUI
$ ghost-catalog-tui
# Press 'd' on db_handler.py → shows visual dependency tree

Outcome: Agent knows exactly which files will be affected by changes, reducing refactoring risk.

Use Case 3: Finding Stale Documentation

Scenario: Developer wants to find outdated documentation files that haven't been updated in 6 months.

Workflow:

-- Query catalog database
SELECT
    file_id,
    name,
    description,
    modified,
    julianday('now') - julianday(modified) as days_old
FROM file_catalog
WHERE category = 'documentation'
  AND days_old > 180
ORDER BY days_old DESC;

-- Results:
-- SOM-DOC-0001-v1.0.0  OLD_SETUP.md  Installation guide  2024-03-15  245 days

# Using CLI
$ ghost-catalog list --category documentation --sort modified

# Output shows oldest docs at bottom
# Developer can then update or archive them

Outcome: Easy identification of documentation needing updates.

Use Case 4: Tag-Based Code Discovery

Scenario: New developer wants to find all files related to OpenTelemetry instrumentation.

Workflow:

# Using CLI (tag-only filtering uses `list`, because `search` requires a QUERY argument)
$ ghost-catalog list --tag opentelemetry

# Results:
# SOM-SCR-0010-v1.0.0  telemetry.py        OpenTelemetry setup
# SOM-SCR-0012-v1.1.0  core.py             Main proxy addon
# SOM-SCR-0013-v1.0.0  collector.py        Intelligence collector
# SOM-SCR-0014-v1.0.0  cli.py              Management CLI

# Open all in editor
$ ghost-catalog list --tag opentelemetry --format json | jq -r '.[].path' | xargs code

Outcome: Developer instantly finds all relevant files without grep-ing or manual exploration.

Use Case 5: Version Audit

Scenario: QA team needs to verify all files are at v1.0.0 or higher before release.

Workflow:

# Validate version compliance
$ ghost-catalog validate --strict

# Output:
# ✗ SOM-SCR-0008-v0.9.5  old_module.py
#   └─ Version below 1.0.0 (v0.9.5)
# ✗ SOM-DOC-0002-v0.5.0  DRAFT_SPEC.md
#   └─ Version below 1.0.0 (v0.5.0)

# Fix by bumping the version in each file's header (both the file_id and version fields),
# then re-validate the file
$ ghost-catalog validate --file ghost_shell/old_module.py --strict

Outcome: Automated version compliance checking for release readiness.
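
The version check itself is a comparison of numeric tuples, not of strings (as strings, "1.10.0" would sort below "1.9.0"). A minimal sketch, assuming every file ID carries a three-part version; the names `version_tuple` and `below_minimum` are hypothetical:

```python
import re

# Assumes a strict MAJOR.MINOR.PATCH suffix on every file ID.
FILE_ID_RE = re.compile(r"^SOM-[A-Z]{3}-\d{4}-v(\d+)\.(\d+)\.(\d+)$")

def version_tuple(file_id):
    """Extract (major, minor, patch) from a SOM file ID."""
    match = FILE_ID_RE.match(file_id)
    if match is None:
        raise ValueError(f"not a valid SOM file ID: {file_id}")
    return tuple(int(part) for part in match.groups())

def below_minimum(file_ids, minimum=(1, 0, 0)):
    """Return the file IDs whose embedded version is lower than `minimum`."""
    return [fid for fid in file_ids if version_tuple(fid) < minimum]
```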

Integration Guide

Integrating with Existing Projects

Step 1: Audit Current Files

# List all Python/Markdown files without headers
find . -type f \( -name "*.py" -o -name "*.md" \) | while IFS= read -r file; do
    if ! grep -q "file_id:" "$file"; then
        echo "$file"
    fi
done > files_without_headers.txt

Step 2: Generate Headers

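# bulk_add_headers.py
import re
from datetime import date
from pathlib import Path

# Adjust these for the project and the agent performing the migration.
PROJECT_ID = 'GHOST-SHELL'
AGENT_ID = 'AGENT-CLAUDE-002'
CATEGORY_CODES = {'script': 'SCR', 'documentation': 'DOC'}
SKIP_DIRS = {'.venv', '__pycache__', 'node_modules'}
FILE_ID_RE = re.compile(r'file_id:\s+SOM-([A-Z]{3})-(\d{4})-v[\d.]+')

def read_head(filepath, max_lines=30):
    """Return the first lines of a file, where a catalog header would live."""
    with open(filepath, 'r', encoding='utf-8') as f:
        return ''.join(line for _, line in zip(range(max_lines), f))

def has_header(filepath):
    """Check whether the file already carries a catalog header."""
    return FILE_ID_RE.search(read_head(filepath)) is not None

def next_sequence(root, code):
    """Simplified stand-in for the logic in "UUID Generation Workflow":
    one more than the highest sequence number already used for this code."""
    highest = 0
    for path in root.rglob('*.py'):
        if SKIP_DIRS.isdisjoint(path.parts):
            match = FILE_ID_RE.search(read_head(path))
            if match and match.group(1) == code:
                highest = max(highest, int(match.group(2)))
    return highest + 1

def generate_header(filepath, category, description, sequence):
    """Prepend a catalog header to a file."""
    file_id = f'SOM-{CATEGORY_CODES[category]}-{sequence:04d}-v1.0.0'
    today = date.today().isoformat()

    # Read existing content
    with open(filepath, 'r', encoding='utf-8') as f:
        existing_content = f.read()

    # Generate header
    header = f"""# ==============================================================================
# file_id: {file_id}
# name: {filepath.name}
# description: {description}
# project_id: {PROJECT_ID}
# category: {category}
# tags: []
# created: {today}
# modified: {today}
# version: 1.0.0
# agent_id: {AGENT_ID}
# execution: python {filepath.as_posix()}
# ==============================================================================

"""

    # Write with header
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(header + existing_content)

# Process all Python files that do not have a header yet
root = Path('.')
sequence = next_sequence(root, CATEGORY_CODES['script'])
for filepath in sorted(root.rglob('*.py')):
    if SKIP_DIRS.isdisjoint(filepath.parts) and not has_header(filepath):
        generate_header(filepath, 'script', 'TODO: Add description', sequence)
        sequence += 1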

Step 3: Build Catalog Database

# Create and populate catalog
ghost-catalog sync --path .

Step 4: Validate

ghost-catalog validate --fix

Integration with Git Workflows

Pre-Commit Hook:

#!/bin/bash
# .git/hooks/pre-commit
# Validate catalog headers before commit

echo "Validating catalog headers..."

# Run validation
if ! ghost-catalog validate --strict; then
    echo "❌ Catalog validation failed. Fix errors before committing."
    exit 1
fi

echo "✓ Catalog validation passed"
exit 0

Post-Merge Hook:

#!/bin/bash
# .git/hooks/post-merge
# Re-sync catalog after merges

echo "Re-syncing catalog database..."
ghost-catalog sync
echo "✓ Catalog synced"

Integration with CI/CD

GitHub Actions (.github/workflows/catalog-check.yml):

name: Catalog Validation

on:
  pull_request:
    paths:
      - '**/*.py'
      - '**/*.md'
      - '**/*.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install ghost-catalog
        run: |
          # Install CLI tool
          go install github.com/somacosf/ghost-catalog@latest

      - name: Validate catalog headers
        run: |
          ghost-catalog validate --strict

      - name: Check for missing headers
        run: |
          # Run the dry-run once and fail if it reports any file without a header
          if ghost-catalog sync --dry-run | grep -q 'Missing header'; then
            echo "❌ Some files are missing catalog headers"
            exit 1
          fi

Future Enhancements

1. Automated Dependency Detection

Goal: Automatically detect imports and build dependency graph.

Implementation:

Parse Python import statements
Parse Markdown file links
Build file_dependencies table

Example:

# auto_deps.py
import ast

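def extract_imports(filepath):
    """Return the module names imported by a Python source file."""
    with open(filepath, encoding='utf-8') as f:
        tree = ast.parse(f.read())

    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports such as "from . import x"
            imports.append(node.module)

    return imports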

# Map module names to file_ids
# Insert into file_dependencies table
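
The mapping step left as comments above could look roughly like this. It is a sketch under the assumption that the catalog is available as a dictionary from relative path to file_id and that module names follow the directory layout; `module_name` and `resolve_dependencies` are hypothetical helpers, and the `'import'` relationship type is taken from the sample data.

```python
from pathlib import PurePosixPath

def module_name(path):
    """Convert a relative .py path into its dotted module name."""
    parts = list(PurePosixPath(path).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py is imported by the package name
    return ".".join(parts)

def resolve_dependencies(file_id, imports, catalog):
    """Turn imported module names into (file_id, depends_on_file_id, 'import') rows.

    `catalog` maps relative paths to file IDs. Imports that do not resolve to
    a cataloged file (standard library, third-party packages) are skipped.
    """
    by_module = {module_name(path): fid for path, fid in catalog.items()}
    rows = []
    for name in imports:
        target = by_module.get(name)
        if target and target != file_id:
            rows.append((file_id, target, "import"))
    return rows
```

The returned tuples have the same three-column shape as the file_dependencies sample insert, so they could be passed to a parameterized INSERT.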

2. AI-Powered Tagging

Goal: Use LLM to suggest tags based on file content.

Workflow:

sequenceDiagram
    participant CLI as ghost-catalog
    participant LLM as Claude API
    participant DB as Catalog DB

    CLI->>DB: Get files without tags
    DB-->>CLI: Return file list

    loop For each file
        CLI->>LLM: Analyze file content, suggest tags
        LLM-->>CLI: Return suggested tags: [cli, admin, sqlite]
        CLI->>User: Show suggestions, confirm?
        User-->>CLI: Approve tags
        CLI->>DB: Update file_tags table
    end

3. Visual Dependency Graph

Goal: Generate visual diagrams of file relationships.

Tools: Graphviz, Mermaid, or D3.js

Example Output:

graph LR
    A[cli.py] --> B[db_handler.py]
    C[core.py] --> B
    D[collector.py] --> B
    E[launcher.py] --> A
    E --> C
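
A diagram like this can be generated directly from dependency pairs. The sketch below emits Mermaid text; `to_mermaid` is a hypothetical helper, and the node IDs (N0, N1, ...) are assigned in order of first appearance instead of the letters used above.

```python
def to_mermaid(edges):
    """Render (dependent, dependency) name pairs as a Mermaid `graph LR` diagram."""
    node_ids = {}
    lines = ["graph LR"]

    def ref(name):
        # First mention declares the node with its label; later mentions reuse the ID.
        if name not in node_ids:
            node_ids[name] = f"N{len(node_ids)}"
            return f"{node_ids[name]}[{name}]"
        return node_ids[name]

    for dependent, dependency in edges:
        left = ref(dependent)
        right = ref(dependency)
        lines.append(f"    {left} --> {right}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_mermaid([("cli.py", "db_handler.py"), ("core.py", "db_handler.py")]))
```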

4. Change Impact Analysis

Goal: Predict which files will be affected by a change.

Query:

-- Find all files that transitively depend on db_handler.py
WITH RECURSIVE impact AS (
    SELECT file_id, 1 as depth
    FROM file_catalog
    WHERE file_id = 'SOM-SCR-XXXX-v1.0.0'

    UNION ALL

    SELECT fd.file_id, impact.depth + 1
    FROM file_dependencies fd
    JOIN impact ON fd.depends_on_file_id = impact.file_id
    WHERE impact.depth < 10
)
SELECT DISTINCT fc.file_id, fc.name
FROM impact
JOIN file_catalog fc ON impact.file_id = fc.file_id
WHERE impact.depth > 1;  -- exclude db_handler.py itself (depth 1)
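
The same traversal can be done in application code once the dependency rows are loaded. The following is a breadth-first sketch, not the project's implementation; `impacted_files` is a hypothetical name, and the input is assumed to be (file_id, depends_on_file_id) pairs.

```python
from collections import defaultdict, deque

def impacted_files(dependencies, changed, max_depth=10):
    """Return every file that directly or transitively depends on `changed`.

    `dependencies` is an iterable of (file_id, depends_on_file_id) pairs.
    The result maps each impacted file to its distance from `changed`.
    """
    dependents = defaultdict(set)
    for file_id, depends_on in dependencies:
        dependents[depends_on].add(file_id)

    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue
        for dependent in dependents[current]:
            if dependent not in depth:  # visited check also guards against cycles
                depth[dependent] = depth[current] + 1
                queue.append(dependent)

    del depth[changed]  # the changed file is not part of its own impact set
    return depth
```

Unlike the recursive SQL, which relies on the depth cap alone, the visited check stops at circular dependencies immediately and reports each file once with its shortest distance.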

5. Workspace-Level Registry

Goal: Single catalog database for all Somacosf projects.

Schema Extension:

CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,
    name TEXT,
    path TEXT,
    description TEXT
);

ALTER TABLE file_catalog ADD COLUMN workspace_id TEXT;

Benefits:

Search across all projects
Track agent activity workspace-wide
Identify duplicate/similar files

Conclusion

The Ghost_Shell File ID System provides a semantic, self-documenting catalog framework that enables:

Rapid Onboarding: New agents can understand codebases via metadata
Efficient Search: Find files by category, tags, or relationships
Version Tracking: Built-in semver for all files
Agent Coordination: Track which agent created/modified each file
Future-Ready: Designed for database integration and advanced tooling

Next Steps:

Implement ghost-catalog CLI tool (Go or Python)
Build Bubble Tea TUI for interactive browsing
Create catalog database and sync tool
Integrate with development workflows (Git hooks, CI/CD)

Key Takeaway: Unlike UUIDs (which prioritize uniqueness), SOM File IDs prioritize discoverability and semantic richness, making them ideal for AI-agent-driven development workspaces.

Document Version: 1.0.0 Created: 2025-01-24 Agent: Claude (Sonnet 4.5) Project: Ghost_Shell / Somacosf
