@SoMaCoSF
Created November 25, 2025 01:22
Ghost Catalog: Technical Deep Dive - Complete specification of semantic file ID system for AI-agent development

Ghost_Shell File ID System: Complete Technical Deep Dive

Author: Claude (Sonnet 4.5)
Date: 2025-01-24
Project: Ghost_Shell / Somacosf Workspace
Purpose: Comprehensive documentation of the semantic file catalog system used across the Somacosf ecosystem


Table of Contents

  1. Executive Summary
  2. System Architecture Overview
  3. File ID Schema Specification
  4. Header Format Standards
  5. UUID Generation Workflow
  6. Data Schema Deep Dive
  7. Registry System Architecture
  8. Access & Browse Methods
  9. Bubble Tea TUI Design
  10. Database Schema for File Tracking
  11. CLI Interface Specifications
  12. Implementation Examples
  13. Use Cases & Narratives
  14. Integration Guide
  15. Future Enhancements

Executive Summary

What is the Ghost_Shell File ID System?

TL;DR: Ghost_Shell does NOT use traditional UUIDs (UUID4, UUID5). Instead, it implements a semantic file catalog system called SOM (Somacosf) that embeds rich metadata directly into file headers.

Key Characteristics:

  • Semantic IDs: Human-readable identifiers like SOM-SCR-0014-v1.0.0
  • Embedded Metadata: Every file contains a structured header with 12+ metadata fields
  • Workspace-Level Standard: Defined in CLAUDE.md and enforced across all Somacosf projects
  • No Separate Registry Database (currently): File IDs are distributed in file headers
  • Future Integration: Designed to work with tosijs-schema for runtime validation

Why Not Traditional UUIDs?

| Traditional UUIDs | SOM File IDs |
|---|---|
| Cryptographic randomness | Semantic structure |
| No human meaning | Category-based, sequential |
| Requires external registry | Self-documenting in file |
| Version-agnostic | Built-in semver tracking |
| 550e8400-e29b-41d4-a716-446655440000 | SOM-SCR-0014-v1.0.0 |

Decision Rationale:

  • UUIDs are excellent for distributed systems requiring uniqueness guarantees
  • SOM IDs prioritize discoverability, traceability, and AI agent coordination
  • The workspace operates with a single coordinating agent, making sequential IDs viable
  • Semantic structure enables pattern-based queries and category filtering

System Architecture Overview

High-Level Architecture

graph TB
    subgraph "File System Layer"
        A[Python Files] -->|Header| B[SOM File ID]
        C[Markdown Files] -->|Header| B
        D[Config Files] -->|Header| B
        E[Scripts] -->|Header| B
    end

    subgraph "Metadata Schema"
        B --> F[file_id: SOM-XXX-NNNN-vX.X.X]
        B --> G[Agent ID: AGENT-XXX-NNN]
        B --> H[Tags: Array]
        B --> I[Timestamps]
        B --> J[Execution Info]
    end

    subgraph "Registry System (Planned)"
        F --> K[SQLite Catalog DB]
        G --> L[Agent Registry]
        H --> M[Tag Index]
        K --> N[File Lookup API]
        L --> N
        M --> N
    end

    subgraph "Access Layer"
        N --> O[CLI Commands]
        N --> P[Bubble Tea TUI]
        N --> Q[tosijs Schema Validation]
    end

    subgraph "Integration Layer"
        O --> R[Search by Category]
        P --> S[Browse by Tags]
        Q --> T[Validate File Metadata]
    end

Component Layers

  1. File System Layer: Physical files with embedded headers
  2. Metadata Schema: Structured data within headers (current implementation)
  3. Registry System: Planned centralized database for fast queries
  4. Access Layer: Tools and interfaces to interact with file catalog
  5. Integration Layer: Cross-project features and validations

File ID Schema Specification

Format Structure

SOM-<CATEGORY>-<SEQUENCE>-v<VERSION>

Where:
- SOM        = Somacosf namespace (3 chars, fixed)
- CATEGORY   = File type code (3 chars, uppercase)
- SEQUENCE   = Unique number (4 digits, zero-padded)
- VERSION    = Semantic version (semver: MAJOR.MINOR.PATCH)
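
As an illustration of the format above, a file ID can be split into its parts with a single regular expression. This is a minimal sketch, not part of the Ghost_Shell codebase: the names `FileId` and `parse_file_id` are hypothetical, and the pattern is the one listed under the file_id validation rules.

```python
import re
from dataclasses import dataclass

# Pattern taken from the file_id validation rule: SOM-<CATEGORY>-<SEQUENCE>-v<VERSION>
FILE_ID_RE = re.compile(r"SOM-([A-Z]{3})-(\d{4})-v(\d+)\.(\d+)\.(\d+)")


@dataclass(frozen=True)
class FileId:
    """Hypothetical container for the parts of a SOM file ID."""
    category: str   # 3-letter category code, e.g. "SCR"
    sequence: int   # sequence number within the category, e.g. 14
    version: tuple  # (major, minor, patch)

    def __str__(self) -> str:
        major, minor, patch = self.version
        return f"SOM-{self.category}-{self.sequence:04d}-v{major}.{minor}.{patch}"


def parse_file_id(text: str) -> FileId:
    """Split a SOM file ID into its parts; raise ValueError if it is malformed."""
    match = FILE_ID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid SOM file ID: {text!r}")
    category, sequence, major, minor, patch = match.groups()
    return FileId(category, int(sequence), (int(major), int(minor), int(patch)))


file_id = parse_file_id("SOM-SCR-0014-v1.0.0")
print(file_id.category, file_id.sequence, file_id.version)
```

Keeping the sequence as an integer makes comparisons and increments straightforward, while `str()` restores the zero-padded form.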

Category Codes

| Code | Category | Description | Example File Types |
|---|---|---|---|
| CMD | Slash commands | Claude Code slash commands | .md files in .claude/commands/ |
| SCR | Scripts | Executable Python/PowerShell | .py, .ps1 |
| DOC | Documentation | Project documentation | .md (README, guides) |
| CFG | Configuration | Config files | .yaml, .json, .toml |
| REG | Registry files | Registry-related files | Registry schemas, catalogs |
| TST | Tests | Test suites | test_*.py, *_test.py |
| TMP | Templates | Project templates | Boilerplate files |
| DTA | Data/schemas | Database schemas, data | .sql, .json schemas |
| LOG | Logs/diaries | Development logs | development_diary.md |

Sequence Number Allocation

Strategy: Sequential allocation within category scope

Rules:

  1. Sequences start at 0001 for each category
  2. Numbers are never reused (even after file deletion)
  3. Allocation is manual (no auto-increment system yet)
  4. Gaps in sequences are acceptable (e.g., 0001, 0002, 0005)

Example Progression:

SOM-SCR-0001-v1.0.0  # First script
SOM-SCR-0002-v1.0.0  # Second script
SOM-SCR-0003-v1.0.0  # Third script (later deleted)
SOM-SCR-0004-v1.0.0  # Fourth script
# Next allocation: SOM-SCR-0005 (0003 is NOT reused)
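
The allocation rules can be sketched as a small helper. This is an illustrative sketch only: `next_file_id` and its `retired_sequences` parameter are hypothetical. The parameter exists because file headers alone cannot reveal a deleted file, so if the highest-numbered file in a category is deleted, its number must be remembered somewhere else to honor the "never reused" rule.

```python
import re


def next_file_id(existing_ids, category, retired_sequences=()):
    """Return the next unused file ID for `category`, starting at version 1.0.0.

    existing_ids: file IDs currently present in file headers.
    retired_sequences: sequence numbers of deleted files (a hypothetical record
    kept outside the headers, for example in the development diary).
    """
    pattern = re.compile(rf"SOM-{re.escape(category)}-(\d{{4}})-v")
    used = set(retired_sequences)
    for file_id in existing_ids:
        match = pattern.match(file_id)
        if match:
            used.add(int(match.group(1)))

    # Gaps are acceptable, so only the highest known number matters.
    next_sequence = max(used, default=0) + 1
    if next_sequence > 9999:
        raise ValueError(f"category {category} has exhausted its 4-digit range")
    return f"SOM-{category}-{next_sequence:04d}-v1.0.0"


ids = ["SOM-SCR-0001-v1.0.0", "SOM-SCR-0002-v1.0.0", "SOM-SCR-0004-v1.0.0"]
print(next_file_id(ids, "SCR"))
```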

Version Semantics

Following Semantic Versioning 2.0.0 (semver.org):

MAJOR.MINOR.PATCH

MAJOR: Incompatible API changes, breaking refactors
MINOR: New features, backward-compatible additions
PATCH: Bug fixes, documentation updates

Version Bump Examples:

  • File header metadata added → PATCH bump (v1.0.0 → v1.0.1)
  • New function added to module → MINOR bump (v1.0.1 → v1.1.0)
  • Function signature changed → MAJOR bump (v1.1.0 → v2.0.0)

Version in File ID:

  • The version in the file_id line should match the version field
  • Both should be updated simultaneously when file is modified
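
Because the version appears in two places, a bump is easiest to get right when both values are derived from one function. The following is a minimal sketch under that assumption; `bump_version` and `bump_file_id` are hypothetical helper names.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the given semver part incremented."""
    major, minor, patch = (int(n) for n in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


def bump_file_id(file_id: str, part: str) -> str:
    """Apply the same bump to the version suffix of a SOM file ID."""
    prefix, _, version = file_id.rpartition("-v")
    return f"{prefix}-v{bump_version(version, part)}"


# Update the file_id line and the version field together.
new_version = bump_version("1.0.1", "minor")
new_file_id = bump_file_id("SOM-SCR-0014-v1.0.1", "minor")
print(new_version, new_file_id)
```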

Header Format Standards

Python File Header (Canonical Form)

# ==============================================================================
# file_id: SOM-SCR-0014-v1.0.0
# name: cli.py
# description: Ghost_Shell unified management CLI
# project_id: GHOST-SHELL
# category: script
# tags: [cli, management, admin, opentelemetry]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# agent_id: AGENT-CLAUDE-002
# execution: python -m ghost_shell.cli [command] [args]
# ==============================================================================

Location: Lines 1-13 of every Python file
Format: Hash comments with key: value pairs
Line Count: Exactly 13 lines (including delimiters)

Markdown File Header (Canonical Form)

<!--
===============================================================================
file_id: SOM-DOC-0010-v1.0.0
name: TOSIJS_INTEGRATION.md
description: tosijs-schema integration plan for Ghost_Shell
project_id: GHOST-SHELL
category: documentation
tags: [tosijs, schema, validation, integration, planning]
created: 2025-11-23
modified: 2025-11-23
version: 1.0.0
agent:
  id: AGENT-CLAUDE-002
  name: claude_takeover
  model: claude-sonnet-4-5-20250929
execution:
  type: documentation
  invocation: Read for project understanding
===============================================================================
-->

Location: Lines 1-20 of every Markdown file
Format: HTML comment block with YAML-like structure
Line Count: 19-21 lines (variable based on nested fields)

Note: Markdown headers use nested YAML syntax for agent and execution fields.

YAML/Config File Header (Canonical Form)

# ==============================================================================
# file_id: SOM-CFG-0002-v1.0.0
# name: config.yaml
# description: Ghost_Shell unified configuration
# project_id: GHOST-SHELL
# category: configuration
# tags: [config, yaml, settings]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# ==============================================================================

Location: Lines 1-11
Format: YAML comments (simpler than Markdown, no nested fields)
Line Count: 11 lines

PowerShell Script Header (Canonical Form)

# ==============================================================================
# file_id: SOM-SCR-NNNN-vX.X.X
# name: script.ps1
# description: What this script does
# project_id: PROJECT-TYPE
# category: script
# tags: [tag1, tag2]
# created: YYYY-MM-DD
# modified: YYYY-MM-DD
# version: X.X.X
# agent_id: AGENT-XXX-NNN
# execution: .\script.ps1 [-Param value]
# ==============================================================================

Same structure as Python, but uses PowerShell comment syntax.
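
The hash-comment headers (Python, YAML, PowerShell) share one shape, so a single parser can read all three. The sketch below is an assumption about how such a parser might look, not existing Ghost_Shell code; `parse_hash_header` is a hypothetical name, and the nested Markdown header form is deliberately not handled.

```python
def parse_hash_header(text: str) -> dict:
    """Parse a '# key: value' header block delimited by '# ====' lines."""
    fields = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ===="):
            if inside:
                break        # closing delimiter: the header is complete
            inside = True    # opening delimiter
            continue
        if not inside or not stripped.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = stripped.lstrip("#").partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "tags":
            # "[cli, admin]" -> ["cli", "admin"]
            value = [tag.strip() for tag in value.strip("[]").split(",") if tag.strip()]
        fields[key] = value
    return fields


sample = "\n".join([
    "# " + "=" * 78,
    "# file_id: SOM-SCR-0014-v1.0.0",
    "# tags: [cli, management, admin, opentelemetry]",
    "# " + "=" * 78,
    "import sys",
])
print(parse_hash_header(sample))
```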


UUID Generation Workflow

Current Implementation (Manual)

Workflow Diagram:

sequenceDiagram
    participant Agent as Claude Agent
    participant FS as File System
    participant Dev as Development Diary

    Agent->>Agent: 1. Determine file purpose
    Agent->>Agent: 2. Select category code (SCR, DOC, CFG, etc.)
    Agent->>FS: 3. Search existing files for highest sequence
    Note over FS: grep "file_id: SOM-SCR" *.py | sort
    FS-->>Agent: 4. Return highest sequence (e.g., 0013)
    Agent->>Agent: 5. Increment sequence (0013 → 0014)
    Agent->>Agent: 6. Set version to v1.0.0 (new file)
    Agent->>Agent: 7. Assign agent_id (AGENT-CLAUDE-002)
    Agent->>Agent: 8. Generate file_id: SOM-SCR-0014-v1.0.0
    Agent->>FS: 9. Create file with header
    Agent->>Dev: 10. Log file creation in diary

Manual Allocation Steps

Step-by-Step Process:

  1. Determine Category

    # Is this a script? → SCR
    # Is this documentation? → DOC
    # Is this a test? → TST
  2. Find Highest Sequence in Category

    # For scripts:
    grep -r "file_id: SOM-SCR" . | sed 's/.*SOM-SCR-\([0-9]*\).*/\1/' | sort -n | tail -1
    # Output: 0013
  3. Increment Sequence

    last_sequence = 13
    new_sequence = last_sequence + 1  # 14
    formatted = f"{new_sequence:04d}"  # "0014"
  4. Construct File ID

    file_id = f"SOM-SCR-{formatted}-v1.0.0"
    # Result: SOM-SCR-0014-v1.0.0
  5. Generate Complete Header

    header = f"""# ==============================================================================
    # file_id: {file_id}
    # name: {filename}
    # description: {description}
    # project_id: GHOST-SHELL
    # category: script
    # tags: {tags}
    # created: {today}
    # modified: {today}
    # version: 1.0.0
    # agent_id: AGENT-CLAUDE-002
    # execution: {execution_command}
    # ==============================================================================
    """
  6. Write File

    with open(filepath, 'w') as f:
        f.write(header)
        f.write(code_content)

Proposed Automated Workflow (Future)

graph LR
    A[Create New File] --> B{File Type?}
    B -->|.py| C[Scan Python Files]
    B -->|.md| D[Scan Markdown Files]
    B -->|.yaml| E[Scan Config Files]

    C --> F[Get Highest SCR Sequence]
    D --> G[Get Highest DOC Sequence]
    E --> H[Get Highest CFG Sequence]

    F --> I[Increment & Format]
    G --> I
    H --> I

    I --> J[Query Agent Registry]
    J --> K[Get Current Agent ID]

    K --> L[Generate File ID]
    L --> M[Create Header Template]
    M --> N[Insert into File]
    N --> O[Register in Catalog DB]
    O --> P[Log to Diary]

Automation Tool (Proposed):

# Command-line tool for ID generation
ghost-catalog generate --file new_module.py --category script --description "New feature module"

# Output:
# Generated: SOM-SCR-0015-v1.0.0
# Header inserted into new_module.py
# Registered in catalog database
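
The "File Type?" branch of the proposed workflow could start from the example file types in the category table. The sketch below is one possible mapping, not a specification of the proposed `ghost-catalog` tool: `infer_category` is a hypothetical name, and treating every `.json` file as CFG (rather than DTA) is an assumption a real tool would need to revisit.

```python
from pathlib import PurePosixPath

# Suffix-to-category defaults, based on the category table's example file types.
CATEGORY_BY_SUFFIX = {
    ".py": "SCR",
    ".ps1": "SCR",
    ".yaml": "CFG",
    ".json": "CFG",   # assumption: JSON schemas (DTA) need a manual override
    ".toml": "CFG",
    ".sql": "DTA",
}


def infer_category(path: str) -> str:
    """Guess the SOM category code for a file path written with forward slashes."""
    p = PurePosixPath(path)
    name, suffix = p.name.lower(), p.suffix.lower()

    # Test files take precedence over the generic .py rule.
    if suffix == ".py" and (name.startswith("test_") or name.endswith("_test.py")):
        return "TST"
    if suffix == ".md":
        if ".claude" in p.parts and "commands" in p.parts:
            return "CMD"
        if name == "development_diary.md":
            return "LOG"
        return "DOC"
    if suffix in CATEGORY_BY_SUFFIX:
        return CATEGORY_BY_SUFFIX[suffix]
    raise ValueError(f"cannot infer a category for {path!r}; pass one explicitly")


print(infer_category("ghost_shell/cli.py"))
```

An explicit `--category` argument, as in the command above, would still be needed for REG and TMP files, which have no distinguishing extension.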

Data Schema Deep Dive

Complete Metadata Field Specification

| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| file_id | String | Yes | Unique semantic identifier | SOM-SCR-0014-v1.0.0 |
| name | String | Yes | Actual filename (must match) | cli.py |
| description | String | Yes | One-line purpose statement | Ghost_Shell unified management CLI |
| project_id | String | Yes | Parent project identifier | GHOST-SHELL |
| category | String | Yes | Category name (maps to the 3-char code in file_id) | script |
| tags | Array | Yes | Semantic tags for indexing | [cli, management, admin] |
| created | Date | Yes | ISO 8601 date (YYYY-MM-DD) | 2025-11-23 |
| modified | Date | Yes | Last modification date | 2025-11-23 |
| version | String | Yes | Semantic version | 1.0.0 |
| agent_id | String | Yes | Creating/modifying agent | AGENT-CLAUDE-002 |
| agent.name | String | Optional | Agent name (MD only) | claude_takeover |
| agent.model | String | Optional | LLM model (MD only) | claude-sonnet-4-5-20250929 |
| execution | String | Yes | How to run/use file | python -m ghost_shell.cli stats |

Field Validation Rules

file_id:

  • Pattern: ^SOM-[A-Z]{3}-\d{4}-v\d+\.\d+\.\d+$
  • Category: Must match one of the defined category codes
  • Sequence: Must be unique within category
  • Version: Must match version field

name:

  • Constraint: Must exactly match the filename
  • Case-sensitive: CLI.py ≠ cli.py
  • Extension: Must include file extension

description:

  • Length: 1-100 characters recommended
  • Style: Imperative mood, concise
  • Good: Unified management CLI for Ghost_Shell
  • Bad: This file contains a CLI that manages things

project_id:

  • Format: [A-Z-]+ (uppercase with hyphens)
  • Examples: GHOST-SHELL, DMBT, BROWSER-MIXER
  • Scope: Project-level identifier (not workspace)

category:

  • Values: One of script, documentation, configuration, test, template, data, log
  • Consistency: Should align with file_id category code

tags:

  • Format: Array of lowercase strings
  • Separator: Commas in array notation [tag1, tag2, tag3]
  • Style: Use hyphens for multi-word tags: multi-word-tag
  • Purpose: Enable semantic search and classification

created / modified:

  • Format: YYYY-MM-DD (ISO 8601 date only, no time)
  • created: Never changes after initial creation
  • modified: Updated whenever file content changes

version:

  • Format: MAJOR.MINOR.PATCH (semver)
  • Constraints: Must not have leading zeros (e.g., 1.01.0 is invalid)
  • Sync: Must match version in file_id

agent_id:

  • Format: AGENT-[A-Z]+-\d{3}
  • Registry: Should reference a defined agent in agent_registry.md
  • Examples: AGENT-CLAUDE-002, AGENT-PRIME-001

execution:

  • Purpose: Provides runnable command or usage instruction
  • Format: Actual shell command or description
  • Examples:
    • Scripts: python -m ghost_shell.cli stats
    • Docs: Read for project understanding
    • Configs: Loaded by launcher.py on startup
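
Several of the rules above can be checked mechanically. The following sketch shows one way to do that for an already-parsed header; `validate_header` is a hypothetical function, it covers only the rules that need no external lookup (sequence uniqueness and the agent registry are out of scope), and it checks `agent_id` only when the field is present.

```python
import re

FILE_ID_RE = re.compile(r"SOM-[A-Z]{3}-\d{4}-v(\d+\.\d+\.\d+)")
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")  # no leading zeros
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
PROJECT_RE = re.compile(r"[A-Z-]+")
AGENT_RE = re.compile(r"AGENT-[A-Z]+-\d{3}")


def validate_header(fields: dict, filename: str) -> list:
    """Return a list of rule violations; an empty list means the header passed."""
    errors = []

    id_match = FILE_ID_RE.fullmatch(fields.get("file_id", ""))
    if id_match is None:
        errors.append("file_id: does not match SOM-XXX-NNNN-vX.Y.Z")

    version = fields.get("version", "")
    if SEMVER_RE.fullmatch(version) is None:
        errors.append("version: not MAJOR.MINOR.PATCH without leading zeros")
    elif id_match is not None and id_match.group(1) != version:
        errors.append("version: differs from the version in file_id")

    if fields.get("name") != filename:
        errors.append("name: must exactly match the filename (case-sensitive)")

    if PROJECT_RE.fullmatch(fields.get("project_id", "")) is None:
        errors.append("project_id: expected uppercase letters and hyphens")

    for key in ("created", "modified"):
        if DATE_RE.fullmatch(fields.get(key, "")) is None:
            errors.append(f"{key}: expected YYYY-MM-DD")

    for tag in fields.get("tags") or []:
        if tag != tag.lower() or " " in tag:
            errors.append(f"tags: {tag!r} should be lowercase and hyphenated")

    agent_id = fields.get("agent_id")
    if agent_id is not None and AGENT_RE.fullmatch(agent_id) is None:
        errors.append("agent_id: expected AGENT-<NAME>-NNN")

    return errors
```

Returning a list rather than raising on the first failure lets a caller report every problem in a header at once.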

Tag Taxonomy

Recommended Tag Categories:

  1. Functional Tags: What the file does

    • cli, gui, tui, api, database, networking
  2. Technology Tags: What it uses

    • opentelemetry, sqlite, mitmproxy, textual, tosijs
  3. Purpose Tags: Why it exists

    • admin, debugging, monitoring, security, privacy
  4. Domain Tags: What area it covers

    • proxy, firewall, intelligence, analytics, telemetry
  5. Status Tags: Implementation state

    • wip, deprecated, experimental, stable

Example Tagging:

# For ghost_shell/cli.py:
tags: [cli, management, admin, opentelemetry, statistics]

# For ghost_shell/proxy/blocker.py:
tags: [proxy, blocking, privacy, security, mitmproxy]

# For development_diary.md:
tags: [log, diary, development, documentation]

Registry System Architecture

Current State: Distributed File Headers

Architecture:

graph TB
    subgraph "File System (Current)"
        A[cli.py<br/>SOM-SCR-0014]
        B[launcher.py<br/>SOM-SCR-0015]
        C[telemetry.py<br/>SOM-SCR-0010]
        D[config.yaml<br/>SOM-CFG-0002]
    end

    subgraph "Query Methods"
        E[grep + awk + sort]
        F[repomix output]
        G[Manual file inspection]
    end

    A --> E
    B --> E
    C --> E
    D --> E

    A --> F
    B --> F
    C --> F
    D --> F

    A --> G
    B --> G
    C --> G
    D --> G

    E --> H[File ID List]
    F --> H
    G --> H

Characteristics:

  • No database dependency: Works immediately
  • Git-trackable: Headers are version-controlled
  • Self-documenting: Metadata travels with file
  • Slow queries: Must scan all files for searches
  • No indexing: Can't efficiently search by tags
  • No relationships: Can't link files (dependencies, etc.)

Proposed State: Centralized Catalog Database

Architecture:

graph TB
    subgraph "File System"
        A[Files with Headers]
    end

    subgraph "Catalog Database (SQLite)"
        B[(file_catalog table)]
        C[(agent_registry table)]
        D[(file_tags table)]
        E[(file_dependencies table)]
    end

    subgraph "Sync Layer"
        F[Catalog Scanner]
        G[File Watcher]
    end

    subgraph "Access APIs"
        H[CLI: ghost-catalog]
        I[TUI: Bubble Tea Browser]
        J[Python API: CatalogQuery]
    end

    A --> F
    F --> B
    F --> C
    F --> D

    A --> G
    G --> B

    B --> H
    C --> H
    D --> H

    B --> I
    C --> I
    D --> I

    B --> J
    C --> J
    D --> J

Database Schema (Proposed):

-- Core catalog table
CREATE TABLE file_catalog (
    file_id TEXT PRIMARY KEY,              -- SOM-XXX-NNNN-vX.X.X
    name TEXT NOT NULL,                    -- Filename
    path TEXT NOT NULL,                    -- Relative or absolute path
    description TEXT,
    project_id TEXT,
    category TEXT,                         -- SOM category code
    created DATE,
    modified DATE,
    version TEXT,                          -- Semantic version
    agent_id TEXT,
    execution TEXT,
    checksum TEXT,                         -- SHA256 hash for integrity
    last_scanned TIMESTAMP,                -- When catalog was last updated
    FOREIGN KEY (agent_id) REFERENCES agent_registry(id)
);

-- Tag index (many-to-many)
CREATE TABLE file_tags (
    file_id TEXT,
    tag TEXT,
    PRIMARY KEY (file_id, tag),
    FOREIGN KEY (file_id) REFERENCES file_catalog(file_id)
);

-- Agent registry
CREATE TABLE agent_registry (
    id TEXT PRIMARY KEY,                   -- AGENT-XXX-NNN
    name TEXT,
    model TEXT,
    first_seen TIMESTAMP,
    last_active TIMESTAMP
);

-- File dependencies (optional, for advanced tracking)
CREATE TABLE file_dependencies (
    file_id TEXT,
    depends_on_file_id TEXT,
    dependency_type TEXT,                  -- import, config, data, etc.
    PRIMARY KEY (file_id, depends_on_file_id),
    FOREIGN KEY (file_id) REFERENCES file_catalog(file_id),
    FOREIGN KEY (depends_on_file_id) REFERENCES file_catalog(file_id)
);

-- Indexes for fast queries
CREATE INDEX idx_category ON file_catalog(category);
CREATE INDEX idx_project ON file_catalog(project_id);
CREATE INDEX idx_agent ON file_catalog(agent_id);
CREATE INDEX idx_modified ON file_catalog(modified);
CREATE INDEX idx_tags ON file_tags(tag);

Catalog Synchronization

Process:

sequenceDiagram
    participant FS as File System
    participant Scanner as Catalog Scanner
    participant DB as Catalog Database
    participant Log as Sync Log

    Scanner->>FS: 1. Glob all files (**/*.py, **/*.md, etc.)
    FS-->>Scanner: 2. Return file list

    loop For each file
        Scanner->>FS: 3. Read file header (first 20 lines)
        FS-->>Scanner: 4. Return header content
        Scanner->>Scanner: 5. Parse header metadata
        Scanner->>Scanner: 6. Calculate SHA256 checksum
        Scanner->>DB: 7. SELECT checksum WHERE file_id = ?

        alt File exists and unchanged
            DB-->>Scanner: Checksum matches
            Scanner->>Scanner: Skip (no update needed)
        else File new or modified
            DB-->>Scanner: Checksum differs or NULL
            Scanner->>DB: 8. UPSERT file_catalog
            Scanner->>DB: 9. DELETE old tags, INSERT new tags
            Scanner->>Log: 10. Log update
        end
    end

    Scanner->>DB: 11. Mark stale entries (files deleted)
    Scanner->>Log: 12. Generate sync report

Command:

# Scan and update catalog
ghost-catalog sync

# Output:
# Scanned: 50 files
# Updated: 3 files
# New: 1 file
# Unchanged: 46 files
# Stale: 0 files
# Sync completed in 1.2s
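
Steps 6-9 of the synchronization diagram (checksum, compare, upsert, replace tags) might look like the sketch below for a single file. It is an assumption-laden illustration rather than the planned scanner: `sync_file` is a hypothetical name, `fields` is assumed to be an already-parsed header, the tables are assumed to match the proposed schema, and the `ON CONFLICT` upsert requires SQLite 3.24 or newer.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone
from pathlib import Path

# Header fields that map directly onto file_catalog columns.
HEADER_COLUMNS = ("file_id", "name", "description", "project_id", "category",
                  "created", "modified", "version", "agent_id", "execution")


def sync_file(conn: sqlite3.Connection, path, fields: dict) -> str:
    """Upsert one file into the catalog; return 'new', 'updated' or 'unchanged'."""
    checksum = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    row = conn.execute(
        "SELECT checksum FROM file_catalog WHERE file_id = ?", (fields["file_id"],)
    ).fetchone()
    if row is not None and row[0] == checksum:
        return "unchanged"

    columns = HEADER_COLUMNS + ("path", "checksum", "last_scanned")
    scanned_at = datetime.now(timezone.utc).isoformat(timespec="seconds")
    values = [fields.get(c) for c in HEADER_COLUMNS] + [str(path), checksum, scanned_at]
    updates = ", ".join(f"{c} = excluded.{c}" for c in columns if c != "file_id")
    # Column names come from the constants above, never from user input.
    conn.execute(
        f"INSERT INTO file_catalog ({', '.join(columns)}) "
        f"VALUES ({', '.join('?' for _ in columns)}) "
        f"ON CONFLICT(file_id) DO UPDATE SET {updates}",
        values,
    )

    # Replace the tag set wholesale: delete old tags, insert the current ones.
    conn.execute("DELETE FROM file_tags WHERE file_id = ?", (fields["file_id"],))
    conn.executemany(
        "INSERT INTO file_tags (file_id, tag) VALUES (?, ?)",
        [(fields["file_id"], tag) for tag in fields.get("tags", [])],
    )
    conn.commit()
    return "new" if row is None else "updated"
```

The three return values correspond to the New, Updated, and Unchanged counters in the sync report.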

Access & Browse Methods

Method 1: File Header Inspection (Current)

Bash Commands:

# List all file IDs ($NF is the last field, so the "#" prefix in Python headers does not shift the result)
grep -r "file_id:" . --include="*.py" --include="*.md" | awk '{print $NF}'

# List by category
grep -r "file_id: SOM-SCR" . --include="*.py" | awk '{print $NF}'

# Find file by ID
grep -r "file_id: SOM-SCR-0014" .

# List all tags (strip brackets and spaces first so duplicates collapse under sort -u)
grep -rh "tags:" . --include="*.py" --include="*.md" | sed -e 's/.*tags: *//' -e 's/[][ ]//g' | tr ',' '\n' | sort -u

# Find files with specific tag
grep -r "tags:.*opentelemetry" . --include="*.py"

PowerShell Commands:

# List all file IDs (capture the ID with a regex group instead of splitting on spaces)
Get-ChildItem -Recurse -Include *.py,*.md |
    Select-String -Pattern 'file_id:\s+(SOM-\S+)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

# Count files by category
Get-ChildItem -Recurse -Include *.py |
    Select-String -Pattern 'file_id:\s+SOM-([A-Z]{3})-' |
    ForEach-Object { $_.Matches[0].Groups[1].Value } |
    Group-Object |
    Select-Object Name, Count

# Find newest modified files
Get-ChildItem -Recurse -Include *.py,*.md |
    Select-String -Pattern 'modified:\s+(\d{4}-\d{2}-\d{2})' |
    ForEach-Object {
        [PSCustomObject]@{ Date = $_.Matches[0].Groups[1].Value; File = $_.Path }
    } |
    Sort-Object -Property Date -Descending |
    Select-Object -First 10

Method 2: Repomix Output Analysis

Generate Repomix:

repomix --output catalog.txt --style xml

Parse Repomix:

import re

def parse_file_ids_from_repomix(repomix_path):
    """Extract all unique file IDs from repomix output."""
    # Match a full MAJOR.MINOR.PATCH version so trailing punctuation is not captured
    pattern = r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v\d+\.\d+\.\d+)'

    with open(repomix_path, 'r', encoding='utf-8') as f:
        content = f.read()

    matches = re.findall(pattern, content)
    return sorted(set(matches))  # Unique file IDs in a stable order

# Usage (catalog.txt is the file written by the repomix command above)
file_ids = parse_file_ids_from_repomix('catalog.txt')
print(f"Found {len(file_ids)} files with catalog headers")

Method 3: CLI Tool (Proposed)

Command Reference:

# List all cataloged files
ghost-catalog list

# Filter by category
ghost-catalog list --category script

# Search by tag
ghost-catalog search --tag opentelemetry

# Get file details
ghost-catalog info SOM-SCR-0014-v1.0.0

# Validate all headers
ghost-catalog validate

# Fix inconsistencies
ghost-catalog validate --fix

# Generate new file with header
ghost-catalog generate --file new_module.py --category script --description "New module"

# Show catalog statistics
ghost-catalog stats

# Export catalog to JSON
ghost-catalog export catalog.json

Example Output:

$ ghost-catalog list --category script

╭─────────────────────────────────────────────────────────────╮
│               Ghost_Shell File Catalog (Scripts)            │
├──────────────────────────┬──────────────────────────────────┤
│ File ID                  │ Description                      │
├──────────────────────────┼──────────────────────────────────┤
│ SOM-SCR-0010-v1.0.0      │ OpenTelemetry setup              │
│ SOM-SCR-0011-v1.0.0      │ Traffic blocking                 │
│ SOM-SCR-0012-v1.1.0      │ Main proxy addon                 │
│ SOM-SCR-0013-v1.0.0      │ Intelligence collector           │
│ SOM-SCR-0014-v1.0.0      │ Management CLI                   │
│ SOM-SCR-0015-v1.1.0      │ Orchestrator                     │
├──────────────────────────┴──────────────────────────────────┤
│ Total: 6 scripts                                            │
╰─────────────────────────────────────────────────────────────╯

Method 4: Database Queries (Future)

-- Find all files modified in last 7 days
SELECT file_id, name, modified
FROM file_catalog
WHERE modified >= date('now', '-7 days')
ORDER BY modified DESC;

-- Count files by category
SELECT category, COUNT(*) as count
FROM file_catalog
GROUP BY category
ORDER BY count DESC;

-- Find files by tag
SELECT fc.file_id, fc.name, fc.description
FROM file_catalog fc
JOIN file_tags ft ON fc.file_id = ft.file_id
WHERE ft.tag = 'opentelemetry';

-- Find files by agent
SELECT file_id, name, version
FROM file_catalog
WHERE agent_id = 'AGENT-CLAUDE-002'
ORDER BY modified DESC;

-- Find outdated versions (MAJOR version < 1)
SELECT file_id, name, version
FROM file_catalog
WHERE CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) < 1;

-- File dependency graph
WITH RECURSIVE deps AS (
    SELECT file_id, depends_on_file_id, 1 as depth
    FROM file_dependencies
    WHERE file_id = 'SOM-SCR-0014-v1.0.0'

    UNION ALL

    SELECT fd.file_id, fd.depends_on_file_id, deps.depth + 1
    FROM file_dependencies fd
    JOIN deps ON fd.file_id = deps.depends_on_file_id
    WHERE depth < 5
)
SELECT DISTINCT fc.file_id, fc.name, deps.depth
FROM deps
JOIN file_catalog fc ON deps.depends_on_file_id = fc.file_id
ORDER BY depth, fc.file_id;
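
The "find files by tag" query matches a single tag. A caller that needs files carrying every tag in a set can group and count, as in this sketch. It assumes the proposed `file_catalog` and `file_tags` tables exist; `files_with_all_tags` is a hypothetical helper, not part of a published CatalogQuery API.

```python
import sqlite3


def files_with_all_tags(conn: sqlite3.Connection, tags):
    """Return (file_id, name) rows for files that carry every tag in `tags`."""
    wanted = sorted(set(tags))
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT fc.file_id, fc.name
        FROM file_catalog fc
        JOIN file_tags ft ON fc.file_id = ft.file_id
        WHERE ft.tag IN ({placeholders})
        GROUP BY fc.file_id, fc.name
        HAVING COUNT(DISTINCT ft.tag) = ?
        ORDER BY fc.file_id
    """
    # A file qualifies only if the number of matched tags equals the number requested.
    return conn.execute(sql, (*wanted, len(wanted))).fetchall()
```

Only the placeholders are interpolated into the SQL text; the tag values themselves are always bound as parameters.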

Bubble Tea TUI Design

Overview

Bubble Tea (https://github.com/charmbracelet/bubbletea) is a Go framework for building terminal UIs. For Ghost_Shell, we'll design a TUI for browsing the file catalog.

Why Bubble Tea?

  • Modern, composable TUI framework
  • Rich text rendering (via Lip Gloss)
  • Mouse support, interactive tables
  • Cross-platform (Windows, Linux, macOS)

Implementation Language: Go (Bubble Tea is Go-native)

Architecture

graph TB
    subgraph "Bubble Tea TUI"
        A[Main View] --> B[Navigation Sidebar]
        A --> C[Content Pane]

        B --> D[Category Filter]
        B --> E[Tag Filter]
        B --> F[Search Box]

        C --> G[File List Table]
        C --> H[File Detail View]
        C --> I[Dependency Graph]
    end

    subgraph "Data Layer"
        J[SQLite Catalog DB]
        K[File System]
    end

    G --> J
    H --> K
    I --> J

UI Mockup

╔════════════════════════════════════════════════════════════════════════════╗
║                   Ghost_Shell File Catalog Browser v1.0                    ║
╠══════════════════╦═════════════════════════════════════════════════════════╣
║  FILTERS         ║  FILE LIST (25 files)                                   ║
║                  ║ ┌────────────────────┬────────────────────────────────┐ ║
║ Categories       ║ │ File ID            │ Description                    │ ║
║ [●] Scripts (6)  ║ ├────────────────────┼────────────────────────────────┤ ║
║ [ ] Docs (4)     ║ │ SOM-SCR-0010-v1.0.0│ OpenTelemetry setup            │ ║
║ [ ] Config (2)   ║ │ SOM-SCR-0011-v1.0.0│ Traffic blocking               │ ║
║ [ ] Tests (1)    ║ │►SOM-SCR-0012-v1.1.0│ Main proxy addon               │ ║
║                  ║ │ SOM-SCR-0013-v1.0.0│ Intelligence collector         │ ║
║ Tags             ║ │ SOM-SCR-0014-v1.0.0│ Management CLI                 │ ║
║ [x] opentelemetry║ │ SOM-SCR-0015-v1.1.0│ Orchestrator                   │ ║
║ [ ] proxy        ║ └────────────────────┴────────────────────────────────┘ ║
║ [ ] cli          ║                                                         ║
║                  ║  FILE DETAILS: SOM-SCR-0012-v1.1.0                      ║
║ Search           ║ ┌─────────────────────────────────────────────────────┐ ║
║ [proxy______]    ║ │ Name: core.py                                       │ ║
║                  ║ │ Path: ghost_shell/proxy/core.py                     │ ║
║ Sort By          ║ │ Description: Main proxy addon                       │ ║
║ ● Modified       ║ │ Project: GHOST-SHELL                                │ ║
║ ○ Created        ║ │ Category: script                                    │ ║
║ ○ File ID        ║ │ Tags: [proxy, mitmproxy, core]                      │ ║
║                  ║ │ Version: 1.1.0                                      │ ║
║ Actions          ║ │ Agent: AGENT-CLAUDE-002                             │ ║
║ [o] Open in $ED  ║ │ Created: 2025-11-23  Modified: 2025-11-23           │ ║
║ [c] Copy path    ║ │ Execution: python -s ghost_shell/proxy/core.py      │ ║
║ [d] Dependencies ║ │ Checksum: 3a5f7c... (verified)                      │ ║
║ [g] Git log      ║ └─────────────────────────────────────────────────────┘ ║
║                  ║                                                         ║
╠══════════════════╩═════════════════════════════════════════════════════════╣
║ [?] Help  [/] Search  [Tab] Switch Pane  [q] Quit  [Enter] Select          ║
╚════════════════════════════════════════════════════════════════════════════╝

Component Breakdown

1. Navigation Sidebar (Left, 25% width)

  • Category Filters: Checkboxes for SCR, DOC, CFG, TST
  • Tag Filters: Checkboxes for common tags (top 10)
  • Search Box: Live filter by file_id, name, or description
  • Sort Options: Radio buttons for sort order
  • Action Buttons: Quick actions (open, copy, etc.)

2. File List Table (Top-right, 75% width, 60% height)

  • Columns: File ID, Description (truncated if needed)
  • Selection: Arrow keys + Enter to select
  • Highlight: Selected row highlighted
  • Pagination: If >20 files, show page indicator

3. File Detail View (Bottom-right, 75% width, 40% height)

  • Metadata Display: All header fields in readable format
  • Checksum Verification: Shows if file matches catalog
  • Action Hints: Keybindings for common actions

4. Status Bar (Bottom)

  • Keybindings: Quick reference for navigation
  • Status: Current filter/search state

Keybindings

| Key | Action | Description |
|---|---|---|
| ↑/↓ | Navigate list | Move selection up/down |
| Enter | Select file | Show details in detail pane |
| Tab | Switch pane | Toggle between sidebar and file list |
| / | Search | Focus search box |
| Esc | Clear filter | Reset all filters |
| o | Open in editor | Open file in $EDITOR |
| c | Copy path | Copy file path to clipboard |
| d | Show dependencies | Open dependency graph view |
| g | Git log | Show git history for file |
| v | Validate | Re-validate file header |
| r | Refresh | Re-scan catalog database |
| ? | Help | Show full help screen |
| q | Quit | Exit application |

Implementation (Go + Bubble Tea)

File Structure:

ghost-catalog-tui/
├── main.go                 # Entry point
├── model.go                # Bubble Tea model
├── update.go               # Update logic
├── view.go                 # Rendering logic
├── components/
│   ├── sidebar.go          # Sidebar component
│   ├── filelist.go         # File list table
│   ├── details.go          # Detail pane
│   └── statusbar.go        # Status bar
├── database/
│   └── catalog.go          # SQLite queries
└── styles/
    └── theme.go            # Lip Gloss styles

Sample Code (main.go):

package main

import (
    "database/sql"
    "fmt"
    "log"
    "os"

    tea "github.com/charmbracelet/bubbletea"
    _ "github.com/mattn/go-sqlite3"
)

type model struct {
    db             *sql.DB
    files          []File
    selectedIndex  int
    filterCategory string
    filterTags     []string
    searchQuery    string
    focusedPane    string // "sidebar" or "filelist"
}

type File struct {
    FileID      string
    Name        string
    Description string
    Path        string
    Category    string
    Tags        []string
    Version     string
    Modified    string
}

func initialModel() model {
    db, err := sql.Open("sqlite3", "./data/catalog.db")
    if err != nil {
        log.Fatal(err)
    }

    return model{
        db:          db,
        focusedPane: "filelist",
    }
}

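// Init kicks off the initial catalog load. loadFiles and filesLoadedMsg are not
// shown in this sample and are assumed to be defined elsewhere in the package.
func (m model) Init() tea.Cmd {
    return loadFiles(m.db)
}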

func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
    switch msg := msg.(type) {
    case tea.KeyMsg:
        switch msg.String() {
        case "q", "ctrl+c":
            return m, tea.Quit
        case "up":
            if m.selectedIndex > 0 {
                m.selectedIndex--
            }
        case "down":
            if m.selectedIndex < len(m.files)-1 {
                m.selectedIndex++
            }
        case "enter":
            // Show file details
        case "/":
            // Focus search box
        }
    case filesLoadedMsg:
        m.files = msg.files
    }
    return m, nil
}

func (m model) View() string {
    // Render UI (see view.go implementation)
    return renderUI(m)
}

func main() {
    p := tea.NewProgram(initialModel(), tea.WithAltScreen())
    // Run replaces the deprecated Start; it returns the final model and any error.
    if _, err := p.Run(); err != nil {
        fmt.Fprintf(os.Stderr, "Error: %v\n", err)
        os.Exit(1)
    }
}

Database Queries (database/catalog.go):

package database

import (
    "database/sql"
)

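// QueryFilesByCategoryAndTags returns files in the given category (if non-empty)
// that carry at least one of the given tags (if any), newest first.
// The File type shown in main.go must also be declared in, or imported into, this package.
func QueryFilesByCategoryAndTags(db *sql.DB, category string, tags []string) ([]File, error) {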
    query := `
        SELECT DISTINCT fc.file_id, fc.name, fc.description, fc.path, fc.category, fc.version, fc.modified
        FROM file_catalog fc
        LEFT JOIN file_tags ft ON fc.file_id = ft.file_id
        WHERE 1=1
    `

    args := []interface{}{}

    if category != "" {
        query += " AND fc.category = ?"
        args = append(args, category)
    }

    if len(tags) > 0 {
        query += " AND ft.tag IN ("
        for i, tag := range tags {
            if i > 0 {
                query += ","
            }
            query += "?"
            args = append(args, tag)
        }
        query += ")"
    }

    query += " ORDER BY fc.modified DESC"

    rows, err := db.Query(query, args...)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var files []File
    for rows.Next() {
        var f File
        err := rows.Scan(&f.FileID, &f.Name, &f.Description, &f.Path, &f.Category, &f.Version, &f.Modified)
        if err != nil {
            return nil, err
        }
        files = append(files, f)
    }

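    // Surface any error that interrupted row iteration.
    if err := rows.Err(); err != nil {
        return nil, err
    }

    return files, nil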
}
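
Note that `ft.tag IN (...)` matches files carrying any of the requested tags. If the browser should instead require every selected tag, one option is to group by file and count the distinct matches. The sketch below illustrates this against an in-memory SQLite database; the two-column `file_tags` layout is inferred from the sample inserts in this document, and the helper name `files_with_all_tags` is hypothetical.

```python
import sqlite3

def files_with_all_tags(conn, tags):
    """Return file_ids that carry every tag in `tags` (AND semantics)."""
    if not tags:
        return []
    placeholders = ",".join("?" for _ in tags)
    query = f"""
        SELECT file_id
        FROM file_tags
        WHERE tag IN ({placeholders})
        GROUP BY file_id
        HAVING COUNT(DISTINCT tag) = ?
        ORDER BY file_id
    """
    # The last parameter is the number of distinct tags that must all match.
    rows = conn.execute(query, [*tags, len(set(tags))]).fetchall()
    return [row[0] for row in rows]

# Assumed minimal schema and illustrative rows, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file_tags (file_id TEXT, tag TEXT)")
conn.executemany("INSERT INTO file_tags VALUES (?, ?)", [
    ("SOM-SCR-0014-v1.0.0", "cli"),
    ("SOM-SCR-0014-v1.0.0", "opentelemetry"),
    ("SOM-SCR-0012-v1.1.0", "opentelemetry"),
])
print(files_with_all_tags(conn, ["cli", "opentelemetry"]))
```

With AND semantics only `cli.py` qualifies for both tags, whereas the `IN` form would also return the second file.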

Database Schema for File Tracking

(See "Registry System Architecture" section above for complete schema)

Summary Tables:

  1. file_catalog: Core metadata for all cataloged files
  2. file_tags: Tag index (many-to-many)
  3. agent_registry: Agent metadata and activity tracking
  4. file_dependencies: Relationships between files

Sample Data:

-- Insert file
INSERT INTO file_catalog VALUES (
    'SOM-SCR-0014-v1.0.0',
    'cli.py',
    'ghost_shell/cli.py',
    'Ghost_Shell unified management CLI',
    'GHOST-SHELL',
    'script',
    '2025-11-23',
    '2025-11-23',
    '1.0.0',
    'AGENT-CLAUDE-002',
    'python -m ghost_shell.cli [command] [args]',
    '3a5f7c9d...',
    '2025-11-24 10:30:00'
);

-- Insert tags
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'cli');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'management');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'admin');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'opentelemetry');

-- Insert agent
INSERT INTO agent_registry VALUES (
    'AGENT-CLAUDE-002',
    'claude_takeover',
    'claude-sonnet-4-5-20250929',
    '2025-11-23 00:00:00',
    '2025-11-24 10:30:00'
);

-- Insert dependency (cli.py depends on db_handler.py)
INSERT INTO file_dependencies VALUES (
    'SOM-SCR-0014-v1.0.0',
    'SOM-SCR-XXXX-vX.X.X',  -- db_handler.py file_id
    'import'
);

CLI Interface Specifications

Command Structure

ghost-catalog <command> [options]

Commands:
  list         List all cataloged files
  search       Search files by criteria
  info         Show detailed information about a file
  validate     Validate file headers
  generate     Generate new file with catalog header
  sync         Synchronize catalog database with file system
  stats        Show catalog statistics
  export       Export catalog to JSON/CSV
  tags         Manage tags
  agents       List agents in registry

Global Options:
  --db PATH    Path to catalog database (default: ./data/catalog.db)
  --verbose    Show detailed output
  --help       Show help
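
Since the CLI is still a specification, the following is only a sketch of how the command structure above could be wired up with Python's `argparse`. It covers the global options plus `list` and `info`; the function name `build_parser` and the default sort order are assumptions, not part of the spec.

```python
import argparse

def build_parser():
    """Build a parser for a subset of the ghost-catalog command structure."""
    parser = argparse.ArgumentParser(prog="ghost-catalog")
    parser.add_argument("--db", metavar="PATH", default="./data/catalog.db",
                        help="Path to catalog database")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed output")
    sub = parser.add_subparsers(dest="command", required=True)

    list_cmd = sub.add_parser("list", help="List all cataloged files")
    list_cmd.add_argument("--category")
    # --tag may be repeated; values accumulate in args.tags
    list_cmd.add_argument("--tag", action="append", default=[], dest="tags")
    list_cmd.add_argument("--project")
    list_cmd.add_argument("--agent")
    list_cmd.add_argument("--sort", choices=["modified", "created", "file_id"],
                          default="modified")  # default is an assumption
    list_cmd.add_argument("--format", choices=["table", "json", "csv"],
                          default="table")

    info_cmd = sub.add_parser("info", help="Show detailed information about a file")
    info_cmd.add_argument("file_id", metavar="FILE_ID")
    return parser

args = build_parser().parse_args(["list", "--tag", "opentelemetry", "--tag", "cli"])
print(args.command, args.tags, args.db)
```

Global options precede the subcommand in this layout, for example `ghost-catalog --db other.db list`.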

Command Details

ghost-catalog list

ghost-catalog list [options]

Options:
  --category CATEGORY   Filter by category (script, documentation, etc.)
  --tag TAG            Filter by tag (can be specified multiple times)
  --project PROJECT    Filter by project ID
  --agent AGENT        Filter by agent ID
  --sort FIELD         Sort by field (modified, created, file_id)
  --format FORMAT      Output format (table, json, csv)

Examples:
  ghost-catalog list --category script
  ghost-catalog list --tag opentelemetry --tag cli
  ghost-catalog list --project GHOST-SHELL --sort modified

ghost-catalog search

ghost-catalog search QUERY [options]

Arguments:
  QUERY                Search query (searches file_id, name, description)

Options:
  --category CATEGORY  Filter by category
  --tag TAG           Filter by tag
  --fuzzy             Enable fuzzy matching

Examples:
  ghost-catalog search "proxy"
  ghost-catalog search "telemetry" --tag opentelemetry
  ghost-catalog search "SOM-SCR-001" --fuzzy
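
The spec does not say how `--fuzzy` should score matches. One possible approach, sketched here with the standard library's `difflib`, treats a substring hit as a perfect match and otherwise compares the query against each whitespace-separated token. The function names and the 0.7 threshold are assumptions for illustration.

```python
from difflib import SequenceMatcher

def fuzzy_score(query, text):
    """Score in [0, 1]: 1.0 for a case-insensitive substring hit,
    otherwise the best similarity ratio against any token in `text`."""
    q, t = query.lower(), text.lower()
    if q in t:
        return 1.0
    return max((SequenceMatcher(None, q, token).ratio() for token in t.split()),
               default=0.0)

def fuzzy_search(query, files, threshold=0.7):
    """Rank catalog entries by their best score over file_id, name, description."""
    scored = []
    for entry in files:
        score = max(fuzzy_score(query, entry.get(key, ""))
                    for key in ("file_id", "name", "description"))
        if score >= threshold:
            scored.append((score, entry["file_id"]))
    # Highest score first; ties broken by file_id for stable output
    return [fid for score, fid in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Illustrative entries only
catalog = [
    {"file_id": "SOM-SCR-0010-v1.0.0", "name": "telemetry.py",
     "description": "OpenTelemetry setup"},
    {"file_id": "SOM-DOC-0003-v1.0.0", "name": "README.md",
     "description": "Project readme"},
]
print(fuzzy_search("telemtry", catalog))
```

Here the misspelled query still finds `telemetry.py`, while the unrelated entry falls below the threshold.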

ghost-catalog info

ghost-catalog info FILE_ID

Arguments:
  FILE_ID             File ID to show details for

Examples:
  ghost-catalog info SOM-SCR-0014-v1.0.0

ghost-catalog validate

ghost-catalog validate [options]

Options:
  --fix               Auto-fix minor issues (sync version, update modified date)
  --file FILE         Validate specific file
  --strict            Enable strict validation (fail on warnings)

Examples:
  ghost-catalog validate
  ghost-catalog validate --fix
  ghost-catalog validate --file ghost_shell/cli.py --strict

ghost-catalog generate

ghost-catalog generate --file FILE --category CATEGORY [options]

Options:
  --file FILE          File to create/update
  --category CATEGORY  Category code (script, documentation, etc.)
  --description DESC   File description
  --tags TAG1,TAG2    Comma-separated tags
  --project PROJECT   Project ID (default: current project)
  --agent AGENT       Agent ID (default: current agent)

Examples:
  ghost-catalog generate --file new_module.py --category script --description "New feature module" --tags cli,admin

ghost-catalog sync

ghost-catalog sync [options]

Options:
  --path PATH         Path to scan (default: current directory)
  --recursive         Scan recursively (default: true)
  --dry-run          Show what would be updated without making changes

Examples:
  ghost-catalog sync
  ghost-catalog sync --path ./ghost_shell --dry-run
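
A `--dry-run` implies that sync first computes a plan and only then applies it. A minimal sketch of that planning step follows, assuming both sides can be reduced to a mapping from path to file ID; the function name `plan_sync` and the three-way split are hypothetical.

```python
def plan_sync(scanned, cataloged):
    """Compare header metadata on disk with rows in the catalog.

    scanned:   {path: file_id} parsed from file headers
    cataloged: {path: file_id} currently stored in the database
    Returns (to_add, to_update, to_remove) as sorted lists of paths.
    """
    to_add = sorted(p for p in scanned if p not in cataloged)
    to_remove = sorted(p for p in cataloged if p not in scanned)
    to_update = sorted(p for p in scanned
                       if p in cataloged and scanned[p] != cataloged[p])
    return to_add, to_update, to_remove

# Illustrative data: one new file, one version bump, one deleted file
on_disk = {
    "ghost_shell/cli.py": "SOM-SCR-0014-v1.1.0",
    "ghost_shell/new_module.py": "SOM-SCR-0015-v1.0.0",
}
in_db = {
    "ghost_shell/cli.py": "SOM-SCR-0014-v1.0.0",
    "ghost_shell/old_module.py": "SOM-SCR-0008-v0.9.5",
}
for label, paths in zip(("add", "update", "remove"), plan_sync(on_disk, in_db)):
    print(f"{label}: {paths}")
```

In dry-run mode the tool would print this plan and stop; otherwise it would translate each list into inserts, updates, and deletes.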

ghost-catalog stats

ghost-catalog stats

Examples:
  ghost-catalog stats

Output:

╭─────────────────────────────────────────────────────╮
│              Ghost_Shell Catalog Stats              │
├─────────────────────────────────────────────────────┤
│ Total Files:            25                          │
│ Scripts:                6                           │
│ Documentation:          4                           │
│ Configuration:          2                           │
│ Tests:                  1                           │
│                                                     │
│ Total Tags:             42                          │
│ Most Used Tag:          opentelemetry (8 files)     │
│                                                     │
│ Total Agents:           2                           │
│ Most Active Agent:      AGENT-CLAUDE-002 (20 files) │
│                                                     │
│ Last Sync:              2025-11-24 10:30:00         │
│ Database Size:          245 KB                      │
╰─────────────────────────────────────────────────────╯

ghost-catalog export

ghost-catalog export OUTPUT [options]

Arguments:
  OUTPUT              Output file path

Options:
  --format FORMAT     Export format (json, csv, yaml)
  --category CATEGORY Filter by category
  --tag TAG          Filter by tag

Examples:
  ghost-catalog export catalog.json
  ghost-catalog export catalog.csv --format csv --category script
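
CSV export has to flatten the tag list into a single cell. The sketch below shows one way to do that with the standard `csv` module; the column selection and the `;` separator for tags are assumptions, as is the function name `export_csv`.

```python
import csv
import io

def export_csv(files, fieldnames=("file_id", "name", "category", "version", "tags")):
    """Serialize catalog entries to CSV text, joining tags with ';'."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for entry in files:
        row = dict(entry)
        # Lists are not valid CSV cells, so flatten tags into one string
        row["tags"] = ";".join(entry.get("tags", []))
        writer.writerow(row)
    return buf.getvalue()

sample = [{
    "file_id": "SOM-SCR-0014-v1.0.0",
    "name": "cli.py",
    "category": "script",
    "version": "1.0.0",
    "tags": ["cli", "admin"],
    "path": "ghost_shell/cli.py",  # not in fieldnames, so it is skipped
}]
print(export_csv(sample), end="")
```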

Implementation Examples

Example 1: Python Script to Extract File IDs

#!/usr/bin/env python3
"""
Extract all file IDs from a codebase using header parsing.
"""

import re
from pathlib import Path
from typing import Dict, List

def extract_file_id_from_content(content: str) -> Dict[str, str]:
    """Parse file header and extract all metadata fields."""
    metadata = {}

    # Patterns for different header styles
    patterns = {
        'file_id': r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)',
        'name': r'name:\s+(.+)',
        'description': r'description:\s+(.+)',
        'category': r'category:\s+(\w+)',
        'version': r'version:\s+([\d.]+)',
        'agent_id': r'agent_id:\s+(AGENT-[A-Z]+-\d{3})',
        'created': r'created:\s+(\d{4}-\d{2}-\d{2})',
        'modified': r'modified:\s+(\d{4}-\d{2}-\d{2})',
    }

    for key, pattern in patterns.items():
        match = re.search(pattern, content)
        if match:
            metadata[key] = match.group(1).strip()

    # Extract tags (array format)
    tags_match = re.search(r'tags:\s+\[([^\]]+)\]', content)
    if tags_match:
        tags_str = tags_match.group(1)
        metadata['tags'] = [tag.strip() for tag in tags_str.split(',')]

    return metadata

def scan_directory(root_path: Path, extensions: List[str]) -> List[Dict]:
    """Scan directory for files with catalog headers."""
    results = []

    for ext in extensions:
        for filepath in root_path.rglob(f'*{ext}'):
            # Skip virtual environments and cache directories (match whole
            # path components so names like "my.venv_notes.py" are not skipped)
            if any(part in filepath.parts for part in ['.venv', '__pycache__', 'node_modules']):
                continue

            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    # Read first 30 lines (headers should be within this range)
                    header_content = ''.join(f.readlines()[:30])

                metadata = extract_file_id_from_content(header_content)

                if metadata.get('file_id'):
                    metadata['path'] = str(filepath.relative_to(root_path))
                    results.append(metadata)
            except Exception as e:
                print(f"Error reading {filepath}: {e}")

    return results

if __name__ == '__main__':
    import json

    # Scan current directory
    root = Path('.')
    extensions = ['.py', '.md', '.yaml', '.yml', '.ps1']

    files = scan_directory(root, extensions)

    print(f"Found {len(files)} cataloged files\n")

    # Group by category
    by_category = {}
    for file in files:
        category = file.get('category', 'unknown')
        if category not in by_category:
            by_category[category] = []
        by_category[category].append(file)

    # Print summary
    for category, items in sorted(by_category.items()):
        print(f"\n{category.upper()} ({len(items)} files):")
        for item in items:
            print(f"  {item['file_id']:25} {item.get('description', 'No description')[:50]}")

    # Export to JSON
    with open('catalog_export.json', 'w') as f:
        json.dump(files, f, indent=2)

    print(f"\n✓ Catalog exported to catalog_export.json")

Example 2: PowerShell Script to Validate Headers

# validate_headers.ps1
# Validate all file catalog headers for consistency

param(
    [string]$Path = ".",
    [switch]$Fix
)

function Test-FileHeader {
    param([string]$FilePath)

    # Join the first 20 lines into one string so -match scans only the header
    $content = (Get-Content $FilePath -TotalCount 20) -join "`n"
    $errors = @()

    # Extract metadata
    if ($content -match 'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)') {
        $fileId = $Matches[1]
    } else {
        $errors += "Missing or invalid file_id"
        return @{ Valid = $false; Errors = $errors }
    }

    if ($content -match 'version:\s+([\d.]+)') {
        $version = $Matches[1]
    } else {
        $errors += "Missing version field"
    }

    # Check file_id version matches version field
    if ($fileId -match 'v([\d.]+)' -and $version) {
        $idVersion = $Matches[1]
        if ($idVersion -ne $version) {
            $errors += "Version mismatch: file_id has v$idVersion but version field has $version"
        }
    }

    # Check filename matches
    $fileName = Split-Path $FilePath -Leaf
    if ($content -match 'name:\s+(.+)') {
        $headerName = $Matches[1].Trim()
        if ($fileName -ne $headerName) {
            $errors += "Filename mismatch: file is '$fileName' but header says '$headerName'"
        }
    }

    # Check required fields
    $requiredFields = @('file_id', 'name', 'description', 'category', 'created', 'modified', 'version')
    foreach ($field in $requiredFields) {
        if ($content -notmatch "$field\s*:") {
            $errors += "Missing required field: $field"
        }
    }

    return @{
        Valid = ($errors.Count -eq 0)
        FileId = $fileId
        Errors = $errors
    }
}

# Scan all Python and Markdown files
$files = Get-ChildItem -Path $Path -Recurse -Include *.py,*.md |
    Where-Object { $_.FullName -notmatch '(\.venv|__pycache__|node_modules)' }

$totalFiles = 0
$validFiles = 0
$invalidFiles = @()

foreach ($file in $files) {
    $totalFiles++
    $result = Test-FileHeader -FilePath $file.FullName

    if ($result.Valid) {
        $validFiles++
        Write-Host "✓" -ForegroundColor Green -NoNewline
        Write-Host " $($result.FileId) - $($file.Name)"
    } else {
        Write-Host "✗" -ForegroundColor Red -NoNewline
        Write-Host " $($file.Name)"
        # Avoid $error as a loop variable: it shadows PowerShell's automatic $Error
        foreach ($err in $result.Errors) {
            Write-Host "  └─ $err" -ForegroundColor Yellow
        }
        $invalidFiles += $file
    }
}

Write-Host "`nValidation Summary:" -ForegroundColor Cyan
Write-Host "Total Files: $totalFiles"
Write-Host "Valid: $validFiles" -ForegroundColor Green
Write-Host "Invalid: $($invalidFiles.Count)" -ForegroundColor Red

if ($invalidFiles.Count -gt 0 -and $Fix) {
    Write-Host "`nAttempting auto-fix..." -ForegroundColor Yellow
    # Auto-fix logic here (update modified dates, sync versions, etc.)
}
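
The auto-fix step is left open above. As one possible shape for the "sync versions" fix, the Python sketch below rewrites the version suffix of `file_id` to match the `version` field. Treating the `version` field as the source of truth is an assumption, and the function name `sync_file_id_version` is hypothetical.

```python
import re

# Captures the file_id prefix (group 1) and its version suffix (group 2)
FILE_ID_RE = re.compile(r'(file_id:\s+SOM-[A-Z]{3}-\d{4}-v)([\d.]+)')

def sync_file_id_version(header: str) -> str:
    """Return the header with the file_id suffix matching the version field.

    Assumption: the `version:` field is authoritative. Headers without a
    version field are returned unchanged.
    """
    version = re.search(r'\bversion:\s+([\d.]+)', header)
    if not version:
        return header
    return FILE_ID_RE.sub(lambda m: m.group(1) + version.group(1), header, count=1)

header = (
    "# file_id: SOM-SCR-0014-v1.0.0\n"
    "# name: cli.py\n"
    "# version: 1.1.0\n"
)
print(sync_file_id_version(header))
```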

Example 3: SQL Queries for Analytics

-- analytics.sql
-- Advanced queries for catalog insights

-- 1. File activity timeline
SELECT
    date(modified) as day,
    category,
    COUNT(*) as files_modified
FROM file_catalog
WHERE modified >= date('now', '-30 days')
GROUP BY day, category
ORDER BY day DESC, files_modified DESC;

-- 2. Agent productivity
SELECT
    ar.name as agent,
    COUNT(DISTINCT fc.file_id) as total_files,
    COUNT(DISTINCT CASE WHEN fc.modified >= date('now', '-7 days') THEN fc.file_id END) as recent_edits,
    GROUP_CONCAT(DISTINCT fc.category) as categories_worked
FROM file_catalog fc
JOIN agent_registry ar ON fc.agent_id = ar.id
GROUP BY ar.name
ORDER BY total_files DESC;

-- 3. Tag co-occurrence matrix (find related tags)
SELECT
    t1.tag as tag1,
    t2.tag as tag2,
    COUNT(*) as co_occurrences
FROM file_tags t1
JOIN file_tags t2 ON t1.file_id = t2.file_id AND t1.tag < t2.tag
GROUP BY t1.tag, t2.tag
HAVING co_occurrences > 1
ORDER BY co_occurrences DESC
LIMIT 20;

-- 4. Version distribution
SELECT
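    -- Take everything before the first dot so versions like 10.2.0 parse correctly
    CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) as major_version,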
    COUNT(*) as file_count
FROM file_catalog
GROUP BY major_version
ORDER BY major_version;

-- 5. Stale files (not modified in 90 days)
SELECT
    file_id,
    name,
    category,
    modified,
    julianday('now') - julianday(modified) as days_since_modified
FROM file_catalog
WHERE days_since_modified > 90
ORDER BY days_since_modified DESC;

-- 6. File dependency depth (how many layers of dependencies)
WITH RECURSIVE dep_tree AS (
    -- Base case: leaf files (they depend on nothing)
    SELECT file_id, 0 as depth
    FROM file_catalog
    WHERE file_id NOT IN (SELECT file_id FROM file_dependencies)

    UNION ALL

    -- Recursive case: files that depend on the previous level
    SELECT fd.file_id, dt.depth + 1
    FROM file_dependencies fd
    JOIN dep_tree dt ON fd.depends_on_file_id = dt.file_id
    WHERE dt.depth < 10  -- guard against circular dependencies
)
SELECT
    fc.file_id,
    fc.name,
    MAX(dt.depth) as max_depth
FROM dep_tree dt
JOIN file_catalog fc ON dt.file_id = fc.file_id
GROUP BY fc.file_id, fc.name
ORDER BY max_depth DESC;

-- 7. Category growth over time
SELECT
    substr(created, 1, 7) as month,
    category,
    COUNT(*) as new_files
FROM file_catalog
GROUP BY month, category
ORDER BY month DESC, new_files DESC;

Use Cases & Narratives

Use Case 1: Onboarding a New Agent

Scenario: A new AI agent (AGENT-GPT-001) takes over the Ghost_Shell project and needs to understand the codebase.

Workflow:

sequenceDiagram
    participant Agent as New Agent (GPT-001)
    participant CLI as ghost-catalog CLI
    participant TUI as Bubble Tea Browser
    participant FS as File System

    Agent->>CLI: ghost-catalog stats
    CLI-->>Agent: Show project overview (25 files, 6 scripts, 4 docs)

    Agent->>TUI: Launch TUI browser
    TUI->>FS: Load catalog database
    FS-->>TUI: Return file list
    TUI-->>Agent: Display categorized file tree

    Agent->>TUI: Filter by category: "documentation"
    TUI-->>Agent: Show 4 docs (README, CODEBASE_OVERVIEW, etc.)

    Agent->>TUI: Select "CODEBASE_OVERVIEW.md"
    TUI-->>Agent: Show file details:
    Note over TUI: file_id: SOM-DOC-0003-v1.0.0
    Note over TUI: description: Complete codebase overview
    Note over TUI: tags: [handoff, architecture, overview]

    Agent->>TUI: Press 'o' to open in editor
    TUI->>FS: Open file in $EDITOR

    Agent->>CLI: ghost-catalog search "proxy"
    CLI-->>Agent: Found 3 files with "proxy" in description

    Agent->>CLI: ghost-catalog info SOM-SCR-0012-v1.1.0
    CLI-->>Agent: Show full metadata for proxy/core.py

    Agent->>Agent: Now understands project structure

Outcome: Agent quickly orients itself using catalog metadata instead of reading every file.


Use Case 2: Dependency Analysis

Scenario: Agent needs to understand which files depend on db_handler.py before refactoring it.

Workflow:

# Step 1: Find db_handler file ID
$ ghost-catalog search "db_handler"
SOM-SCR-XXXX-v1.0.0  ghost_shell/data/db_handler.py  Unified database handler

# Step 2: Query dependencies (reverse lookup)
$ sqlite3 data/catalog.db
sqlite> SELECT fc.file_id, fc.name
        FROM file_dependencies fd
        JOIN file_catalog fc ON fd.file_id = fc.file_id
        WHERE fd.depends_on_file_id = 'SOM-SCR-XXXX-v1.0.0';

# Results:
# SOM-SCR-0012-v1.1.0  core.py
# SOM-SCR-0013-v1.0.0  collector.py
# SOM-SCR-0014-v1.0.0  cli.py

# Step 3: View dependency graph in TUI
$ ghost-catalog-tui
# Press 'd' on db_handler.py → shows visual dependency tree

Outcome: Agent knows exactly which files will be affected by changes, reducing refactoring risk.


Use Case 3: Finding Stale Documentation

Scenario: Developer wants to find outdated documentation files that haven't been updated in 6 months.

Workflow:

-- Query catalog database
SELECT
    file_id,
    name,
    description,
    modified,
    julianday('now') - julianday(modified) as days_old
FROM file_catalog
WHERE category = 'documentation'
  AND days_old > 180
ORDER BY days_old DESC;

-- Results:
-- SOM-DOC-0001-v1.0.0  OLD_SETUP.md  Installation guide  2024-03-15  245 days

# Using CLI
$ ghost-catalog list --category documentation --sort modified

# Output shows oldest docs at bottom
# Developer can then update or archive them

Outcome: Easy identification of documentation needing updates.


Use Case 4: Tag-Based Code Discovery

Scenario: New developer wants to find all files related to OpenTelemetry instrumentation.

Workflow:

# Using CLI (list accepts --tag without a search query)
$ ghost-catalog list --tag opentelemetry

# Results:
# SOM-SCR-0010-v1.0.0  telemetry.py        OpenTelemetry setup
# SOM-SCR-0012-v1.1.0  core.py             Main proxy addon
# SOM-SCR-0013-v1.0.0  collector.py        Intelligence collector
# SOM-SCR-0014-v1.0.0  cli.py              Management CLI

# Open all in editor (--format is an option of the list command)
$ ghost-catalog list --tag opentelemetry --format json | jq -r '.[].path' | xargs code

Outcome: Developer instantly finds all relevant files without grep-ing or manual exploration.


Use Case 5: Version Audit

Scenario: QA team needs to verify all files are at v1.0.0 or higher before release.

Workflow:

# Validate version compliance
$ ghost-catalog validate --strict

# Output:
# ✗ SOM-SCR-0008-v0.9.5  old_module.py
#   └─ Version below 1.0.0 (v0.9.5)
# ✗ SOM-DOC-0002-v0.5.0  DRAFT_SPEC.md
#   └─ Version below 1.0.0 (v0.5.0)

# Fix by bumping the version in each file's header, then re-validate
$ ghost-catalog validate --file ghost_shell/old_module.py --strict

Outcome: Automated version compliance checking for release readiness.
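
The version check behind this audit can be sketched in a few lines of Python. Versions are compared as integer tuples rather than strings so that, for example, `0.10.0` sorts above `0.9.5`. The helper names are hypothetical, and the IDs are the ones used in the example output above.

```python
import re

def parse_version(file_id):
    """Extract (major, minor, patch) from an ID like SOM-SCR-0014-v1.0.0."""
    match = re.search(r'-v(\d+)\.(\d+)\.(\d+)$', file_id)
    if not match:
        raise ValueError(f"No semantic version suffix in {file_id!r}")
    return tuple(int(part) for part in match.groups())

def below_minimum(file_ids, minimum=(1, 0, 0)):
    """Return the file IDs whose version is lower than `minimum`."""
    return [fid for fid in file_ids if parse_version(fid) < minimum]

ids = [
    "SOM-SCR-0008-v0.9.5",
    "SOM-DOC-0002-v0.5.0",
    "SOM-SCR-0014-v1.0.0",
]
print(below_minimum(ids))
```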


Integration Guide

Integrating with Existing Projects

Step 1: Audit Current Files

# List all Python/Markdown files without headers
find . \( -name "*.py" -o -name "*.md" \) | while IFS= read -r file; do
    if ! grep -q "file_id:" "$file"; then
        echo "$file"
    fi
done > files_without_headers.txt

Step 2: Generate Headers

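# bulk_add_headers.py
from datetime import date
from pathlib import Path

# Placeholder values; set these for your project before running.
PROJECT_ID = 'GHOST-SHELL'
AGENT_ID = 'AGENT-CLAUDE-002'
NEXT_SEQUENCE = 1  # next free sequence number (see "UUID Generation Workflow")

def has_header(filepath):
    """Return True if the first 30 lines already contain a file_id field."""
    with open(filepath, 'r', encoding='utf-8') as f:
        head = [line for _, line in zip(range(30), f)]
    return any('file_id:' in line for line in head)

def generate_header(filepath, file_id, category, description):
    """Prepend a catalog header to a file."""
    today = date.today().isoformat()

    # Read existing content
    with open(filepath, 'r', encoding='utf-8') as f:
        existing_content = f.read()

    # Generate header
    header = f"""# ==============================================================================
# file_id: {file_id}
# name: {filepath.name}
# description: {description}
# project_id: {PROJECT_ID}
# category: {category}
# tags: []
# created: {today}
# modified: {today}
# version: 1.0.0
# agent_id: {AGENT_ID}
# execution: python {filepath}
# ==============================================================================

"""

    # Write with header
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(header + existing_content)

# Process all files, skipping virtual environments and caches
sequence = NEXT_SEQUENCE
for filepath in sorted(Path('.').rglob('*.py')):
    if any(part in filepath.parts for part in ['.venv', '__pycache__', 'node_modules']):
        continue
    if not has_header(filepath):
        file_id = f"SOM-SCR-{sequence:04d}-v1.0.0"
        generate_header(filepath, file_id, 'script', 'TODO: Add description')
        sequence += 1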

Step 3: Build Catalog Database

# Create and populate catalog
ghost-catalog sync --path .

Step 4: Validate

ghost-catalog validate --fix

Integration with Git Workflows

Pre-Commit Hook:

#!/bin/bash
# .git/hooks/pre-commit
# Validate catalog headers before commit

echo "Validating catalog headers..."

# Run validation
ghost-catalog validate --strict

if [ $? -ne 0 ]; then
    echo "❌ Catalog validation failed. Fix errors before committing."
    exit 1
fi

echo "✓ Catalog validation passed"
exit 0

Post-Merge Hook:

#!/bin/bash
# .git/hooks/post-merge
# Re-sync catalog after merges

echo "Re-syncing catalog database..."
ghost-catalog sync
echo "✓ Catalog synced"

Integration with CI/CD

GitHub Actions (.github/workflows/catalog-check.yml):

name: Catalog Validation

on:
  pull_request:
    paths:
      - '**/*.py'
      - '**/*.md'
      - '**/*.yaml'

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install ghost-catalog
        run: |
          # Install CLI tool
          go install github.com/somacosf/ghost-catalog@latest

      - name: Validate catalog headers
        run: |
          ghost-catalog validate --strict

      - name: Check for missing headers
        run: |
          ghost-catalog sync --dry-run
          if [ -n "$(ghost-catalog sync --dry-run | grep 'Missing header')" ]; then
            echo "❌ Some files are missing catalog headers"
            exit 1
          fi

Future Enhancements

1. Automated Dependency Detection

Goal: Automatically detect imports and build dependency graph.

Implementation:

  • Parse Python import statements
  • Parse Markdown file links
  • Build file_dependencies table

Example:

# auto_deps.py
import ast

def extract_imports(filepath):
    with open(filepath) as f:
        tree = ast.parse(f.read())

    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x"
            if node.module:
                imports.append(node.module)

    return imports

# Map module names to file_ids
# Insert into file_dependencies table
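
The mapping step could look roughly like the sketch below, which turns catalog paths into dotted module names and emits rows in the same three-value shape as the sample `file_dependencies` insert. The helper names are hypothetical, and `SOM-SCR-XXXX-v1.0.0` is the same placeholder ID this document uses for `db_handler.py`.

```python
from pathlib import PurePosixPath

def module_name(path):
    """'ghost_shell/data/db_handler.py' -> 'ghost_shell.data.db_handler'."""
    parts = list(PurePosixPath(path).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package's __init__.py is imported by the package name
    return ".".join(parts)

def resolve_dependencies(file_id, imports, catalog):
    """Map imported module names to cataloged files.

    catalog: {path: file_id}. Returns (file_id, depends_on_file_id, 'import')
    tuples; imports that are not in the catalog (stdlib, third party) are skipped.
    """
    by_module = {module_name(p): fid for p, fid in catalog.items()
                 if p.endswith(".py")}
    rows = []
    for name in imports:
        target = by_module.get(name)
        if target and target != file_id:
            rows.append((file_id, target, "import"))
    return rows

catalog = {
    "ghost_shell/cli.py": "SOM-SCR-0014-v1.0.0",
    "ghost_shell/data/db_handler.py": "SOM-SCR-XXXX-v1.0.0",  # placeholder ID
}
print(resolve_dependencies("SOM-SCR-0014-v1.0.0",
                           ["os", "ghost_shell.data.db_handler"], catalog))
```

This only resolves imports whose module path matches a cataloged file exactly; `from ghost_shell.data import db_handler` would need the imported names to be checked as well.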

2. AI-Powered Tagging

Goal: Use LLM to suggest tags based on file content.

Workflow:

sequenceDiagram
    participant CLI as ghost-catalog
    participant LLM as Claude API
    participant DB as Catalog DB

    CLI->>DB: Get files without tags
    DB-->>CLI: Return file list

    loop For each file
        CLI->>LLM: Analyze file content, suggest tags
        LLM-->>CLI: Return suggested tags: [cli, admin, sqlite]
        CLI->>User: Show suggestions, confirm?
        User-->>CLI: Approve tags
        CLI->>DB: Update file_tags table
    end

3. Visual Dependency Graph

Goal: Generate visual diagrams of file relationships.

Tools: Graphviz, Mermaid, or D3.js

Example Output:

graph LR
    A[cli.py] --> B[db_handler.py]
    C[core.py] --> B
    D[collector.py] --> B
    E[launcher.py] --> A
    E --> C
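
If Mermaid is the chosen target, a graph like the one above could be generated from dependency pairs with a small helper. This is a sketch only: the function name `to_mermaid` and the `N0`, `N1` node IDs are assumptions.

```python
def to_mermaid(edges):
    """Render (dependent, dependency) name pairs as a Mermaid 'graph LR'."""
    ids = {}
    lines = ["graph LR"]

    def node(name):
        # First mention declares the node with its label; later ones reuse the ID
        if name not in ids:
            ids[name] = f"N{len(ids)}"
            return f"{ids[name]}[{name}]"
        return ids[name]

    for src, dst in edges:
        left = node(src)
        right = node(dst)
        lines.append(f"    {left} --> {right}")
    return "\n".join(lines)

print(to_mermaid([
    ("cli.py", "db_handler.py"),
    ("core.py", "db_handler.py"),
]))
```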

4. Change Impact Analysis

Goal: Predict which files will be affected by a change.

Query:

-- Find all files that transitively depend on db_handler.py
WITH RECURSIVE impact AS (
    SELECT file_id, 1 as depth
    FROM file_catalog
    WHERE file_id = 'SOM-SCR-XXXX-v1.0.0'

    UNION ALL

    SELECT fd.file_id, impact.depth + 1
    FROM file_dependencies fd
    JOIN impact ON fd.depends_on_file_id = impact.file_id
    WHERE impact.depth < 10
)
SELECT DISTINCT fc.file_id, fc.name
FROM impact
JOIN file_catalog fc ON impact.file_id = fc.file_id;
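
The same traversal can be done in application code once the dependency pairs are loaded, which also makes cycle handling explicit. The sketch below is a breadth-first walk under the assumption that dependencies are available as `(file_id, depends_on_file_id)` pairs; unlike the query, it counts the root as depth 0 and leaves it out of the result. Short names stand in for file IDs.

```python
from collections import deque

def transitive_dependents(root, dependencies, max_depth=10):
    """Return {file_id: depth} for files that directly or indirectly depend on root.

    dependencies: iterable of (file_id, depends_on_file_id) pairs.
    """
    dependents = {}
    for fid, dep in dependencies:
        dependents.setdefault(dep, []).append(fid)

    seen = {}
    queue = deque([(root, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for fid in dependents.get(current, []):
            # Visit each file once, so circular dependencies terminate
            if fid not in seen and fid != root:
                seen[fid] = depth + 1
                queue.append((fid, depth + 1))
    return seen

deps = [
    ("cli", "db_handler"),
    ("core", "db_handler"),
    ("launcher", "cli"),
    ("launcher", "core"),
]
print(transitive_dependents("db_handler", deps))
```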

5. Workspace-Level Registry

Goal: Single catalog database for all Somacosf projects.

Schema Extension:

CREATE TABLE projects (
    project_id TEXT PRIMARY KEY,
    name TEXT,
    path TEXT,
    description TEXT
);

ALTER TABLE file_catalog ADD COLUMN workspace_id TEXT;

Benefits:

  • Search across all projects
  • Track agent activity workspace-wide
  • Identify duplicate/similar files

Conclusion

The Ghost_Shell File ID System provides a semantic, self-documenting catalog framework that enables:

  1. Rapid Onboarding: New agents can understand codebases via metadata
  2. Efficient Search: Find files by category, tags, or relationships
  3. Version Tracking: Built-in semver for all files
  4. Agent Coordination: Track which agent created/modified each file
  5. Future-Ready: Designed for database integration and advanced tooling

Next Steps:

  1. Implement ghost-catalog CLI tool (Go or Python)
  2. Build Bubble Tea TUI for interactive browsing
  3. Create catalog database and sync tool
  4. Integrate with development workflows (Git hooks, CI/CD)

Key Takeaway: Unlike UUIDs (which prioritize uniqueness), SOM File IDs prioritize discoverability and semantic richness, making them ideal for AI-agent-driven development workspaces.


Document Version: 1.0.0 Created: 2025-01-24 Agent: Claude (Sonnet 4.5) Project: Ghost_Shell / Somacosf
