Author: Claude (Sonnet 4.5)

Date: 2025-01-24

Project: Ghost_Shell / Somacosf Workspace

Purpose: Comprehensive documentation of the semantic file catalog system used across the Somacosf ecosystem

- Executive Summary
- System Architecture Overview
- File ID Schema Specification
- Header Format Standards
- UUID Generation Workflow
- Data Schema Deep Dive
- Registry System Architecture
- Access & Browse Methods
- Bubble Tea TUI Design
- Database Schema for File Tracking
- CLI Interface Specifications
- Implementation Examples
- Use Cases & Narratives
- Integration Guide
- Future Enhancements
TL;DR: Ghost_Shell does NOT use traditional UUIDs (UUID4, UUID5). Instead, it implements a semantic file catalog system called SOM (Somacosf) that embeds rich metadata directly into file headers.
Key Characteristics:
- Semantic IDs: Human-readable identifiers like `SOM-SCR-0014-v1.0.0`
- Embedded Metadata: Every file contains a structured header with 12+ metadata fields
- Workspace-Level Standard: Defined in `CLAUDE.md` and enforced across all Somacosf projects
- No Separate Registry Database (currently): File IDs are distributed in file headers
- Future Integration: Designed to work with tosijs-schema for runtime validation

| Traditional UUIDs | SOM File IDs |
|---|---|
| Cryptographic randomness | Semantic structure |
| No human meaning | Category-based, sequential |
| Requires external registry | Self-documenting in file |
| Version-agnostic | Built-in semver tracking |
| `550e8400-e29b-41d4-a716-446655440000` | `SOM-SCR-0014-v1.0.0` |

Decision Rationale:
- UUIDs are excellent for distributed systems requiring uniqueness guarantees
- SOM IDs prioritize discoverability, traceability, and AI agent coordination
- The workspace operates with a single coordinating agent, making sequential IDs viable
- Semantic structure enables pattern-based queries and category filtering
graph TB
subgraph "File System Layer"
A[Python Files] -->|Header| B[SOM File ID]
C[Markdown Files] -->|Header| B
D[Config Files] -->|Header| B
E[Scripts] -->|Header| B
end
subgraph "Metadata Schema"
B --> F[file_id: SOM-XXX-NNNN-vX.X.X]
B --> G[Agent ID: AGENT-XXX-NNN]
B --> H[Tags: Array]
B --> I[Timestamps]
B --> J[Execution Info]
end
subgraph "Registry System (Planned)"
F --> K[SQLite Catalog DB]
G --> L[Agent Registry]
H --> M[Tag Index]
K --> N[File Lookup API]
L --> N
M --> N
end
subgraph "Access Layer"
N --> O[CLI Commands]
N --> P[Bubble Tea TUI]
N --> Q[tosijs Schema Validation]
end
subgraph "Integration Layer"
O --> R[Search by Category]
P --> S[Browse by Tags]
Q --> T[Validate File Metadata]
end
- File System Layer: Physical files with embedded headers
- Metadata Schema: Structured data within headers (current implementation)
- Registry System: Planned centralized database for fast queries
- Access Layer: Tools and interfaces to interact with file catalog
- Integration Layer: Cross-project features and validations
SOM-<CATEGORY>-<SEQUENCE>-v<VERSION>
Where:
- SOM = Somacosf namespace (3 chars, fixed)
- CATEGORY = File type code (3 chars, uppercase)
- SEQUENCE = Unique number (4 digits, zero-padded)
- VERSION = Semantic version (semver: MAJOR.MINOR.PATCH)
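
The format above is regular enough to parse with a single pattern. The following is a minimal sketch, not part of the Ghost_Shell codebase; `FileId` and `parse_file_id` are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple

# SOM-<CATEGORY>-<SEQUENCE>-v<MAJOR>.<MINOR>.<PATCH>, as specified above
FILE_ID_RE = re.compile(r"^SOM-([A-Z]{3})-(\d{4})-v(\d+)\.(\d+)\.(\d+)$")


class FileId(NamedTuple):  # hypothetical helper type
    category: str
    sequence: int
    version: tuple


def parse_file_id(file_id: str) -> FileId:
    """Split a SOM file ID into category, sequence, and version parts."""
    match = FILE_ID_RE.match(file_id)
    if match is None:
        raise ValueError(f"not a valid SOM file ID: {file_id!r}")
    category, seq, major, minor, patch = match.groups()
    return FileId(category, int(seq), (int(major), int(minor), int(patch)))
```

Keeping the version as a tuple of integers means two versions compare correctly with `<` and `>`, which plain string comparison would not guarantee.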
| Code | Category | Description | Example File Types |
|---|---|---|---|
| CMD | Slash commands | Claude Code slash commands | .md files in .claude/commands/ |
| SCR | Scripts | Executable Python/PowerShell | .py, .ps1 |
| DOC | Documentation | Project documentation | .md (README, guides) |
| CFG | Configuration | Config files | .yaml, .json, .toml |
| REG | Registry files | Registry-related files | Registry schemas, catalogs |
| TST | Tests | Test suites | test_*.py, *_test.py |
| TMP | Templates | Project templates | Boilerplate files |
| DTA | Data/schemas | Database schemas, data | .sql, .json schemas |
| LOG | Logs/diaries | Development logs | development_diary.md |
Strategy: Sequential allocation within category scope
Rules:
- Sequences start at `0001` for each category
- Numbers are never reused (even after file deletion)
- Allocation is manual (no auto-increment system yet)
- Gaps in sequences are acceptable (e.g., 0001, 0002, 0005)
Example Progression:
SOM-SCR-0001-v1.0.0 # First script
SOM-SCR-0002-v1.0.0 # Second script
SOM-SCR-0003-v1.0.0 # Third script (later deleted)
SOM-SCR-0004-v1.0.0 # Fourth script
# Next allocation: SOM-SCR-0005 (0003 is NOT reused)
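
One consequence of the never-reuse rule: if the deleted file held the highest sequence, a scan of the remaining files alone would hand that number out again. The sketch below is an illustration under assumptions, not the project's allocator; `next_file_id` and its `high_water_mark` parameter (the highest sequence ever issued, recorded somewhere outside the files) are hypothetical.

```python
import re

SEQ_RE = re.compile(r"^SOM-([A-Z]{3})-(\d{4})-v\d+\.\d+\.\d+$")


def next_file_id(category: str, known_ids, high_water_mark: int = 0) -> str:
    """Return the next ID for a category without reusing any earlier sequence.

    high_water_mark is a hypothetical record of the highest sequence ever
    issued, so a deleted highest-numbered file is still accounted for.
    """
    highest = high_water_mark
    for file_id in known_ids:
        match = SEQ_RE.match(file_id)
        if match and match.group(1) == category:
            highest = max(highest, int(match.group(2)))
    sequence = highest + 1
    if sequence > 9999:
        raise ValueError(f"category {category} has exhausted its 4-digit range")
    return f"SOM-{category}-{sequence:04d}-v1.0.0"
```

With the progression above (0003 deleted, 0004 still present), the function returns `SOM-SCR-0005-v1.0.0` because it takes the maximum rather than filling gaps.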
Following Semantic Versioning 2.0.0 (semver.org):
MAJOR.MINOR.PATCH
MAJOR: Incompatible API changes, breaking refactors
MINOR: New features, backward-compatible additions
PATCH: Bug fixes, documentation updates
Version Bump Examples:
- File header metadata added → PATCH bump (`v1.0.0` → `v1.0.1`)
- New function added to module → MINOR bump (`v1.0.1` → `v1.1.0`)
- Function signature changed → MAJOR bump (`v1.1.0` → `v2.0.0`)

Version in File ID:
- The version in the `file_id` line should match the `version` field
- Both should be updated simultaneously when file is modified

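
Because the version appears in both the `file_id` and the `version` field, it helps to derive both from one operation. A minimal sketch, assuming a hypothetical helper named `bump_file_id` that returns the two values together:

```python
import re

FILE_ID_RE = re.compile(r"^(SOM-[A-Z]{3}-\d{4})-v(\d+)\.(\d+)\.(\d+)$")


def bump_file_id(file_id: str, part: str) -> tuple:
    """Return (new_file_id, new_version) after a semver bump of the given part."""
    match = FILE_ID_RE.match(file_id)
    if match is None:
        raise ValueError(f"not a valid SOM file ID: {file_id!r}")
    prefix = match.group(1)
    major, minor, patch = (int(g) for g in match.groups()[1:])
    if part == "major":
        major, minor, patch = major + 1, 0, 0  # lower parts reset to zero
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown version part: {part!r}")
    version = f"{major}.{minor}.{patch}"
    return f"{prefix}-v{version}", version
```

Writing both returned values into the header in the same edit keeps the two fields from drifting apart.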
# ==============================================================================
# file_id: SOM-SCR-0014-v1.0.0
# name: cli.py
# description: Ghost_Shell unified management CLI
# project_id: GHOST-SHELL
# category: script
# tags: [cli, management, admin, opentelemetry]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# agent_id: AGENT-CLAUDE-002
# execution: python -m ghost_shell.cli [command] [args]
# ==============================================================================

Location: Lines 1-13 of every Python file

Format: Hash comments with key: value pairs

Line Count: Exactly 13 lines (including delimiters)

<!--
===============================================================================
file_id: SOM-DOC-0010-v1.0.0
name: TOSIJS_INTEGRATION.md
description: tosijs-schema integration plan for Ghost_Shell
project_id: GHOST-SHELL
category: documentation
tags: [tosijs, schema, validation, integration, planning]
created: 2025-11-23
modified: 2025-11-23
version: 1.0.0
agent:
id: AGENT-CLAUDE-002
name: claude_takeover
model: claude-sonnet-4-5-20250929
execution:
type: documentation
invocation: Read for project understanding
===============================================================================
-->

Location: Lines 1-20 of every Markdown file

Format: HTML comment block with YAML-like structure

Line Count: 19-21 lines (variable based on nested fields)

Note: Markdown headers use nested YAML syntax for agent and execution fields.
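
The nested `agent` and `execution` fields can be read without a YAML library. The sketch below is an assumption-laden illustration: `parse_markdown_header` is a hypothetical name, and it assumes nested fields are indented under their parent in the real files and that nesting is one level deep.

```python
def parse_markdown_header(text: str) -> dict:
    """Parse the HTML-comment header into a dict, one nesting level deep."""
    meta = {}
    parent = None
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines, comment markers, and the ===== delimiter rows
        if not stripped or stripped in ("<!--", "-->") or set(stripped) == {"="}:
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if value.startswith("[") and value.endswith("]"):
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        indented = raw[:1] in (" ", "\t")
        if indented and parent is not None:
            meta[parent][key] = value      # child of the last "key:" line
        elif value == "":
            parent = key                   # "agent:" opens a nested block
            meta[key] = {}
        else:
            parent = None
            meta[key] = value
    return meta
```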
# ==============================================================================
# file_id: SOM-CFG-0002-v1.0.0
# name: config.yaml
# description: Ghost_Shell unified configuration
# project_id: GHOST-SHELL
# category: configuration
# tags: [config, yaml, settings]
# created: 2025-11-23
# modified: 2025-11-23
# version: 1.0.0
# ==============================================================================

Location: Lines 1-11

Format: YAML comments (simpler than Markdown, no nested fields)

Line Count: 11 lines

# ==============================================================================
# file_id: SOM-SCR-NNNN-vX.X.X
# name: script.ps1
# description: What this script does
# project_id: PROJECT-TYPE
# category: script
# tags: [tag1, tag2]
# created: YYYY-MM-DD
# modified: YYYY-MM-DD
# version: X.X.X
# agent_id: AGENT-XXX-NNN
# execution: .\script.ps1 [-Param value]
# ==============================================================================

Same structure as Python, but uses PowerShell comment syntax.
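
The Python, YAML, and PowerShell headers share one shape (a `#` comment per field between two rule lines), so they can be rendered from a dictionary. This is a sketch only; `render_hash_header` is a hypothetical helper, and the rule width of 78 `=` characters is an assumption rather than a documented constant.

```python
def render_hash_header(fields: dict) -> str:
    """Render key/value metadata as a hash-comment header block."""
    rule = "# " + "=" * 78  # assumed width; adjust to match existing files
    lines = [rule]
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            value = "[" + ", ".join(value) + "]"  # tags use array notation
        lines.append(f"# {key}: {value}")
    lines.append(rule)
    return "\n".join(lines) + "\n"
```

Passing the eleven Python header fields in order yields the 13-line block described above, since the two rule lines are added around them.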
Workflow Diagram:
sequenceDiagram
participant Agent as Claude Agent
participant FS as File System
participant Dev as Development Diary
Agent->>Agent: 1. Determine file purpose
Agent->>Agent: 2. Select category code (SCR, DOC, CFG, etc.)
Agent->>FS: 3. Search existing files for highest sequence
Note over FS: grep "file_id: SOM-SCR" *.py | sort
FS-->>Agent: 4. Return highest sequence (e.g., 0013)
Agent->>Agent: 5. Increment sequence (0013 → 0014)
Agent->>Agent: 6. Set version to v1.0.0 (new file)
Agent->>Agent: 7. Assign agent_id (AGENT-CLAUDE-002)
Agent->>Agent: 8. Generate file_id: SOM-SCR-0014-v1.0.0
Agent->>FS: 9. Create file with header
Agent->>Dev: 10. Log file creation in diary
Step-by-Step Process:
1. **Determine Category**

       # Is this a script? → SCR
       # Is this documentation? → DOC
       # Is this a test? → TST

2. **Find Highest Sequence in Category**

       # For scripts:
       grep -r "file_id: SOM-SCR" . | sed 's/.*SOM-SCR-\([0-9]*\).*/\1/' | sort -n | tail -1
       # Output: 0013

3. **Increment Sequence**

       last_sequence = 13
       new_sequence = last_sequence + 1  # 14
       formatted = f"{new_sequence:04d}"  # "0014"

4. **Construct File ID**

       file_id = f"SOM-SCR-{formatted}-v1.0.0"
       # Result: SOM-SCR-0014-v1.0.0

5. **Generate Complete Header**
header = f"""# ==============================================================================
"""
6. **Write File**
```python
with open(filepath, 'w') as f:
    f.write(header)
    f.write(code_content)
```

graph LR
A[Create New File] --> B{File Type?}
B -->|.py| C[Scan Python Files]
B -->|.md| D[Scan Markdown Files]
B -->|.yaml| E[Scan Config Files]
C --> F[Get Highest SCR Sequence]
D --> G[Get Highest DOC Sequence]
E --> H[Get Highest CFG Sequence]
F --> I[Increment & Format]
G --> I
H --> I
I --> J[Query Agent Registry]
J --> K[Get Current Agent ID]
K --> L[Generate File ID]
L --> M[Create Header Template]
M --> N[Insert into File]
N --> O[Register in Catalog DB]
O --> P[Log to Diary]
Automation Tool (Proposed):
# Command-line tool for ID generation
ghost-catalog generate --file new_module.py --category script --description "New feature module"
# Output:
# Generated: SOM-SCR-0015-v1.0.0
# Header inserted into new_module.py
# Registered in catalog database

| Field | Type | Required | Description | Example |
|---|---|---|---|---|
| file_id | String | Yes | Unique semantic identifier | SOM-SCR-0014-v1.0.0 |
| name | String | Yes | Actual filename (must match) | cli.py |
| description | String | Yes | One-line purpose statement | Ghost_Shell unified management CLI |
| project_id | String | Yes | Parent project identifier | GHOST-SHELL |
| category | String | Yes | Category name (full word, not the 3-char code) | script |
| tags | Array | Yes | Semantic tags for indexing | [cli, management, admin] |
| created | Date | Yes | ISO 8601 date (YYYY-MM-DD) | 2025-11-23 |
| modified | Date | Yes | Last modification date | 2025-11-23 |
| version | String | Yes | Semantic version | 1.0.0 |
| agent_id | String | Yes | Creating/modifying agent | AGENT-CLAUDE-002 |
| agent.name | String | Optional | Agent name (MD only) | claude_takeover |
| agent.model | String | Optional | LLM model (MD only) | claude-sonnet-4-5-20250929 |
| execution | String | Yes | How to run/use file | python -m ghost_shell.cli stats |
file_id:
- Pattern: `^SOM-[A-Z]{3}-\d{4}-v\d+\.\d+\.\d+$`
- Category: Must match one of the defined category codes
- Sequence: Must be unique within category
- Version: Must match `version` field

name:
- Constraint: Must exactly match the filename
- Case-sensitive: `CLI.py` ≠ `cli.py`
- Extension: Must include file extension

description:
- Length: 1-100 characters recommended
- Style: Imperative mood, concise
- Good: `Unified management CLI for Ghost_Shell`
- Bad: `This file contains a CLI that manages things`

project_id:
- Format: `[A-Z-]+` (uppercase with hyphens)
- Examples: `GHOST-SHELL`, `DMBT`, `BROWSER-MIXER`
- Scope: Project-level identifier (not workspace)

category:
- Values: One of `script`, `documentation`, `configuration`, `test`, `template`, `data`, `log`
- Consistency: Should align with file_id category code

tags:
- Format: Array of lowercase strings
- Separator: Commas in array notation `[tag1, tag2, tag3]`
- Style: Use hyphens for multi-word tags: `multi-word-tag`
- Purpose: Enable semantic search and classification

created / modified:
- Format: `YYYY-MM-DD` (ISO 8601 date only, no time)
- created: Never changes after initial creation
- modified: Updated whenever file content changes

version:
- Format: `MAJOR.MINOR.PATCH` (semver)
- Constraints: Must not have leading zeros (e.g., `1.01.0` is invalid)
- Sync: Must match version in `file_id`

agent_id:
- Format: `AGENT-[A-Z]+-\d{3}`
- Registry: Should reference a defined agent in `agent_registry.md`
- Examples: `AGENT-CLAUDE-002`, `AGENT-PRIME-001`

execution:
- Purpose: Provides runnable command or usage instruction
- Format: Actual shell command or description
- Examples:
  - Scripts: `python -m ghost_shell.cli stats`
  - Docs: `Read for project understanding`
  - Configs: `Loaded by launcher.py on startup`

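
Several of the field rules above are mechanical enough to check in code. The following is a rough sketch of such a checker, not the planned `ghost-catalog validate` implementation; `validate_header` and its error messages are hypothetical, and it covers only the pattern, sync, and date rules.

```python
import re
from datetime import date

FILE_ID_RE = re.compile(r"^SOM-[A-Z]{3}-\d{4}-v(\d+\.\d+\.\d+)$")
SEMVER_RE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")
PROJECT_RE = re.compile(r"^[A-Z-]+$")
AGENT_RE = re.compile(r"^AGENT-[A-Z]+-\d{3}$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_header(meta: dict, filename: str) -> list:
    """Return a list of problems; an empty list means every check passed."""
    errors = []
    version = str(meta.get("version", ""))
    if not SEMVER_RE.match(version):
        errors.append("version is not MAJOR.MINOR.PATCH without leading zeros")
    match = FILE_ID_RE.match(str(meta.get("file_id", "")))
    if match is None:
        errors.append("file_id does not match the SOM pattern")
    elif match.group(1) != version:
        errors.append("version in file_id differs from the version field")
    if meta.get("name") != filename:
        errors.append("name does not exactly match the filename")
    if not PROJECT_RE.match(str(meta.get("project_id", ""))):
        errors.append("project_id must be uppercase letters and hyphens")
    if "agent_id" in meta and not AGENT_RE.match(str(meta["agent_id"])):
        errors.append("agent_id does not match AGENT-XXX-NNN")
    for key in ("created", "modified"):
        value = str(meta.get(key, ""))
        try:
            if not DATE_RE.match(value):
                raise ValueError(value)
            date.fromisoformat(value)  # rejects impossible dates such as month 13
        except ValueError:
            errors.append(f"{key} is not a valid YYYY-MM-DD date")
    for tag in meta.get("tags", []):
        if tag != tag.lower() or " " in tag:
            errors.append(f"tag {tag!r} should be lowercase with hyphens")
    return errors
```

Returning a list instead of raising on the first failure lets a caller report every problem in a header at once.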
Recommended Tag Categories:
- Functional Tags: What the file does
  - `cli`, `gui`, `tui`, `api`, `database`, `networking`
- Technology Tags: What it uses
  - `opentelemetry`, `sqlite`, `mitmproxy`, `textual`, `tosijs`
- Purpose Tags: Why it exists
  - `admin`, `debugging`, `monitoring`, `security`, `privacy`
- Domain Tags: What area it covers
  - `proxy`, `firewall`, `intelligence`, `analytics`, `telemetry`
- Status Tags: Implementation state
  - `wip`, `deprecated`, `experimental`, `stable`

Example Tagging:
# For ghost_shell/cli.py:
tags: [cli, management, admin, opentelemetry, statistics]
# For ghost_shell/proxy/blocker.py:
tags: [proxy, blocking, privacy, security, mitmproxy]
# For development_diary.md:
tags: [log, diary, development, documentation]

Architecture:
graph TB
subgraph "File System (Current)"
A[cli.py<br/>SOM-SCR-0014]
B[launcher.py<br/>SOM-SCR-0015]
C[telemetry.py<br/>SOM-SCR-0010]
D[config.yaml<br/>SOM-CFG-0002]
end
subgraph "Query Methods"
E[grep + awk + sort]
F[repomix output]
G[Manual file inspection]
end
A --> E
B --> E
C --> E
D --> E
A --> F
B --> F
C --> F
D --> F
A --> G
B --> G
C --> G
D --> G
E --> H[File ID List]
F --> H
G --> H
Characteristics:
- ✅ No database dependency: Works immediately
- ✅ Git-trackable: Headers are version-controlled
- ✅ Self-documenting: Metadata travels with file
- ❌ Slow queries: Must scan all files for searches
- ❌ No indexing: Can't efficiently search by tags
- ❌ No relationships: Can't link files (dependencies, etc.)
Architecture:
graph TB
subgraph "File System"
A[Files with Headers]
end
subgraph "Catalog Database (SQLite)"
B[(file_catalog table)]
C[(agent_registry table)]
D[(file_tags table)]
E[(file_dependencies table)]
end
subgraph "Sync Layer"
F[Catalog Scanner]
G[File Watcher]
end
subgraph "Access APIs"
H[CLI: ghost-catalog]
I[TUI: Bubble Tea Browser]
J[Python API: CatalogQuery]
end
A --> F
F --> B
F --> C
F --> D
A --> G
G --> B
B --> H
C --> H
D --> H
B --> I
C --> I
D --> I
B --> J
C --> J
D --> J
Database Schema (Proposed):
-- Core catalog table
CREATE TABLE file_catalog (
file_id TEXT PRIMARY KEY, -- SOM-XXX-NNNN-vX.X.X
name TEXT NOT NULL, -- Filename
path TEXT NOT NULL, -- Relative or absolute path
description TEXT,
project_id TEXT,
category TEXT, -- SOM category code
created DATE,
modified DATE,
version TEXT, -- Semantic version
agent_id TEXT,
execution TEXT,
checksum TEXT, -- SHA256 hash for integrity
last_scanned TIMESTAMP, -- When catalog was last updated
FOREIGN KEY (agent_id) REFERENCES agent_registry(id)
);
-- Tag index (many-to-many)
CREATE TABLE file_tags (
file_id TEXT,
tag TEXT,
PRIMARY KEY (file_id, tag),
FOREIGN KEY (file_id) REFERENCES file_catalog(file_id)
);
-- Agent registry
CREATE TABLE agent_registry (
id TEXT PRIMARY KEY, -- AGENT-XXX-NNN
name TEXT,
model TEXT,
first_seen TIMESTAMP,
last_active TIMESTAMP
);
-- File dependencies (optional, for advanced tracking)
CREATE TABLE file_dependencies (
file_id TEXT,
depends_on_file_id TEXT,
dependency_type TEXT, -- import, config, data, etc.
PRIMARY KEY (file_id, depends_on_file_id),
FOREIGN KEY (file_id) REFERENCES file_catalog(file_id),
FOREIGN KEY (depends_on_file_id) REFERENCES file_catalog(file_id)
);
-- Indexes for fast queries
CREATE INDEX idx_category ON file_catalog(category);
CREATE INDEX idx_project ON file_catalog(project_id);
CREATE INDEX idx_agent ON file_catalog(agent_id);
CREATE INDEX idx_modified ON file_catalog(modified);
CREATE INDEX idx_tags ON file_tags(tag);

Process:
sequenceDiagram
participant FS as File System
participant Scanner as Catalog Scanner
participant DB as Catalog Database
participant Log as Sync Log
Scanner->>FS: 1. Glob all files (**/*.py, **/*.md, etc.)
FS-->>Scanner: 2. Return file list
loop For each file
Scanner->>FS: 3. Read file header (first 20 lines)
FS-->>Scanner: 4. Return header content
Scanner->>Scanner: 5. Parse header metadata
Scanner->>Scanner: 6. Calculate SHA256 checksum
Scanner->>DB: 7. SELECT checksum WHERE file_id = ?
alt File exists and unchanged
DB-->>Scanner: Checksum matches
Scanner->>Scanner: Skip (no update needed)
else File new or modified
DB-->>Scanner: Checksum differs or NULL
Scanner->>DB: 8. UPSERT file_catalog
Scanner->>DB: 9. DELETE old tags, INSERT new tags
Scanner->>Log: 10. Log update
end
end
Scanner->>DB: 11. Mark stale entries (files deleted)
Scanner->>Log: 12. Generate sync report
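
The checksum comparison at the heart of the loop above can be sketched with the standard library. This is an illustration under assumptions, not the planned scanner: `sync_catalog` is a hypothetical name, the table is reduced to three columns, only `*.py` files are scanned, and the stale-entry and logging steps are left out.

```python
import hashlib
import re
import sqlite3
from pathlib import Path

FILE_ID_RE = re.compile(r"file_id:\s+(SOM-[A-Z]{3}-\d{4}-v\d+\.\d+\.\d+)")


def sync_catalog(root: Path, db: sqlite3.Connection) -> dict:
    """Scan *.py files under root and update rows whose checksum changed."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS file_catalog ("
        "file_id TEXT PRIMARY KEY, path TEXT NOT NULL, checksum TEXT)"
    )
    report = {"new": 0, "updated": 0, "unchanged": 0}
    for path in sorted(root.rglob("*.py")):
        data = path.read_bytes()
        # Only the first 20 lines are searched for the header
        head = "\n".join(data.decode("utf-8", errors="replace").splitlines()[:20])
        match = FILE_ID_RE.search(head)
        if match is None:
            continue  # file has no catalog header
        file_id = match.group(1)
        checksum = hashlib.sha256(data).hexdigest()
        row = db.execute(
            "SELECT checksum FROM file_catalog WHERE file_id = ?", (file_id,)
        ).fetchone()
        if row is not None and row[0] == checksum:
            report["unchanged"] += 1
            continue
        report["new" if row is None else "updated"] += 1
        db.execute(
            "INSERT OR REPLACE INTO file_catalog (file_id, path, checksum) VALUES (?, ?, ?)",
            (file_id, str(path.relative_to(root)), checksum),
        )
    db.commit()
    return report
```

Hashing the whole file rather than just the header means a code change with an untouched header still registers as an update.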
Command:
# Scan and update catalog
ghost-catalog sync
# Output:
# Scanned: 50 files
# Updated: 3 files
# New: 1 file
# Unchanged: 46 files
# Stale: 0 files
# Sync completed in 1.2s

Bash Commands:
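# List all file IDs (-h drops the filename and -o prints only the match, so the ID is always field 2)
grep -rhoE "file_id: SOM-[A-Z]{3}-[0-9]{4}-v[0-9.]+" . --include="*.py" --include="*.md" | awk '{print $2}'
# List by category
grep -rhoE "file_id: SOM-SCR-[0-9]{4}-v[0-9.]+" . --include="*.py" | awk '{print $2}'
# Find file by ID
grep -r "file_id: SOM-SCR-0014" .
# List all tags (trim whitespace and drop empty lines before de-duplicating)
grep -rh "tags:" . --include="*.py" --include="*.md" | sed 's/.*tags: //' | tr -d '[]' | tr ',' '\n' | sed 's/^ *//; s/ *$//; /^$/d' | sort -u
# Find files with specific tag
grep -r "tags:.*opentelemetry" . --include="*.py"

PowerShell Commands: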
# List all file IDs (Get-ChildItem -Recurse walks subdirectories; a capture group extracts the ID)
Get-ChildItem -Recurse -Include *.py,*.md |
    Select-String -Pattern 'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }
# Count files by category
Get-ChildItem -Recurse -Include *.py |
    Select-String -Pattern 'file_id:\s+SOM-([A-Z]{3})-' |
    ForEach-Object { $_.Matches[0].Groups[1].Value } |
    Group-Object |
    Select-Object Name, Count
# Find newest modified files
Get-ChildItem -Recurse -Include *.py,*.md |
    Select-String -Pattern 'modified:\s+(\d{4}-\d{2}-\d{2})' |
    ForEach-Object {
        [PSCustomObject]@{ Date = $_.Matches[0].Groups[1].Value; File = $_.Path }
    } |
    Sort-Object -Property Date -Descending |
    Select-Object -First 10

Generate Repomix:

repomix --output catalog.txt --style xml

Parse Repomix:
import re


def parse_file_ids_from_repomix(repomix_path):
    """Extract all file IDs from repomix output"""
    pattern = r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)'
    with open(repomix_path, 'r', encoding='utf-8') as f:
        content = f.read()
    matches = re.findall(pattern, content)
    return sorted(set(matches))  # Unique file IDs, in a stable order


# Usage (the path matches the --output value passed to repomix)
file_ids = parse_file_ids_from_repomix('catalog.txt')
print(f"Found {len(file_ids)} files with catalog headers")

Command Reference:
# List all cataloged files
ghost-catalog list
# Filter by category
ghost-catalog list --category script
# Search by tag
ghost-catalog search --tag opentelemetry
# Get file details
ghost-catalog info SOM-SCR-0014-v1.0.0
# Validate all headers
ghost-catalog validate
# Fix inconsistencies
ghost-catalog validate --fix
# Generate new file with header
ghost-catalog generate --file new_module.py --category script --description "New module"
# Show catalog statistics
ghost-catalog stats
# Export catalog to JSON
ghost-catalog export catalog.json

Example Output:
$ ghost-catalog list --category script
╭─────────────────────────────────────────────────────────────╮
│ Ghost_Shell File Catalog (Scripts) │
├──────────────────────────┬──────────────────────────────────┤
│ File ID │ Description │
├──────────────────────────┼──────────────────────────────────┤
│ SOM-SCR-0010-v1.0.0 │ OpenTelemetry setup │
│ SOM-SCR-0011-v1.0.0 │ Traffic blocking │
│ SOM-SCR-0012-v1.1.0 │ Main proxy addon │
│ SOM-SCR-0013-v1.0.0 │ Intelligence collector │
│ SOM-SCR-0014-v1.0.0 │ Management CLI │
│ SOM-SCR-0015-v1.1.0 │ Orchestrator │
├──────────────────────────┴──────────────────────────────────┤
│ Total: 6 scripts │
╰─────────────────────────────────────────────────────────────╯
-- Find all files modified in last 7 days
SELECT file_id, name, modified
FROM file_catalog
WHERE modified >= date('now', '-7 days')
ORDER BY modified DESC;
-- Count files by category
SELECT category, COUNT(*) as count
FROM file_catalog
GROUP BY category
ORDER BY count DESC;
-- Find files by tag
SELECT fc.file_id, fc.name, fc.description
FROM file_catalog fc
JOIN file_tags ft ON fc.file_id = ft.file_id
WHERE ft.tag = 'opentelemetry';
-- Find files by agent
SELECT file_id, name, version
FROM file_catalog
WHERE agent_id = 'AGENT-CLAUDE-002'
ORDER BY modified DESC;
-- Find outdated versions (MAJOR version < 1)
SELECT file_id, name, version
FROM file_catalog
WHERE CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) < 1;
-- File dependency graph
WITH RECURSIVE deps AS (
SELECT file_id, depends_on_file_id, 1 as depth
FROM file_dependencies
WHERE file_id = 'SOM-SCR-0014-v1.0.0'
UNION ALL
SELECT fd.file_id, fd.depends_on_file_id, deps.depth + 1
FROM file_dependencies fd
JOIN deps ON fd.file_id = deps.depends_on_file_id
WHERE depth < 5
)
SELECT DISTINCT fc.file_id, fc.name, deps.depth
FROM deps
JOIN file_catalog fc ON deps.depends_on_file_id = fc.file_id
ORDER BY depth, fc.file_id;

Bubble Tea (https://github.com/charmbracelet/bubbletea) is a Go framework for building terminal UIs. For Ghost_Shell, we'll design a TUI for browsing the file catalog.
Why Bubble Tea?
- Modern, composable TUI framework
- Rich text rendering (via Lip Gloss)
- Mouse support, interactive tables
- Cross-platform (Windows, Linux, macOS)
Implementation Language: Go (Bubble Tea is Go-native)
graph TB
subgraph "Bubble Tea TUI"
A[Main View] --> B[Navigation Sidebar]
A --> C[Content Pane]
B --> D[Category Filter]
B --> E[Tag Filter]
B --> F[Search Box]
C --> G[File List Table]
C --> H[File Detail View]
C --> I[Dependency Graph]
end
subgraph "Data Layer"
J[SQLite Catalog DB]
K[File System]
end
G --> J
H --> K
I --> J
╔════════════════════════════════════════════════════════════════════════════╗
║ Ghost_Shell File Catalog Browser v1.0 ║
╠══════════════════╦═════════════════════════════════════════════════════════╣
║ FILTERS ║ FILE LIST (25 files) ║
║ ║ ┌────────────────────┬────────────────────────────────┐ ║
║ Categories ║ │ File ID │ Description │ ║
║ [●] Scripts (6) ║ ├────────────────────┼────────────────────────────────┤ ║
║ [ ] Docs (4) ║ │ SOM-SCR-0010-v1.0.0│ OpenTelemetry setup │ ║
║ [ ] Config (2) ║ │ SOM-SCR-0011-v1.0.0│ Traffic blocking │ ║
║ [ ] Tests (1) ║ │►SOM-SCR-0012-v1.1.0│ Main proxy addon │ ║
║ ║ │ SOM-SCR-0013-v1.0.0│ Intelligence collector │ ║
║ Tags ║ │ SOM-SCR-0014-v1.0.0│ Management CLI │ ║
║ [x] opentelemetry║ │ SOM-SCR-0015-v1.1.0│ Orchestrator │ ║
║ [ ] proxy ║ └────────────────────┴────────────────────────────────┘ ║
║ [ ] cli ║ ║
║ ║ FILE DETAILS: SOM-SCR-0012-v1.1.0 ║
║ Search ║ ┌─────────────────────────────────────────────────────┐ ║
║ [proxy______] ║ │ Name: core.py │ ║
║ ║ │ Path: ghost_shell/proxy/core.py │ ║
║ Sort By ║ │ Description: Main proxy addon │ ║
║ ● Modified ║ │ Project: GHOST-SHELL │ ║
║ ○ Created ║ │ Category: script │ ║
║ ○ File ID ║ │ Tags: [proxy, mitmproxy, core] │ ║
║ ║ │ Version: 1.1.0 │ ║
║ Actions ║ │ Agent: AGENT-CLAUDE-002 │ ║
║ [o] Open in $ED ║ │ Created: 2025-11-23 Modified: 2025-11-23 │ ║
║ [c] Copy path ║ │ Execution: python -s ghost_shell/proxy/core.py │ ║
║ [d] Dependencies ║ │ Checksum: 3a5f7c... (verified) │ ║
║ [g] Git log ║ └─────────────────────────────────────────────────────┘ ║
║ ║ ║
╠══════════════════╩═════════════════════════════════════════════════════════╣
║ [?] Help [/] Search [Tab] Switch Pane [q] Quit [Enter] Select ║
╚════════════════════════════════════════════════════════════════════════════╝
1. Navigation Sidebar (Left, 25% width)
- Category Filters: Checkboxes for SCR, DOC, CFG, TST
- Tag Filters: Checkboxes for common tags (top 10)
- Search Box: Live filter by file_id, name, or description
- Sort Options: Radio buttons for sort order
- Action Buttons: Quick actions (open, copy, etc.)
2. File List Table (Top-right, 75% width, 60% height)
- Columns: File ID, Description (truncated if needed)
- Selection: Arrow keys + Enter to select
- Highlight: Selected row highlighted
- Pagination: If >20 files, show page indicator
3. File Detail View (Bottom-right, 75% width, 40% height)
- Metadata Display: All header fields in readable format
- Checksum Verification: Shows if file matches catalog
- Action Hints: Keybindings for common actions
4. Status Bar (Bottom)
- Keybindings: Quick reference for navigation
- Status: Current filter/search state
| Key | Action | Description |
|---|---|---|
| `↑`/`↓` | Navigate list | Move selection up/down |
| `Enter` | Select file | Show details in detail pane |
| `Tab` | Switch pane | Toggle between sidebar and file list |
| `/` | Search | Focus search box |
| `Esc` | Clear filter | Reset all filters |
| `o` | Open in editor | Open file in $EDITOR |
| `c` | Copy path | Copy file path to clipboard |
| `d` | Show dependencies | Open dependency graph view |
| `g` | Git log | Show git history for file |
| `v` | Validate | Re-validate file header |
| `r` | Refresh | Re-scan catalog database |
| `?` | Help | Show full help screen |
| `q` | Quit | Exit application |

File Structure:
ghost-catalog-tui/
├── main.go # Entry point
├── model.go # Bubble Tea model
├── update.go # Update logic
├── view.go # Rendering logic
├── components/
│ ├── sidebar.go # Sidebar component
│ ├── filelist.go # File list table
│ ├── details.go # Detail pane
│ └── statusbar.go # Status bar
├── database/
│ └── catalog.go # SQLite queries
└── styles/
└── theme.go # Lip Gloss styles
Sample Code (main.go):
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	tea "github.com/charmbracelet/bubbletea"
	_ "github.com/mattn/go-sqlite3"
)

// model holds the TUI state. loadFiles, filesLoadedMsg, and renderUI are
// referenced below but are not shown in this sample.
type model struct {
	db             *sql.DB
	files          []File
	selectedIndex  int
	filterCategory string
	filterTags     []string
	searchQuery    string
	focusedPane    string // "sidebar" or "filelist"
}

type File struct {
	FileID      string
	Name        string
	Description string
	Path        string
	Category    string
	Tags        []string
	Version     string
	Modified    string
}

func initialModel() model {
	db, err := sql.Open("sqlite3", "./data/catalog.db")
	if err != nil {
		log.Fatal(err)
	}
	return model{
		db:          db,
		focusedPane: "filelist",
	}
}

func (m model) Init() tea.Cmd {
	return loadFiles(m.db)
}

func (m model) Update(msg tea.Msg) (tea.Model, tea.Cmd) {
	switch msg := msg.(type) {
	case tea.KeyMsg:
		switch msg.String() {
		case "q", "ctrl+c":
			return m, tea.Quit
		case "up":
			if m.selectedIndex > 0 {
				m.selectedIndex--
			}
		case "down":
			if m.selectedIndex < len(m.files)-1 {
				m.selectedIndex++
			}
		case "enter":
			// Show file details
		case "/":
			// Focus search box
		}
	case filesLoadedMsg:
		m.files = msg.files
	}
	return m, nil
}

func (m model) View() string {
	// Render UI (see view.go implementation)
	return renderUI(m)
}

func main() {
	p := tea.NewProgram(initialModel(), tea.WithAltScreen())
	// Run replaces the deprecated Start and also returns the final model.
	if _, err := p.Run(); err != nil {
		fmt.Printf("Error: %v", err)
		os.Exit(1)
	}
}

Database Queries (database/catalog.go):
package database

import (
	"database/sql"
)

// QueryFilesByCategoryAndTags returns catalog rows filtered by category and tags.
// The File type is not declared in this sample; it must be defined in (or
// imported into) this package with the fields scanned below.
func QueryFilesByCategoryAndTags(db *sql.DB, category string, tags []string) ([]File, error) {
	query := `
		SELECT DISTINCT fc.file_id, fc.name, fc.description, fc.path, fc.category, fc.version, fc.modified
		FROM file_catalog fc
		LEFT JOIN file_tags ft ON fc.file_id = ft.file_id
		WHERE 1=1
	`
	args := []interface{}{}
	if category != "" {
		query += " AND fc.category = ?"
		args = append(args, category)
	}
	if len(tags) > 0 {
		// Build one placeholder per tag so values are always bound, never interpolated
		query += " AND ft.tag IN ("
		for i, tag := range tags {
			if i > 0 {
				query += ","
			}
			query += "?"
			args = append(args, tag)
		}
		query += ")"
	}
	query += " ORDER BY fc.modified DESC"
	rows, err := db.Query(query, args...)
	if err != nil {
		return nil, err
	}
	defer rows.Close()
	var files []File
	for rows.Next() {
		var f File
		err := rows.Scan(&f.FileID, &f.Name, &f.Description, &f.Path, &f.Category, &f.Version, &f.Modified)
		if err != nil {
			return nil, err
		}
		files = append(files, f)
	}
	// Surface any error that ended iteration early
	return files, rows.Err()
}

(See "Registry System Architecture" section above for complete schema)
Summary Tables:
- file_catalog: Core metadata for all cataloged files
- file_tags: Tag index (many-to-many)
- agent_registry: Agent metadata and activity tracking
- file_dependencies: Relationships between files
Sample Data:
-- Insert file
INSERT INTO file_catalog VALUES (
'SOM-SCR-0014-v1.0.0',
'cli.py',
'ghost_shell/cli.py',
'Ghost_Shell unified management CLI',
'GHOST-SHELL',
'script',
'2025-11-23',
'2025-11-23',
'1.0.0',
'AGENT-CLAUDE-002',
'python -m ghost_shell.cli [command] [args]',
'3a5f7c9d...',
'2025-11-24 10:30:00'
);
-- Insert tags
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'cli');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'management');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'admin');
INSERT INTO file_tags VALUES ('SOM-SCR-0014-v1.0.0', 'opentelemetry');
-- Insert agent
INSERT INTO agent_registry VALUES (
'AGENT-CLAUDE-002',
'claude_takeover',
'claude-sonnet-4-5-20250929',
'2025-11-23 00:00:00',
'2025-11-24 10:30:00'
);
-- Insert dependency (cli.py depends on db_handler.py)
INSERT INTO file_dependencies VALUES (
'SOM-SCR-0014-v1.0.0',
'SOM-SCR-XXXX-vX.X.X', -- db_handler.py file_id
'import'
);

ghost-catalog <command> [options]
Commands:
list List all cataloged files
search Search files by criteria
info Show detailed information about a file
validate Validate file headers
generate Generate new file with catalog header
sync Synchronize catalog database with file system
stats Show catalog statistics
export Export catalog to JSON/CSV
tags Manage tags
agents List agents in registry
Global Options:
--db PATH Path to catalog database (default: ./data/catalog.db)
--verbose Show detailed output
--help Show help
ghost-catalog list
ghost-catalog list [options]
Options:
--category CATEGORY Filter by category (script, documentation, etc.)
--tag TAG Filter by tag (can be specified multiple times)
--project PROJECT Filter by project ID
--agent AGENT Filter by agent ID
--sort FIELD Sort by field (modified, created, file_id)
--format FORMAT Output format (table, json, csv)
Examples:
ghost-catalog list --category script
ghost-catalog list --tag opentelemetry --tag cli
ghost-catalog list --project GHOST-SHELL --sort modified

ghost-catalog search
ghost-catalog search QUERY [options]
Arguments:
QUERY Search query (searches file_id, name, description)
Options:
--category CATEGORY Filter by category
--tag TAG Filter by tag
--fuzzy Enable fuzzy matching
Examples:
ghost-catalog search "proxy"
ghost-catalog search "telemetry" --tag opentelemetry
ghost-catalog search "SOM-SCR-001" --fuzzy

ghost-catalog info
ghost-catalog info FILE_ID
Arguments:
FILE_ID File ID to show details for
Examples:
ghost-catalog info SOM-SCR-0014-v1.0.0

ghost-catalog validate
ghost-catalog validate [options]
Options:
--fix Auto-fix minor issues (sync version, update modified date)
--file FILE Validate specific file
--strict Enable strict validation (fail on warnings)
Examples:
ghost-catalog validate
ghost-catalog validate --fix
ghost-catalog validate --file ghost_shell/cli.py --strict

ghost-catalog generate
ghost-catalog generate --file FILE --category CATEGORY [options]
Options:
--file FILE File to create/update
--category CATEGORY Category code (script, documentation, etc.)
--description DESC File description
--tags TAG1,TAG2 Comma-separated tags
--project PROJECT Project ID (default: current project)
--agent AGENT Agent ID (default: current agent)
Examples:
ghost-catalog generate --file new_module.py --category script --description "New feature module" --tags cli,admin

ghost-catalog sync
ghost-catalog sync [options]
Options:
--path PATH Path to scan (default: current directory)
--recursive Scan recursively (default: true)
--dry-run Show what would be updated without making changes
Examples:
ghost-catalog sync
ghost-catalog sync --path ./ghost_shell --dry-run

ghost-catalog stats
ghost-catalog stats
Examples:
ghost-catalog stats

Output:
╭────────────────────────────────────────────────╮
│           Ghost_Shell Catalog Stats            │
├────────────────────────────────────────────────┤
│ Total Files: 25                                │
│ Scripts: 6                                     │
│ Documentation: 4                               │
│ Configuration: 2                               │
│ Tests: 1                                       │
│                                                │
│ Total Tags: 42                                 │
│ Most Used Tag: opentelemetry (8 files)         │
│                                                │
│ Total Agents: 2                                │
│ Most Active Agent: AGENT-CLAUDE-002 (20 files) │
│                                                │
│ Last Sync: 2025-11-24 10:30:00                 │
│ Database Size: 245 KB                          │
╰────────────────────────────────────────────────╯
ghost-catalog export
ghost-catalog export OUTPUT [options]
Arguments:
OUTPUT Output file path
Options:
--format FORMAT Export format (json, csv, yaml)
--category CATEGORY Filter by category
--tag TAG Filter by tag
Examples:
ghost-catalog export catalog.json
ghost-catalog export catalog.csv --format csv --category script
#!/usr/bin/env python3
"""
Extract all file IDs from a codebase using header parsing.
"""
import json
import re
from itertools import islice
from pathlib import Path
from typing import Any, Dict, List

# Directory names to skip (virtual environments and caches)
SKIP_DIRS = {'.venv', '__pycache__', 'node_modules'}

def extract_file_id_from_content(content: str) -> Dict[str, Any]:
    """Parse file header and extract all metadata fields."""
    metadata: Dict[str, Any] = {}
    # Patterns for different header styles
    patterns = {
        'file_id': r'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)',
        'name': r'name:\s+(.+)',
        'description': r'description:\s+(.+)',
        'category': r'category:\s+(\w+)',
        'version': r'version:\s+([\d.]+)',
        'agent_id': r'agent_id:\s+(AGENT-[A-Z]+-\d{3})',
        'created': r'created:\s+(\d{4}-\d{2}-\d{2})',
        'modified': r'modified:\s+(\d{4}-\d{2}-\d{2})',
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, content)
        if match:
            metadata[key] = match.group(1).strip()
    # Extract tags (array format)
    tags_match = re.search(r'tags:\s+\[([^\]]+)\]', content)
    if tags_match:
        tags_str = tags_match.group(1)
        metadata['tags'] = [tag.strip() for tag in tags_str.split(',')]
    return metadata

def scan_directory(root_path: Path, extensions: List[str]) -> List[Dict]:
    """Scan directory for files with catalog headers."""
    results = []
    for ext in extensions:
        for filepath in root_path.rglob(f'*{ext}'):
            # Skip virtual environments and cache directories (match whole path parts)
            if SKIP_DIRS.intersection(filepath.parts):
                continue
            try:
                with open(filepath, 'r', encoding='utf-8') as f:
                    # Read first 30 lines (headers should be within this range)
                    header_content = ''.join(islice(f, 30))
                metadata = extract_file_id_from_content(header_content)
                if metadata.get('file_id'):
                    metadata['path'] = str(filepath.relative_to(root_path))
                    results.append(metadata)
            except (OSError, UnicodeDecodeError) as e:
                print(f"Error reading {filepath}: {e}")
    return results

if __name__ == '__main__':
    # Scan current directory
    root = Path('.')
    extensions = ['.py', '.md', '.yaml', '.yml', '.ps1']
    files = scan_directory(root, extensions)
    print(f"Found {len(files)} cataloged files\n")
    # Group by category
    by_category = {}
    for file in files:
        category = file.get('category', 'unknown')
        by_category.setdefault(category, []).append(file)
    # Print summary
    for category, items in sorted(by_category.items()):
        print(f"\n{category.upper()} ({len(items)} files):")
        for item in items:
            print(f"  {item['file_id']:25} {item.get('description', 'No description')[:50]}")
    # Export to JSON
    with open('catalog_export.json', 'w', encoding='utf-8') as f:
        json.dump(files, f, indent=2)
    print("\n✓ Catalog exported to catalog_export.json")
# validate_headers.ps1
# Validate all file catalog headers for consistency
param(
[string]$Path = ".",
[switch]$Fix
)
function Test-FileHeader {
param([string]$FilePath)
# Join the first 20 lines into one string so -match can search the whole header
$content = (Get-Content $FilePath -TotalCount 20) -join "`n"
$errors = @()
# Extract metadata
if ($content -match 'file_id:\s+(SOM-[A-Z]{3}-\d{4}-v[\d.]+)') {
$fileId = $Matches[1]
} else {
$errors += "Missing or invalid file_id"
return @{ Valid = $false; Errors = $errors }
}
if ($content -match 'version:\s+([\d.]+)') {
$version = $Matches[1]
} else {
$errors += "Missing version field"
}
# Check file_id version matches version field
if ($fileId -match 'v([\d.]+)' -and $version) {
$idVersion = $Matches[1]
if ($idVersion -ne $version) {
$errors += "Version mismatch: file_id has v$idVersion but version field has $version"
}
}
# Check filename matches
$fileName = Split-Path $FilePath -Leaf
if ($content -match 'name:\s+(.+)') {
$headerName = $Matches[1].Trim()
if ($fileName -ne $headerName) {
$errors += "Filename mismatch: file is '$fileName' but header says '$headerName'"
}
}
# Check required fields
$requiredFields = @('file_id', 'name', 'description', 'category', 'created', 'modified', 'version')
foreach ($field in $requiredFields) {
if ($content -notmatch "$field\s*:") {
$errors += "Missing required field: $field"
}
}
return @{
Valid = ($errors.Count -eq 0)
FileId = $fileId
Errors = $errors
}
}
# Scan all Python and Markdown files
$files = Get-ChildItem -Path $Path -Recurse -Include *.py,*.md |
Where-Object { $_.FullName -notmatch '(\.venv|__pycache__|node_modules)' }
$totalFiles = 0
$validFiles = 0
$invalidFiles = @()
foreach ($file in $files) {
$totalFiles++
$result = Test-FileHeader -FilePath $file.FullName
if ($result.Valid) {
$validFiles++
Write-Host "✓" -ForegroundColor Green -NoNewline
Write-Host " $($result.FileId) - $($file.Name)"
} else {
Write-Host "✗" -ForegroundColor Red -NoNewline
Write-Host " $($file.Name)"
# Use $err rather than $error, which is a PowerShell automatic variable
foreach ($err in $result.Errors) {
Write-Host " └─ $err" -ForegroundColor Yellow
}
$invalidFiles += $file
}
}
Write-Host "`nValidation Summary:" -ForegroundColor Cyan
Write-Host "Total Files: $totalFiles"
Write-Host "Valid: $validFiles" -ForegroundColor Green
Write-Host "Invalid: $($invalidFiles.Count)" -ForegroundColor Red
if ($invalidFiles.Count -gt 0 -and $Fix) {
Write-Host "`nAttempting auto-fix..." -ForegroundColor Yellow
# Auto-fix logic here (update modified dates, sync versions, etc.)
}
-- analytics.sql
-- Advanced queries for catalog insights
-- 1. File activity timeline
SELECT
date(modified) as day,
category,
COUNT(*) as files_modified
FROM file_catalog
WHERE modified >= date('now', '-30 days')
GROUP BY day, category
ORDER BY day DESC, files_modified DESC;
-- 2. Agent productivity
SELECT
ar.name as agent,
COUNT(DISTINCT fc.file_id) as total_files,
COUNT(DISTINCT CASE WHEN fc.modified >= date('now', '-7 days') THEN fc.file_id END) as recent_edits,
GROUP_CONCAT(DISTINCT fc.category) as categories_worked
FROM file_catalog fc
JOIN agent_registry ar ON fc.agent_id = ar.id
GROUP BY ar.name
ORDER BY total_files DESC;
-- 3. Tag co-occurrence matrix (find related tags)
SELECT
t1.tag as tag1,
t2.tag as tag2,
COUNT(*) as co_occurrences
FROM file_tags t1
JOIN file_tags t2 ON t1.file_id = t2.file_id AND t1.tag < t2.tag
GROUP BY t1.tag, t2.tag
HAVING co_occurrences > 1
ORDER BY co_occurrences DESC
LIMIT 20;
-- 4. Version distribution
SELECT
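-- Take everything before the first dot so multi-digit major versions parse correctly
CAST(substr(version, 1, instr(version, '.') - 1) AS INTEGER) as major_version,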
COUNT(*) as file_count
FROM file_catalog
GROUP BY major_version
ORDER BY major_version;
-- 5. Stale files (not modified in 90 days)
SELECT
file_id,
name,
category,
modified,
julianday('now') - julianday(modified) as days_since_modified
FROM file_catalog
WHERE days_since_modified > 90
ORDER BY days_since_modified DESC;
-- 6. File dependency depth (how many layers of dependencies)
WITH RECURSIVE dep_tree AS (
-- Base case: leaf files (they depend on nothing)
SELECT file_id, 0 as depth
FROM file_catalog
WHERE file_id NOT IN (SELECT file_id FROM file_dependencies)
UNION ALL
-- Recursive case: files that depend on previous level
SELECT fd.file_id, dt.depth + 1
FROM file_dependencies fd
JOIN dep_tree dt ON fd.depends_on_file_id = dt.file_id
)
SELECT
fc.file_id,
fc.name,
MAX(dt.depth) as max_depth
FROM dep_tree dt
JOIN file_catalog fc ON dt.file_id = fc.file_id
GROUP BY fc.file_id, fc.name
ORDER BY max_depth DESC;
-- 7. Category growth over time
SELECT
substr(created, 1, 7) as month,
category,
COUNT(*) as new_files
FROM file_catalog
GROUP BY month, category
ORDER BY month DESC, new_files DESC;
Scenario: A new AI agent (AGENT-GPT-001) takes over the Ghost_Shell project and needs to understand the codebase.
Workflow:
sequenceDiagram
participant Agent as New Agent (GPT-001)
participant CLI as ghost-catalog CLI
participant TUI as Bubble Tea Browser
participant FS as File System
Agent->>CLI: ghost-catalog stats
CLI-->>Agent: Show project overview (25 files, 6 scripts, 4 docs)
Agent->>TUI: Launch TUI browser
TUI->>FS: Load catalog database
FS-->>TUI: Return file list
TUI-->>Agent: Display categorized file tree
Agent->>TUI: Filter by category: "documentation"
TUI-->>Agent: Show 4 docs (README, CODEBASE_OVERVIEW, etc.)
Agent->>TUI: Select "CODEBASE_OVERVIEW.md"
TUI-->>Agent: Show file details:
Note over TUI: file_id: SOM-DOC-0003-v1.0.0
Note over TUI: description: Complete codebase overview
Note over TUI: tags: [handoff, architecture, overview]
Agent->>TUI: Press 'o' to open in editor
TUI->>FS: Open file in $EDITOR
Agent->>CLI: ghost-catalog search "proxy"
CLI-->>Agent: Found 3 files with "proxy" in description
Agent->>CLI: ghost-catalog info SOM-SCR-0012-v1.1.0
CLI-->>Agent: Show full metadata for proxy/core.py
Agent->>Agent: Now understands project structure
Outcome: Agent quickly orients itself using catalog metadata instead of reading every file.
Scenario: Agent needs to understand which files depend on db_handler.py before refactoring it.
Workflow:
# Step 1: Find db_handler file ID
$ ghost-catalog search "db_handler"
SOM-SCR-XXXX-v1.0.0 ghost_shell/data/db_handler.py Unified database handler
# Step 2: Query dependencies (reverse lookup)
$ sqlite3 data/catalog.db
sqlite> SELECT fc.file_id, fc.name
FROM file_dependencies fd
JOIN file_catalog fc ON fd.file_id = fc.file_id
WHERE fd.depends_on_file_id = 'SOM-SCR-XXXX-v1.0.0';
# Results:
# SOM-SCR-0012-v1.1.0 core.py
# SOM-SCR-0013-v1.0.0 collector.py
# SOM-SCR-0014-v1.0.0 cli.py
# Step 3: View dependency graph in TUI
$ ghost-catalog-tui
# Press 'd' on db_handler.py → shows visual dependency tree
Outcome: Agent knows exactly which files will be affected by changes, reducing refactoring risk.
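The reverse lookup in Step 2 can also be run from Python with the standard `sqlite3` module. This is a minimal sketch against an in-memory database that contains only the columns the query needs; the `find_dependents` helper and the `SOM-SCR-9999-v1.0.0` ID are placeholders for illustration, not real catalog entries.

```python
import sqlite3

def find_dependents(conn, file_id):
    """Return (file_id, name) rows for files that directly depend on file_id."""
    query = """
        SELECT fc.file_id, fc.name
        FROM file_dependencies fd
        JOIN file_catalog fc ON fd.file_id = fc.file_id
        WHERE fd.depends_on_file_id = ?
        ORDER BY fc.file_id
    """
    return conn.execute(query, (file_id,)).fetchall()

# In-memory stand-in for data/catalog.db, reduced to the columns used here.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE file_catalog (file_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE file_dependencies (file_id TEXT, depends_on_file_id TEXT);
""")
conn.executemany('INSERT INTO file_catalog VALUES (?, ?)', [
    ('SOM-SCR-9999-v1.0.0', 'db_handler.py'),  # placeholder ID for illustration
    ('SOM-SCR-0012-v1.1.0', 'core.py'),
    ('SOM-SCR-0014-v1.0.0', 'cli.py'),
])
conn.executemany('INSERT INTO file_dependencies VALUES (?, ?)', [
    ('SOM-SCR-0012-v1.1.0', 'SOM-SCR-9999-v1.0.0'),
    ('SOM-SCR-0014-v1.0.0', 'SOM-SCR-9999-v1.0.0'),
])
dependents = find_dependents(conn, 'SOM-SCR-9999-v1.0.0')
for dep_id, name in dependents:
    print(dep_id, name)
```

Passing the file ID as a bound parameter (`?`) avoids building SQL strings by hand when the ID comes from user input.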
Scenario: Developer wants to find outdated documentation files that haven't been updated in 6 months.
Workflow:
-- Query catalog database
SELECT
file_id,
name,
description,
modified,
julianday('now') - julianday(modified) as days_old
FROM file_catalog
WHERE category = 'documentation'
AND days_old > 180
ORDER BY days_old DESC;
-- Results:
-- SOM-DOC-0001-v1.0.0 OLD_SETUP.md Installation guide 2024-03-15 245 days
# Using CLI
$ ghost-catalog list --category documentation --sort modified
# Output shows oldest docs at bottom
# Developer can then update or archive them
Outcome: Easy identification of documentation needing updates.
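The same staleness check can be expressed without a database, working on parsed header metadata. The sketch below assumes metadata dicts with `category` and ISO-formatted `modified` fields; `find_stale` is a hypothetical helper, and the reference date is chosen only so the example reproduces the 245-day figure shown above.

```python
from datetime import date

def find_stale(files, category, max_age_days, today=None):
    """Return (days_old, metadata) pairs older than max_age_days, oldest first."""
    today = today or date.today()
    stale = []
    for meta in files:
        if meta.get('category') != category or 'modified' not in meta:
            continue
        days_old = (today - date.fromisoformat(meta['modified'])).days
        if days_old > max_age_days:
            stale.append((days_old, meta))
    return sorted(stale, key=lambda pair: pair[0], reverse=True)

# Illustrative records; the second modified date is an assumption for the example.
docs = [
    {'file_id': 'SOM-DOC-0001-v1.0.0', 'name': 'OLD_SETUP.md',
     'category': 'documentation', 'modified': '2024-03-15'},
    {'file_id': 'SOM-DOC-0003-v1.0.0', 'name': 'CODEBASE_OVERVIEW.md',
     'category': 'documentation', 'modified': '2024-11-01'},
]
stale = find_stale(docs, 'documentation', 180, today=date(2024, 11, 15))
for days_old, meta in stale:
    print(meta['file_id'], meta['name'], f'{days_old} days')
```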
Scenario: New developer wants to find all files related to OpenTelemetry instrumentation.
Workflow:
# Using CLI
$ ghost-catalog search --tag opentelemetry
# Results:
# SOM-SCR-0010-v1.0.0 telemetry.py OpenTelemetry setup
# SOM-SCR-0012-v1.1.0 core.py Main proxy addon
# SOM-SCR-0013-v1.0.0 collector.py Intelligence collector
# SOM-SCR-0014-v1.0.0 cli.py Management CLI
# Open all in editor
$ ghost-catalog search --tag opentelemetry --format json | jq -r '.[].path' | xargs code
Outcome: Developer instantly finds all relevant files without grep-ing or manual exploration.
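Tag filtering itself is a small operation over the parsed metadata. A minimal sketch, assuming each record carries a `tags` list; `search_by_tag` is a hypothetical helper, and the paths and tag lists below are illustrative rather than taken from the real catalog.

```python
def search_by_tag(files, tag):
    """Return metadata dicts whose tags include `tag` (case-insensitive)."""
    wanted = tag.lower()
    return [
        meta for meta in files
        if any(t.lower() == wanted for t in meta.get('tags', []))
    ]

# Illustrative records; paths and tag lists are assumptions for this example.
catalog = [
    {'file_id': 'SOM-SCR-0010-v1.0.0', 'path': 'telemetry.py',
     'tags': ['opentelemetry', 'tracing']},
    {'file_id': 'SOM-SCR-0014-v1.0.0', 'path': 'cli.py',
     'tags': ['cli', 'OpenTelemetry']},
    {'file_id': 'SOM-DOC-0003-v1.0.0', 'path': 'CODEBASE_OVERVIEW.md',
     'tags': ['handoff']},
]
matches = search_by_tag(catalog, 'opentelemetry')
for meta in matches:
    print(meta['file_id'], meta['path'])
```

Matching case-insensitively is a design choice in this sketch; a stricter catalog might instead normalize tags to lowercase when headers are written.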
Scenario: QA team needs to verify all files are at v1.0.0 or higher before release.
Workflow:
# Validate version compliance
$ ghost-catalog validate --strict
# Output:
# ✗ SOM-SCR-0008-v0.9.5 old_module.py
# └─ Version below 1.0.0 (v0.9.5)
# ✗ SOM-DOC-0002-v0.5.0 DRAFT_SPEC.md
# └─ Version below 1.0.0 (v0.5.0)
# Fix by updating files and bumping versions
$ ghost-catalog generate --file ghost_shell/old_module.py --category script --version 1.0.0
Outcome: Automated version compliance checking for release readiness.
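A version compliance check of this kind reduces to parsing the version out of each file ID and comparing it as a tuple. The sketch below assumes IDs always carry a three-part `major.minor.patch` version; `parse_file_id` and `below_minimum` are hypothetical helper names.

```python
import re

# Assumes the SOM-<CAT>-<NNNN>-v<major>.<minor>.<patch> layout with a full semver triple.
FILE_ID_RE = re.compile(r'^SOM-([A-Z]{3})-(\d{4})-v(\d+)\.(\d+)\.(\d+)$')

def parse_file_id(file_id):
    """Split a SOM file ID into (category, sequence, (major, minor, patch))."""
    m = FILE_ID_RE.match(file_id)
    if m is None:
        raise ValueError(f'Not a valid SOM file ID: {file_id!r}')
    category, seq, major, minor, patch = m.groups()
    return category, int(seq), (int(major), int(minor), int(patch))

def below_minimum(file_ids, minimum=(1, 0, 0)):
    """Return the IDs whose version is lower than `minimum`."""
    return [fid for fid in file_ids if parse_file_id(fid)[2] < minimum]

ids = ['SOM-SCR-0008-v0.9.5', 'SOM-DOC-0002-v0.5.0', 'SOM-SCR-0014-v1.0.0']
failing = below_minimum(ids)
for fid in failing:
    print(f'✗ {fid}: version below 1.0.0')
```

Comparing integer tuples rather than strings keeps the ordering correct once a component reaches two digits (for example, 0.10.0 sorts after 0.9.5).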
Step 1: Audit Current Files
# List all Python/Markdown files without headers
find . \( -name "*.py" -o -name "*.md" \) | while IFS= read -r file; do
    if ! grep -q "file_id:" "$file"; then
        echo "$file"
    fi
done > files_without_headers.txt
Step 2: Generate Headers
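# bulk_add_headers.py
from datetime import date
from pathlib import Path

# Placeholders: replace with the real project and agent identifiers
PROJECT_ID = 'TODO-PROJECT-ID'
AGENT_ID = 'TODO-AGENT-ID'
SKIP_DIRS = {'.venv', '__pycache__', 'node_modules'}

def has_header(filepath):
    """Return True if the first 30 lines already contain a file_id field."""
    with open(filepath, 'r', encoding='utf-8') as f:
        head = ''.join(f.readlines()[:30])
    return 'file_id:' in head

def next_file_id(category):
    """Determine next sequence number (use logic from "UUID Generation Workflow")."""
    raise NotImplementedError('Plug in the sequence allocation logic here')

def generate_header(filepath, category, description):
    """Generate catalog header for a file and prepend it to the file."""
    file_id = next_file_id(category)
    today = date.today().isoformat()
    # Read existing content
    with open(filepath, 'r', encoding='utf-8') as f:
        existing_content = f.read()
    # Generate header
    header = f"""# ==============================================================================
# file_id: {file_id}
# name: {filepath.name}
# description: {description}
# project_id: {PROJECT_ID}
# category: {category}
# tags: []
# created: {today}
# modified: {today}
# version: 1.0.0
# agent_id: {AGENT_ID}
# execution: python {filepath}
# ==============================================================================
"""
    # Write with header
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(header + existing_content)

# Process all files, skipping virtual environments and caches
for filepath in Path('.').rglob('*.py'):
    if SKIP_DIRS.intersection(filepath.parts):
        continue
    if not has_header(filepath):
        generate_header(filepath, 'script', 'TODO: Add description')
Step 3: Build Catalog Database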
# Create and populate catalog
ghost-catalog sync --path .
Step 4: Validate
ghost-catalog validate --fix
Pre-Commit Hook:
#!/bin/bash
# .git/hooks/pre-commit
# Validate catalog headers before commit
echo "Validating catalog headers..."
# Run validation
if ! ghost-catalog validate --strict; then
    echo "❌ Catalog validation failed. Fix errors before committing."
    exit 1
fi
echo "✓ Catalog validation passed"
exit 0
Post-Merge Hook:
#!/bin/bash
# .git/hooks/post-merge
# Re-sync catalog after merges
echo "Re-syncing catalog database..."
ghost-catalog sync
echo "✓ Catalog synced"
GitHub Actions (.github/workflows/catalog-check.yml):
name: Catalog Validation
on:
  pull_request:
    paths:
      - '**/*.py'
      - '**/*.md'
      - '**/*.yaml'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Install ghost-catalog
        run: |
          # Install CLI tool
          go install github.com/somacosf/ghost-catalog@latest
      - name: Validate catalog headers
        run: |
          ghost-catalog validate --strict
      - name: Check for missing headers
        run: |
          ghost-catalog sync --dry-run
          if [ -n "$(ghost-catalog sync --dry-run | grep 'Missing header')" ]; then
            echo "❌ Some files are missing catalog headers"
            exit 1
          fi
Goal: Automatically detect imports and build dependency graph.
Implementation:
- Parse Python `import` statements
- Parse Markdown file links
- Build `file_dependencies` table
Example:
# auto_deps.py
import ast

def extract_imports(filepath):
    """Return the module names imported by a Python source file."""
    with open(filepath, encoding='utf-8') as f:
        tree = ast.parse(f.read())
    imports = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                imports.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # node.module is None for relative imports such as "from . import x"
            imports.append(node.module)
    return imports

# Map module names to file_ids
# Insert into file_dependencies table
Goal: Use LLM to suggest tags based on file content.
Workflow:
sequenceDiagram
participant CLI as ghost-catalog
participant LLM as Claude API
participant DB as Catalog DB
CLI->>DB: Get files without tags
DB-->>CLI: Return file list
loop For each file
CLI->>LLM: Analyze file content, suggest tags
LLM-->>CLI: Return suggested tags: [cli, admin, sqlite]
CLI->>User: Show suggestions, confirm?
User-->>CLI: Approve tags
CLI->>DB: Update file_tags table
end
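The approval loop in this workflow can be sketched independently of any particular LLM API by injecting the suggestion and confirmation steps as callables. In the sketch below, `apply_tag_suggestions`, `fake_suggest`, and `auto_approve` are all hypothetical names; a real version would replace the stand-ins with an API call and an interactive prompt.

```python
def apply_tag_suggestions(files, suggest_tags, confirm):
    """For each untagged file, request suggestions and keep only approved ones."""
    updated = {}
    for meta in files:
        if meta.get('tags'):
            continue  # already tagged, nothing to do
        suggestions = suggest_tags(meta)
        if suggestions and confirm(meta, suggestions):
            meta['tags'] = list(suggestions)
            updated[meta['file_id']] = meta['tags']
    return updated

# Stand-ins for illustration: no network call and no user prompt happen here.
def fake_suggest(meta):
    return ['cli', 'admin', 'sqlite'] if meta['name'] == 'cli.py' else []

def auto_approve(meta, suggestions):
    return True

files = [
    {'file_id': 'SOM-SCR-0014-v1.0.0', 'name': 'cli.py', 'tags': []},
    {'file_id': 'SOM-SCR-0012-v1.1.0', 'name': 'core.py', 'tags': ['opentelemetry']},
]
updated = apply_tag_suggestions(files, fake_suggest, auto_approve)
print(updated)
```

Keeping the confirmation step as a separate callable means suggested tags are never written without an explicit approval decision.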
Goal: Generate visual diagrams of file relationships.
Tools: Graphviz, Mermaid, or D3.js
Example Output:
graph LR
A[cli.py] --> B[db_handler.py]
C[core.py] --> B
D[collector.py] --> B
E[launcher.py] --> A
E --> C
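A diagram like this can be generated from dependency pairs with a few lines of string formatting. This is a minimal sketch, assuming edges are available as `(source, target)` file-name pairs and that names contain no characters needing Mermaid quoting; `to_mermaid` is a hypothetical helper.

```python
def to_mermaid(edges):
    """Render (source, target) dependency pairs as a Mermaid 'graph LR' diagram."""
    ids = {}
    lines = ['graph LR']

    def ref(name):
        # First mention declares the node with its label; later mentions reuse the ID.
        if name in ids:
            return ids[name]
        ids[name] = f'N{len(ids)}'
        return f'{ids[name]}[{name}]'

    for source, target in edges:
        left = ref(source)
        right = ref(target)
        lines.append(f'    {left} --> {right}')
    return '\n'.join(lines)

edges = [
    ('cli.py', 'db_handler.py'),
    ('core.py', 'db_handler.py'),
    ('collector.py', 'db_handler.py'),
    ('launcher.py', 'cli.py'),
    ('launcher.py', 'core.py'),
]
diagram = to_mermaid(edges)
print(diagram)
```

In practice the edge list would come from the `file_dependencies` table joined to file names.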
Goal: Predict which files will be affected by a change.
Query:
-- Find all files that transitively depend on db_handler.py
WITH RECURSIVE impact AS (
SELECT file_id, 1 as depth
FROM file_catalog
WHERE file_id = 'SOM-SCR-XXXX-v1.0.0'
UNION ALL
SELECT fd.file_id, impact.depth + 1
FROM file_dependencies fd
JOIN impact ON fd.depends_on_file_id = impact.file_id
WHERE impact.depth < 10
)
SELECT DISTINCT fc.file_id, fc.name
FROM impact
JOIN file_catalog fc ON impact.file_id = fc.file_id;
Goal: Single catalog database for all Somacosf projects.
Schema Extension:
CREATE TABLE projects (
project_id TEXT PRIMARY KEY,
name TEXT,
path TEXT,
description TEXT
);
ALTER TABLE file_catalog ADD COLUMN workspace_id TEXT;
Benefits:
- Search across all projects
- Track agent activity workspace-wide
- Identify duplicate/similar files
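Cross-project search could then be a single join between the two tables. The sketch below is built on assumptions: it supposes `file_catalog` stores the header's `project_id` field as the link to `projects`, and the project IDs, paths, and the `search_workspace` helper are invented for illustration.

```python
import sqlite3

def search_workspace(conn, term):
    """Find files in any project whose description contains `term`."""
    query = """
        SELECT p.name, fc.file_id, fc.name
        FROM file_catalog fc
        JOIN projects p ON fc.project_id = p.project_id
        WHERE fc.description LIKE ?
        ORDER BY p.name, fc.file_id
    """
    return conn.execute(query, (f'%{term}%',)).fetchall()

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE projects (
        project_id TEXT PRIMARY KEY, name TEXT, path TEXT, description TEXT
    );
    -- Assumption: file_catalog links to projects through project_id.
    CREATE TABLE file_catalog (
        file_id TEXT, project_id TEXT, name TEXT, description TEXT
    );
""")
# Hypothetical project rows; real IDs and paths would come from the workspace.
conn.executemany('INSERT INTO projects VALUES (?, ?, ?, ?)', [
    ('PROJ-A', 'Ghost_Shell', '/workspace/ghost_shell', 'Example project A'),
    ('PROJ-B', 'Other_Project', '/workspace/other', 'Example project B'),
])
conn.executemany('INSERT INTO file_catalog VALUES (?, ?, ?, ?)', [
    ('SOM-SCR-0012-v1.1.0', 'PROJ-A', 'core.py', 'Main proxy addon'),
    ('SOM-SCR-0003-v1.0.0', 'PROJ-B', 'proxy_utils.py', 'Proxy helper functions'),
    ('SOM-DOC-0003-v1.0.0', 'PROJ-A', 'CODEBASE_OVERVIEW.md', 'Complete codebase overview'),
])
hits = search_workspace(conn, 'proxy')
for project, file_id, name in hits:
    print(project, file_id, name)
```

Because sequence numbers may repeat across projects, this sketch deliberately avoids treating `file_id` alone as a unique key.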
The Ghost_Shell File ID System provides a semantic, self-documenting catalog framework that enables:
- Rapid Onboarding: New agents can understand codebases via metadata
- Efficient Search: Find files by category, tags, or relationships
- Version Tracking: Built-in semver for all files
- Agent Coordination: Track which agent created/modified each file
- Future-Ready: Designed for database integration and advanced tooling
Next Steps:
- Implement `ghost-catalog` CLI tool (Go or Python)
- Build Bubble Tea TUI for interactive browsing
- Create catalog database and sync tool
- Integrate with development workflows (Git hooks, CI/CD)
Key Takeaway: Unlike UUIDs (which prioritize uniqueness), SOM File IDs prioritize discoverability and semantic richness, making them ideal for AI-agent-driven development workspaces.
Document Version: 1.0.0 Created: 2025-01-24 Agent: Claude (Sonnet 4.5) Project: Ghost_Shell / Somacosf