SigMap Codebase Report

Generated: 2026-05-18
Version: 6.10.10
Branch: main (up-to-date with origin)

Repository Status

Item	Value
Version	6.10.10
Latest Commit	270e21c (Merge PR #200 from develop)
Git Status	Clean, up to date with origin
Benchmark ID	sigmap-v6.10-main
Benchmark Date	2026-05-12
Test Count	722

Architecture

SigMap is an AI context engine that extracts function and class signatures from source code and ranks them by relevance, without using any LLM.

Ask → Rank → Context → Validate → Judge → Learn

Directory Structure

sigmap/
├── gen-context.js        # Bundled CLI (single file, zero deps)
├── src/
│   ├── extractors/       # 41 language extractors
│   ├── retrieval/
│   │   ├── ranker.js     # TF-IDF ranking engine
│   │   └── tokenizer.js  # Code-aware tokenization
│   ├── config/
│   │   ├── defaults.js   # Default config values
│   │   └── loader.js     # Config file parser
│   ├── learning/         # weights.js (learned ranking multipliers)
│   ├── graph/            # builder.js, impact.js
│   ├── mcp/              # server.js, handlers.js
│   ├── health/           # scorer.js
│   └── judge/            # judge-engine.js (groundedness scoring)
└── packages/
    ├── core/             # Programmatic API
    └── adapters/         # 10 output format adapters

Signature Extraction

Each language has its own extractor in src/extractors/:

Language	Approach
TypeScript/JS	Regex parsing (classes, interfaces, functions)
Python	Regex + Python AST for complex cases
Go, Rust, Java	Regex-based extractors
R	Roxygen comment parsing

Example transformation:

// Full source (50+ lines; method bodies elided)
class UserService {
  async getUser(id: string): Promise<User> { ... }
}

// Extracted signatures (one line per symbol)
class UserService
  getUser(id)
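A regex-based extractor for the TypeScript example above can be sketched in a few lines. This is an illustration under stated assumptions, not SigMap's extractor: the function `extractSignatures`, its regexes, and the output format (copied from the example) are hypothetical, and it only handles declarations that fit on one line.

```javascript
// Hypothetical sketch of a regex-based TS/JS signature extractor.
// Not SigMap's implementation; handles single-line declarations only.
const CLASS_RE = /^\s*(?:export\s+)?(?:default\s+)?(?:abstract\s+)?class\s+([A-Za-z_$][\w$]*)/;
const METHOD_RE =
  /^\s+(?:(?:public|private|protected|static|async|readonly)\s+)*([A-Za-z_$][\w$]*)\s*\(([^)]*)\)\s*(?::[^{]*)?\{/;
// Control-flow keywords that look like "name(...) {" but are not methods.
const KEYWORDS = new Set(["if", "for", "while", "switch", "catch"]);

function extractSignatures(source) {
  const out = [];
  for (const line of source.split("\n")) {
    const cls = line.match(CLASS_RE);
    if (cls) {
      out.push(`class ${cls[1]}`);
      continue;
    }
    const method = line.match(METHOD_RE);
    if (method && !KEYWORDS.has(method[1])) {
      // Keep parameter names; drop type annotations, "?" markers, and defaults.
      const params = method[2]
        .split(",")
        .map((p) => p.split(/[:=?]/)[0].trim())
        .filter(Boolean);
      out.push(`  ${method[1]}(${params.join(", ")})`);
    }
  }
  return out.join("\n");
}

const src = `class UserService {
  async getUser(id: string): Promise<User> { return db.find(id); }
}`;
console.log(extractSignatures(src));
// class UserService
//   getUser(id)
```

It misses multi-line signatures and destructured parameters; handling those is where a parser-based fallback, like the Python AST path in the table above, becomes useful.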

TF-IDF Ranking

src/retrieval/ranker.js scores each file using weighted signals:

Signal	Weight	Description
exactToken	1.0	Query token appears in a signature
symbolMatch	0.5	Query token appears in a function or class name
prefixMatch	0.3	Query token is a prefix of a signature token
pathMatch	0.8	Query token appears in the file path
recencyBoost	1.5× (multiplier)	File was changed recently
graphBoost	0.4	File imports a scored file

Penalty Signals:

Signal	Multiplier	Pattern
testFile	0.4	test/, spec/
generatedCode	0.3	dist/, build/
docsFile	0.2	docs/, readme/
node_modules	0.0	node_modules/ (always scored zero)
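The tables give weights but not how they combine. The sketch below assumes one plausible scheme: additive signals are summed per query token, `recencyBoost` multiplies the total, and the strongest matching path penalty is applied last. `scoreFile` and the shape of its `file` argument are hypothetical, and `graphBoost` is left out because it depends on the import graph rather than on the file alone.

```javascript
// Hypothetical sketch of combining the signal weights and penalties above.
// Assumptions: per-token additive signals, a recency multiplier, and a single
// (strongest) path penalty. Not SigMap's actual ranker.js logic.
const WEIGHTS = { exactToken: 1.0, symbolMatch: 0.5, prefixMatch: 0.3, pathMatch: 0.8 };
const RECENCY_BOOST = 1.5;
const PENALTIES = [
  [/(^|\/)node_modules\//, 0.0],
  [/(^|\/)(test|spec)\//, 0.4],
  [/(^|\/)(dist|build)\//, 0.3],
  [/(^|\/)(docs|readme)\//i, 0.2],
];

function scoreFile(file, queryTokens) {
  const pathTokens = file.path.toLowerCase().split(/[^a-z0-9]+/);
  let score = 0;
  for (const q of queryTokens) {
    // An exact hit outranks a prefix hit, so count only one of the two.
    if (file.sigTokens.includes(q)) score += WEIGHTS.exactToken;
    else if (file.sigTokens.some((t) => t.startsWith(q))) score += WEIGHTS.prefixMatch;
    if (file.symbolTokens.includes(q)) score += WEIGHTS.symbolMatch;
    if (pathTokens.includes(q)) score += WEIGHTS.pathMatch;
  }
  if (file.recentlyChanged) score *= RECENCY_BOOST;
  // Apply the strongest matching penalty (1 means no penalty).
  const penalty = PENALTIES.filter(([re]) => re.test(file.path)).reduce(
    (min, [, w]) => Math.min(min, w),
    1
  );
  return score * penalty;
}

const file = {
  path: "src/auth/session.js",
  sigTokens: ["create", "session", "token"],
  symbolTokens: ["create", "session"],
  recentlyChanged: false,
};
console.log(scoreFile(file, ["session"])); // roughly 2.3 (1.0 + 0.5 + 0.8)
```

Under this scheme the same file moved to `test/auth/session.js` would keep its signal total but be scaled by 0.4.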

Graph Boost (2-hop with decay):

Hop 1: +0.40 for direct imports
Hop 2: +0.15 for transitive imports
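The two-hop boost can be sketched as two passes over an import map. This is a hypothetical illustration: `applyGraphBoost`, the `imports` map (each file to the files it imports), treating any file with a positive score as "scored", and the rule that a file receives each hop's boost at most once are assumptions, not documented behavior.

```javascript
// Hypothetical sketch of the 2-hop graph boost with decay (0.40 then 0.15).
// scores: Map<file, number>; imports: Map<file, string[]> of imported files.
function applyGraphBoost(scores, imports, hop1Boost = 0.4, hop2Boost = 0.15) {
  const matched = new Set([...scores].filter(([, s]) => s > 0).map(([f]) => f));

  // Hop 1: files that directly import a matched file.
  const hop1 = new Set();
  for (const [file, deps] of imports) {
    if (deps.some((d) => matched.has(d))) hop1.add(file);
  }

  // Hop 2: files that import a hop-1 file (and are not hop-1 files themselves).
  const hop2 = new Set();
  for (const [file, deps] of imports) {
    if (!hop1.has(file) && deps.some((d) => hop1.has(d))) hop2.add(file);
  }

  const boosted = new Map(scores);
  for (const f of hop1) boosted.set(f, (boosted.get(f) ?? 0) + hop1Boost);
  for (const f of hop2) boosted.set(f, (boosted.get(f) ?? 0) + hop2Boost);
  return boosted;
}

const scores = new Map([["auth.js", 2.0], ["api.js", 0], ["app.js", 0], ["util.js", 0]]);
const imports = new Map([["api.js", ["auth.js"]], ["app.js", ["api.js"]], ["util.js", []]]);
console.log(applyGraphBoost(scores, imports));
// api.js gains 0.4 (hop 1), app.js gains 0.15 (hop 2), util.js is unchanged
```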

Intent Detection (7 types)

SigMap classifies each query into one of seven intents and adjusts the signal weights accordingly (a dash means the default weight is kept):

Intent	exactToken	symbolMatch	pathMatch	graphBoost
debug	1.2	-	0.6	-
explain	-	0.8	0.9	-
refactor	0.8	0.9	-	-
review	0.9	-	1.0	-
test	0.7	0.4	-	-
integrate	-	-	1.1	0.7
navigate	0.9	-	1.2	-
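A minimal sketch of intent-specific weighting, assuming a dash in the table means the base weight from the signal table is kept. The keyword cues in `INTENT_CUES`, the first-match rule, and the fallback to base weights when no cue matches are all hypothetical; this report does not describe how SigMap actually classifies queries.

```javascript
// Hypothetical sketch: keyword-based intent detection plus weight overrides.
// Base weights come from the signal table; overrides from the intent table.
const BASE_WEIGHTS = { exactToken: 1.0, symbolMatch: 0.5, pathMatch: 0.8, graphBoost: 0.4 };

const INTENT_OVERRIDES = {
  debug: { exactToken: 1.2, pathMatch: 0.6 },
  explain: { symbolMatch: 0.8, pathMatch: 0.9 },
  refactor: { exactToken: 0.8, symbolMatch: 0.9 },
  review: { exactToken: 0.9, pathMatch: 1.0 },
  test: { exactToken: 0.7, symbolMatch: 0.4 },
  integrate: { pathMatch: 1.1, graphBoost: 0.7 },
  navigate: { exactToken: 0.9, pathMatch: 1.2 },
};

// Illustrative cues, checked in order; the first match wins.
const INTENT_CUES = [
  ["debug", /\b(bug|error|crash|fail|fix|debug)\b/i],
  ["test", /\b(test|tests|spec|coverage)\b/i],
  ["refactor", /\b(refactor|rename|extract|simplify)\b/i],
  ["review", /\b(review|audit)\b/i],
  ["integrate", /\b(integrate|connect|wire|hook)\b/i],
  ["explain", /\b(explain|how|why|what)\b/i],
  ["navigate", /\b(where|find|locate)\b/i],
];

function weightsForQuery(query) {
  const match = INTENT_CUES.find(([, re]) => re.test(query));
  const intent = match ? match[0] : null;
  // Overrides replace only the weights the intent table sets.
  const weights = { ...BASE_WEIGHTS, ...(intent ? INTENT_OVERRIDES[intent] : {}) };
  return { intent, weights };
}

console.log(weightsForQuery("why does login crash on refresh?"));
// intent "debug": exactToken 1.2, pathMatch 0.6, other weights unchanged
```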

Output Adapters (10 total)

Adapter	Output File	Used By
copilot	`.github/copilot-instructions.md`	GitHub Copilot
claude	`CLAUDE.md`	Claude Code
cursor	`.cursorrules`	Cursor, Cline
windsurf	`.windsurfrules`	Windsurf
openai	`.github/openai-context.md`	Ollama, Aider
gemini	`.github/gemini-context.md`	Google Gemini
codex	`AGENTS.md`	OpenAI Codex
willow	Willow MCP	Willow
llm-full	`llm-full.txt`	Full context
llm	`llm.txt`	Compact context
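The adapter table maps naturally onto a lookup object. In the sketch below, `ADAPTER_OUTPUTS` and `resolveOutputs` are hypothetical names, and falling back to the single `output` path when `adapters` is null is an assumption based on the two keys shown under Configuration Defaults. `willow` maps to `null` because its output goes to Willow over MCP rather than to a file.

```javascript
// Hypothetical sketch: adapter name -> output file, from the table above.
const ADAPTER_OUTPUTS = {
  copilot: ".github/copilot-instructions.md",
  claude: "CLAUDE.md",
  cursor: ".cursorrules",
  windsurf: ".windsurfrules",
  openai: ".github/openai-context.md",
  gemini: ".github/gemini-context.md",
  codex: "AGENTS.md",
  willow: null, // delivered over MCP, not written to a file
  "llm-full": "llm-full.txt",
  llm: "llm.txt",
};

// Assumed behavior: with no adapters configured, write only `config.output`.
function resolveOutputs(config) {
  if (!config.adapters) return [config.output];
  return config.adapters
    .map((name) => {
      if (!Object.prototype.hasOwnProperty.call(ADAPTER_OUTPUTS, name)) {
        throw new Error(`Unknown adapter: ${name}`);
      }
      return ADAPTER_OUTPUTS[name];
    })
    .filter((path) => path !== null); // drop non-file adapters
}

console.log(resolveOutputs({ adapters: ["claude", "cursor", "willow"] }));
// returns ["CLAUDE.md", ".cursorrules"]
```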

MCP Tools (9 total)

Tool	Purpose
`read_context`	Read context file (full or per-module)
`search_signatures`	Search signatures by keyword
`get_map`	Import graph, class hierarchy, routes
`create_checkpoint`	Session snapshot with branch/commit
`get_routing`	Model tier hints for files
`explain_file`	Deep-dive with imports/callers
`list_modules`	Module directory listing
`query_context`	TF-IDF ranked retrieval
`get_impact`	Dependency blast radius
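MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests, and over the stdio transport each message is one line of JSON. The sketch below only builds such a message. The helper `toolCallMessage` and the argument names (`query`, `topK`) are hypothetical, since this report does not list each tool's input schema, and a real client must complete the MCP `initialize` handshake before calling tools.

```javascript
// Build a newline-delimited JSON-RPC 2.0 request for an MCP tool call.
// Argument names passed below are illustrative, not SigMap's documented schema.
function toolCallMessage(id, toolName, args) {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  // JSON.stringify emits no raw newlines, so "\n" safely delimits the message.
  return JSON.stringify(request) + "\n";
}

process.stdout.write(toolCallMessage(1, "query_context", { query: "auth token refresh", topK: 5 }));
```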

Benchmark Results (v6.10-main)

Metric	Value	Baseline / notes
Hit@5	78.9%	13.6% (5.8× lift)
Token reduction	97.9%	278K vs 13.5M tokens
Task success	52.2%	10%
Prompts per task	1.66	2.84 (40.6% fewer)
Repositories tested	21	JS, Python, Go, Rust, Java, R, etc.
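The lift and token-reduction figures follow directly from the listed values:

```latex
\text{Hit@5 lift} = \frac{78.9\%}{13.6\%} \approx 5.80\times
\qquad
\text{token reduction} = 1 - \frac{0.278\,\text{M}}{13.5\,\text{M}} \approx 1 - 0.0206 \approx 97.9\%
```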

Tokenizer

src/retrieval/tokenizer.js provides code-aware tokenization:

camelCase → camel case
snake_case → snake case
kebab-case → kebab case
File paths → individual components
Stop words removed (the, a, an, in, of, etc.)
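These rules can be sketched with a few regex passes. This is an assumption about the approach, not the contents of tokenizer.js: `tokenize` is a hypothetical name, and `STOP_WORDS` holds only the five examples listed above.

```javascript
// Hypothetical sketch of code-aware tokenization.
const STOP_WORDS = new Set(["the", "a", "an", "in", "of"]);

function tokenize(text) {
  return text
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // camelCase -> camel Case
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // HTTPRequest -> HTTP Request
    .split(/[^A-Za-z0-9]+/) // split on _, -, /, ., whitespace, etc.
    .map((t) => t.toLowerCase())
    .filter((t) => t.length > 0 && !STOP_WORDS.has(t));
}

console.log(tokenize("the cause of an error in parseHTTPRequest"));
// returns ["cause", "error", "parse", "http", "request"]
```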

Configuration Defaults

{
  output: '.github/copilot-instructions.md',
  adapters: null,
  srcDirs: ['src', 'app', 'lib', 'packages', ...],
  exclude: ['node_modules', '.git', 'dist', ...],
  maxDepth: 6,
  maxSigsPerFile: 25,
  maxTokens: 6000,
  autoMaxTokens: true,
  coverageTarget: 0.80,
  secretScan: true,
  strategy: 'full',  // 'full' | 'per-module' | 'hot-cold'
  sigCache: false,
  retrieval: { topK: 10, recencyBoost: 1.5 },
  impact: { depth: 3, includeSigs: true }
}
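How src/config/loader.js combines a project config with these defaults is not described here. One common approach, sketched below as an assumption, is a recursive merge: nested objects such as `retrieval` and `impact` are merged key by key, while arrays (`srcDirs`, `exclude`) and scalars are replaced outright. `mergeConfig`, `resolveConfig`, and the subset of defaults shown are illustrative.

```javascript
// Hypothetical sketch: overlay a user config on (a subset of) the defaults.
const DEFAULTS = {
  maxTokens: 6000,
  strategy: "full",
  retrieval: { topK: 10, recencyBoost: 1.5 },
  impact: { depth: 3, includeSigs: true },
};
const STRATEGIES = new Set(["full", "per-module", "hot-cold"]);

const isPlainObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// Merge nested objects key by key; replace arrays and scalars.
function mergeConfig(defaults, user) {
  const merged = { ...defaults };
  for (const [key, value] of Object.entries(user)) {
    merged[key] =
      isPlainObject(defaults[key]) && isPlainObject(value)
        ? mergeConfig(defaults[key], value)
        : value;
  }
  return merged;
}

function resolveConfig(user = {}) {
  const config = mergeConfig(DEFAULTS, user);
  if (!STRATEGIES.has(config.strategy)) {
    throw new Error(`Unknown strategy: ${config.strategy}`);
  }
  return config;
}

const cfg = resolveConfig({ maxTokens: 8000, retrieval: { topK: 20 } });
console.log(cfg.retrieval); // { topK: 20, recencyBoost: 1.5 }
```

In this sketch, a project that sets `exclude` gets exactly that list rather than the defaults plus its own entries.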

Supported Languages (41 total)

TypeScript, JavaScript, Python, Java, Kotlin, Go, Rust, C#, C/C++, Ruby, PHP, Swift, Dart, Scala, Vue, Svelte, HTML, CSS/SCSS, YAML, Shell, SQL, GraphQL, Terraform, Protobuf, Dockerfile, TOML, XML, Properties, Markdown, R, GDScript, among others

Key Files

File	Purpose
`gen-context.js`	Bundled CLI (single file, zero deps)
`packages/core/index.js`	Programmatic API
`src/retrieval/ranker.js`	TF-IDF ranking
`src/retrieval/tokenizer.js`	Code tokenization
`src/config/loader.js`	Config parsing + auto-detection
`src/graph/builder.js`	Dependency graph builder
`src/graph/impact.js`	Impact analysis
`src/learning/weights.js`	Learned ranking multipliers
`src/mcp/server.js`	MCP stdio server
`src/health/scorer.js`	Health scoring
`src/judge/judge-engine.js`	Groundedness scoring

Recent Changes (v6.10.x)

v6.10.10 (2026-05-12)

First-class R support (R6/S7 classes, Roxygen2 hints)
DESCRIPTION/NAMESPACE parsing for R packages
Windows path normalization fix for get_impact

v6.10.0 - v6.10.9

Python absolute-import support in src/graph/builder.js
Comprehensive import graph diagnostics
Workspace-scoped retrieval for monorepos

Installation

# Try without installing
npx sigmap

# Install globally
npm install -g sigmap

# Standalone binary (no Node.js required)
# Download from GitHub Releases

Quick Commands

sigmap              # Generate context
sigmap ask "auth"   # Ask a question
sigmap validate     # Check coverage
sigmap --health     # Health score
sigmap --mcp        # Start MCP server
sigmap --setup      # IDE integration

Report generated by analyzing the SigMap codebase directly. No LLM used.
