Example spec: Filename-Based Language Detection for Semantic Code Search — used in the blog post 'Research, Plan, Ship'
status: in-progress
worktree: filename-based-language-detection
project: semantic-code-search
tags: spec
created: 2026-05-08

Filename-Based Language Detection

Context

  • Problem: Router.getHandler() in packages/runtime/src/index.ts:56-63 only matches files by extension (filePath.endsWith(ext)). Files without extensions — Dockerfile, Makefile, Jenkinsfile, etc. — are silently skipped during indexing. This blocks issues #9 (Makefile) and #10 (Dockerfile).
  • Scope:
    • In-scope: Add optional filenames field to ParserHandler.match, update router/validation/tree-sitter-helper, create a Dockerfile plugin as the first consumer, tests for all changes
    • Out-of-scope: Makefile plugin (#9), Jenkinsfile, Vagrantfile — trivial follow-ups once this infrastructure lands
  • Constraints:
    • No breaking changes — filenames is optional, existing plugins are unaffected
    • Extension matching takes priority over filename matching (existing behavior preserved)
    • Validation must allow handlers with only filenames (no extensions), only extensions (existing), or both
    • No tree-sitter grammar exists for Dockerfile on npm — use parseByLines helper
  • Repo touchpoints:
    • packages/plugin-api/src/types.ts: ParserHandler interface
    • packages/plugin-api/src/validation.ts: validateHandler() / validateHandlers()
    • packages/plugin-api/src/validation.test.ts: validation tests
    • packages/runtime/src/index.ts: Router class
    • packages/runtime/src/router.test.ts: router tests
    • packages/plugin-helpers-tree-sitter/src/index.ts: TreeSitterHandlerOptions / createTreeSitterHandler()
    • packages/plugin-helpers-tree-sitter/src/index.test.ts: tree-sitter helper tests
    • packages/plugins/lang-dockerfile/: new plugin (package.json, tsconfig, vitest config, src/index.ts, src/index.test.ts)
    • packages/default-plugins/src/index.ts: register Dockerfile plugin
    • packages/default-plugins/package.json: add dependency
  • Definition of done:
    • Router.getHandler('path/to/Dockerfile') returns the dockerfile handler
    • Router.getHandler('path/to/Containerfile') returns the dockerfile handler
    • Router.getHandler('path/to/foo.dockerfile') returns the dockerfile handler
    • Router.getHandler('path/to/foo.containerfile') returns the dockerfile handler
    • Existing extension-based routing is unchanged
    • Validation rejects duplicate filenames across handlers
    • Validation allows handlers with only filenames and no extensions
    • All existing tests still pass
    • New tests cover filename matching, conflict warnings, validation, and the Dockerfile plugin

Tasks

  • 1) Add filenames to ParserHandler.match type

    • Change: Add optional filenames?: string[] to the match object in the ParserHandler interface
    • Files: packages/plugin-api/src/types.ts
    • Acceptance: TypeScript compiles. Existing code unaffected since field is optional.
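A minimal sketch of the type change in task 1, assuming a simplified ParserHandler shape. Only name, match.extensions, and match.filenames come from this spec; the parse signature and the required (possibly empty) extensions array are placeholder assumptions, and the real interface in types.ts may carry more fields.

```typescript
// Simplified stand-in for the interface in packages/plugin-api/src/types.ts.
// The `parse` signature is a placeholder assumption, not the real one.
interface ParserHandler {
  name: string;
  match: {
    extensions: string[];
    /** Exact basenames (for example 'Dockerfile'); optional, so existing plugins are unaffected. */
    filenames?: string[];
  };
  parse: (content: string) => unknown;
}

// An existing extension-only handler still type-checks without changes.
const textHandler: ParserHandler = {
  name: 'text',
  match: { extensions: ['.txt'] },
  parse: (content) => content,
};

// A new handler can opt in to filename matching.
const dockerfileHandler: ParserHandler = {
  name: 'dockerfile',
  match: {
    extensions: ['.dockerfile', '.containerfile'],
    filenames: ['Dockerfile', 'Containerfile'],
  },
  parse: (content) => content,
};
```

Because the field is optional, handlers that never mention filenames keep compiling, which is what the "no breaking changes" constraint asks for.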
  • 2) Update Router to support filename-based matching

    • Change:
      • Add private filenameOwners = new Map<string, string>() alongside extensionOwners
      • In registerPlugin(), iterate handler.match.filenames (when present) and populate filenameOwners with conflict warnings (same pattern as extensions)
      • In getHandler(), when the extension loop finds no match, run a second pass that compares path.basename(filePath) against each handler's match.filenames before returning null. Import path from 'path'.
    • Files: packages/runtime/src/index.ts
    • Acceptance: router.getHandler('/some/path/Dockerfile') returns the correct handler. Extension matching still takes priority.
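The two-pass lookup described in task 2 could look roughly like the sketch below. It is not the actual Router: the registerPlugin signature, the claim helper, the first-owner-wins policy, and the warnings array (used here in place of whatever logging the real router does) are all assumptions for illustration. A local basename helper stands in for path.basename so the sketch needs no imports.

```typescript
interface ParserHandler {
  name: string;
  match: { extensions: string[]; filenames?: string[] };
}

// Stand-in for path.basename: returns the last path segment.
function basename(filePath: string): string {
  const parts = filePath.split(/[\\\/]/);
  return parts[parts.length - 1] ?? '';
}

class Router {
  private handlers: ParserHandler[] = [];
  private extensionOwners = new Map<string, string>();
  private filenameOwners = new Map<string, string>();
  // Assumption: warnings are collected here to keep the sketch observable.
  readonly warnings: string[] = [];

  registerPlugin(pluginId: string, handlers: ParserHandler[]): void {
    for (const handler of handlers) {
      const owner = `${pluginId}:${handler.name}`;
      for (const ext of handler.match.extensions) {
        this.claim(this.extensionOwners, 'extension', ext, owner);
      }
      for (const filename of handler.match.filenames ?? []) {
        this.claim(this.filenameOwners, 'filename', filename, owner);
      }
      this.handlers.push(handler);
    }
  }

  getHandler(filePath: string): ParserHandler | null {
    // First pass: extension matching, same check as the existing behavior.
    for (const handler of this.handlers) {
      if (handler.match.extensions.some((ext) => filePath.endsWith(ext))) {
        return handler;
      }
    }
    // Second pass: exact basename match, reached only when no extension matched.
    const base = basename(filePath);
    for (const handler of this.handlers) {
      if (handler.match.filenames?.some((filename) => filename === base)) {
        return handler;
      }
    }
    return null;
  }

  // Records the first owner of a key and warns when a different owner claims it.
  private claim(owners: Map<string, string>, kind: string, key: string, owner: string): void {
    const existing = owners.get(key);
    if (existing !== undefined && existing !== owner) {
      this.warnings.push(`${kind} '${key}' is already owned by ${existing}; also claimed by ${owner}`);
      return;
    }
    owners.set(key, owner);
  }
}

const router = new Router();
router.registerPlugin('demo-plugin', [
  { name: 'notes', match: { extensions: [], filenames: ['notes.txt'] } },
  { name: 'text', match: { extensions: ['.txt'] } },
  { name: 'dockerfile', match: { extensions: ['.dockerfile'], filenames: ['Dockerfile'] } },
]);

const byFilename = router.getHandler('/some/path/Dockerfile'); // matched by basename
const byExtension = router.getHandler('docs/notes.txt'); // '.txt' wins over the 'notes.txt' filename
```

Running the extension pass to completion before looking at basenames is what preserves the "extension matching takes priority" constraint, even when a filename-only handler was registered first.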
  • 3) Add router tests for filename matching

    • Change: Add tests to the existing router test file:
      • Route files by filename (Dockerfile, Containerfile)
      • Return null for unmatched filenames
      • Extension match takes priority over filename match
      • Warn on duplicate filename registration across handlers
      • getHandlers() includes filename-only handlers
    • Files: packages/runtime/src/router.test.ts
    • Acceptance: All new tests pass via npx vitest run in packages/runtime/
  • 4) Update validation to support filenames

    • Change:
      • Relax the "at least one extension required" check to "at least one extension OR filename required"
      • Add filename format validation: each filename must be non-empty, must not contain / or \
      • Add cross-handler duplicate filename detection (same pattern as extension duplicates)
    • Files: packages/plugin-api/src/validation.ts
    • Acceptance: validateHandler() accepts handlers with only filenames, rejects empty match (no extensions AND no filenames), detects duplicate filenames across handlers
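The three validation rules in task 4 can be sketched as follows. The function names mirror the spec, but returning a list of error strings is an assumption; the real validateHandler() and validateHandlers() may throw or return a structured result, and they also validate things (such as extension format) that are omitted here.

```typescript
interface HandlerLike {
  name: string;
  match: { extensions?: string[]; filenames?: string[] };
}

// Per-handler checks: something to match on, and well-formed filenames.
function validateHandler(handler: HandlerLike): string[] {
  const errors: string[] = [];
  const extensions = handler.match.extensions ?? [];
  const filenames = handler.match.filenames ?? [];

  if (extensions.length === 0 && filenames.length === 0) {
    errors.push(`${handler.name}: at least one extension or filename is required`);
  }
  for (const filename of filenames) {
    if (filename.length === 0) {
      errors.push(`${handler.name}: filename must be non-empty`);
    } else if (/[\\\/]/.test(filename)) {
      errors.push(`${handler.name}: filename '${filename}' must not contain / or \\`);
    }
  }
  return errors;
}

// Cross-handler check: a filename may be claimed by only one handler.
function validateHandlers(handlers: HandlerLike[]): string[] {
  const errors: string[] = [];
  const owners = new Map<string, string>();
  for (const handler of handlers) {
    errors.push(...validateHandler(handler));
    for (const filename of handler.match.filenames ?? []) {
      const owner = owners.get(filename);
      if (owner !== undefined && owner !== handler.name) {
        errors.push(`duplicate filename '${filename}' in handlers ${owner} and ${handler.name}`);
      } else {
        owners.set(filename, handler.name);
      }
    }
  }
  return errors;
}
```

Rejecting path separators keeps each entry a pure basename, so the router's basename comparison can never be confused by an entry like 'docker/Dockerfile'.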
  • 5) Add validation tests for filenames

    • Change: Add test cases:
      • Pass when handler has filenames but no extensions
      • Pass when handler has both filenames and extensions
      • Fail when handler has neither filenames nor extensions
      • Fail when filename contains / or \
      • Fail when filename is empty string
      • Detect duplicate filenames across handlers
      • validateHandlers() detects filename conflicts
    • Files: packages/plugin-api/src/validation.test.ts
    • Acceptance: All new tests pass via npx vitest run in packages/plugin-api/
  • 6) Update tree-sitter helper to accept filenames

    • Change:
      • Add optional filenames?: string[] to TreeSitterHandlerOptions
      • In createTreeSitterHandler(), pass filenames through to the match object only when provided: match: { extensions: options.extensions ?? [], ...(options.filenames && { filenames: options.filenames }) }
      • Make extensions optional in TreeSitterHandlerOptions (defaulting to [] as shown above), since a handler might only have filenames
    • Files: packages/plugin-helpers-tree-sitter/src/index.ts
    • Acceptance: TypeScript compiles. Existing callers unaffected.
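The passthrough in task 6 is small enough to show in isolation. In this sketch, buildMatch is a hypothetical helper that isolates the one expression the spec describes; the real createTreeSitterHandler() takes more options (grammar, queries, and so on) that are left out here.

```typescript
// Reduced stand-in for TreeSitterHandlerOptions: only the match-related fields.
interface TreeSitterHandlerOptions {
  name: string;
  extensions?: string[];
  filenames?: string[];
}

interface HandlerMatch {
  extensions: string[];
  filenames?: string[];
}

// Hypothetical helper: builds the match object the way the spec describes.
function buildMatch(options: TreeSitterHandlerOptions): HandlerMatch {
  return {
    // Default to [] so filename-only handlers still get an extensions array.
    extensions: options.extensions ?? [],
    // Spread conditionally so the `filenames` key is absent, not undefined, when omitted.
    ...(options.filenames && { filenames: options.filenames }),
  };
}

const cMatch = buildMatch({ name: 'c', extensions: ['.c', '.h'] });
const makefileMatch = buildMatch({ name: 'makefile', filenames: ['Makefile'] });
```

The conditional spread matters for task 7: a test that checks for the absence of the filenames property would fail if the key were always set, even to undefined.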
  • 7) Add tree-sitter helper test for filenames passthrough

    • Change: Add a test that creates a handler with filenames: ['Makefile'] and verifies handler.match.filenames contains it. Also test that omitting filenames results in no filenames property on match.
    • Files: packages/plugin-helpers-tree-sitter/src/index.test.ts
    • Acceptance: Tests pass via npx vitest run in packages/plugin-helpers-tree-sitter/
  • 8) Create the Dockerfile plugin

    • Change: Create a new plugin package packages/plugins/lang-dockerfile/ following the lang-text pattern:
      • package.json: name @elastic/scs-plugin-lang-dockerfile, with a dependency on @elastic/scs-plugin-api
      • tsconfig.json: extends ../../../tsconfig.base.json
      • tsconfig.build.json: extends ./tsconfig.json, excludes tests
      • vitest.config.ts: standard config matching other plugins
      • src/index.ts: export createDockerfileHandler() returning a ParserHandler with:
        • name: 'dockerfile'
        • match: { extensions: ['.dockerfile', '.containerfile'], filenames: ['Dockerfile', 'Containerfile'] }
        • parse: use parseByLines(context, 'dockerfile', 'code') from @elastic/scs-plugin-api
    • Files: packages/plugins/lang-dockerfile/package.json, packages/plugins/lang-dockerfile/tsconfig.json, packages/plugins/lang-dockerfile/tsconfig.build.json, packages/plugins/lang-dockerfile/vitest.config.ts, packages/plugins/lang-dockerfile/src/index.ts
    • Acceptance: createDockerfileHandler() returns a valid ParserHandler. Handler parses a simple Dockerfile into chunks.
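As a rough picture of what src/index.ts in task 8 might contain, the sketch below wires the handler together. The ParseContext and Chunk shapes are invented for illustration, and the local parseByLines is a deliberately trivial stand-in (one chunk per file) for the real helper from @elastic/scs-plugin-api, whose context type and chunk format are not shown in this spec. The third argument is called kind here as a guess at its meaning.

```typescript
// Hypothetical shapes; the real types live in @elastic/scs-plugin-api.
interface ParseContext {
  content: string;
}
interface Chunk {
  language: string;
  kind: string;
  content: string;
  startLine: number;
  endLine: number;
}
interface ParserHandler {
  name: string;
  match: { extensions: string[]; filenames?: string[] };
  parse: (context: ParseContext) => Chunk[];
}

// Stand-in for the real parseByLines: returns the whole file as a single chunk.
function parseByLines(context: ParseContext, language: string, kind: string): Chunk[] {
  const lineCount = context.content.split('\n').length;
  return [{ language, kind, content: context.content, startLine: 1, endLine: lineCount }];
}

// In the real package this function would be exported from src/index.ts.
function createDockerfileHandler(): ParserHandler {
  return {
    name: 'dockerfile',
    match: {
      extensions: ['.dockerfile', '.containerfile'],
      filenames: ['Dockerfile', 'Containerfile'],
    },
    parse: (context) => parseByLines(context, 'dockerfile', 'code'),
  };
}

const handler = createDockerfileHandler();
const chunks = handler.parse({
  content: 'FROM node:20\nRUN npm ci\nCOPY . .\nCMD ["node", "index.js"]',
});
```

The handler itself holds no parsing logic; everything language-specific is the name, the match lists, and the language tag passed to the line-based helper.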
  • 9) Add Dockerfile plugin tests

    • Change: Create test file with:
      • Verify handler name is 'dockerfile'
      • Verify match.extensions contains .dockerfile and .containerfile
      • Verify match.filenames contains Dockerfile and Containerfile
      • Parse a sample Dockerfile (multi-line with FROM, RUN, COPY, CMD) and verify chunks are produced with correct language and metrics
    • Files: packages/plugins/lang-dockerfile/src/index.test.ts
    • Acceptance: Tests pass via npx vitest run in packages/plugins/lang-dockerfile/
  • 10) Register Dockerfile plugin in default-plugins

    • Change:
      • Add import { createDockerfileHandler } from '@elastic/scs-plugin-lang-dockerfile' to the imports
      • Add a new definePlugin({ pluginId: '@elastic/scs-plugin-lang-dockerfile', apiVersion: 1, handlers: [createDockerfileHandler()] }) entry to defaultPlugins
      • Add "@elastic/scs-plugin-lang-dockerfile": "^0.0.0" to package.json dependencies
    • Files: packages/default-plugins/src/index.ts, packages/default-plugins/package.json
    • Acceptance: defaultPlugins includes the dockerfile handler. Build succeeds.
  • 11) Install dependencies and build

    • Change: Run npm install from repo root to link the new package, then run the build to verify everything compiles
    • Files: all
    • Acceptance: npm install succeeds, npm run build succeeds (or the equivalent workspace build command)
  • 12) Code review

    • Change: Use the task tool with subagent_type: "code-review" to run an isolated code review against the branch diff (git diff main...HEAD). Fix any critical findings.
    • Files: all modified files
    • Acceptance: No critical findings remain
  • 13) Run all checks and fix issues

    • Change: Run the repo's test suite, linting, type-checking, and formatting. Fix any failures.
    • Files: all modified files
    • Acceptance: All checks pass

Additional Context

  • The tree-sitter-dockerfile package on npm is a security placeholder (0.0.1-security), not a real grammar. This is why the Dockerfile plugin uses parseByLines instead of tree-sitter.
  • The existing lang-text plugin (packages/plugins/lang-text/src/index.ts) is the closest pattern to follow for the Dockerfile plugin.
  • The parseByLines helper is exported from @elastic/scs-plugin-api and chunks files by line count using defaultChunkLines and chunkOverlapLines from context options.
  • Existing plugins that could benefit from filenames in the future: lang-bash (for files like .bashrc, .bash_profile), a future lang-makefile (#9).
  • The isSharedExtensionAllowed pattern in the router currently only applies to extensions (.h shared between C and C++). There's no equivalent need for shared filenames yet, but the filename conflict detection should follow the same architecture for consistency.
  • Issue references: GitHub issues #4 (filename detection) and #10 (Dockerfile support).
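To make the line-count chunking mentioned above for parseByLines concrete, here is one way chunking with overlap can work. This is an illustration of the general technique, not the helper's actual code: chunkByLines, its parameters, and the LineChunk shape are hypothetical, with chunkLines and overlapLines playing the roles of defaultChunkLines and chunkOverlapLines.

```typescript
interface LineChunk {
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  content: string;
}

// Splits content into windows of `chunkLines` lines, where each window
// repeats the last `overlapLines` lines of the previous one.
function chunkByLines(content: string, chunkLines: number, overlapLines: number): LineChunk[] {
  if (chunkLines <= 0 || overlapLines < 0 || overlapLines >= chunkLines) {
    throw new Error('require chunkLines > 0 and 0 <= overlapLines < chunkLines');
  }
  const lines = content.split('\n');
  const step = chunkLines - overlapLines;
  const chunks: LineChunk[] = [];
  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + chunkLines, lines.length);
    chunks.push({
      startLine: start + 1,
      endLine: end,
      content: lines.slice(start, end).join('\n'),
    });
    // Stop once a window reaches the last line, to avoid a trailing overlap-only chunk.
    if (end === lines.length) break;
  }
  return chunks;
}

// Five lines, windows of three lines, one line of overlap.
const sample = chunkByLines('a\nb\nc\nd\ne', 3, 1);
```

With these inputs the sample yields two chunks covering lines 1 to 3 and 3 to 5, so line 3 appears in both; that shared line is the overlap that lets a search hit near a chunk boundary keep some surrounding context.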