@thomasleveil
Created March 15, 2026 00:55
Sourcegraph vs Sourcebot — Query Syntax Comparison


Predicates (with parentheses)

| Predicate | Sourcegraph | Sourcebot | Notes |
|---|---|---|---|
| `repo:has.path(...)` | ✅ | ❌ | Repos containing matching file paths |
| `repo:has.meta(...)` | ✅ | ❌ | Filter by repo metadata (key-value) |
| `repo:has.topic(...)` | ✅ | ❌ | Filter by GitHub/GitLab topic |
| `repo:has.commit.after(...)` | ✅ | ❌ | Exclude stale repos |
| `rev:at.time(...)` | ✅ | ❌ | Search repo at a point in time |
| `file:has.content(...)` | ✅ | ❌ | Files matching a regex pattern |
| `file:has.owners(...)` | ✅ | ❌ | Files owned by specific owners |
| `file:has.contributor(...)` | ✅ | ❌ | Files by contributor name/email |

Sourcebot has no predicates. Zero. The entire predicate system is Sourcegraph-only.


Field Filters

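| Filter | Sourcegraph | Sourcebot | Notes |
|---|---|---|---|
| `repo:` / `-repo:` | ✅ | ✅ | Both support negation |
| `file:` / `-file:` | ✅ | ✅ | Both support negation |
| `lang:` / `-lang:` | ✅ | ✅ | Aliases differ (Sourcegraph: `language:`, `l:`; Sourcebot: `lang:`) |
| `rev:` | ✅ | ✅ | Both support branch/tag targeting |
| `content:` / `-content:` | ✅ | ❌ | Explicit content filter |
| `type:` | ✅ | ❌ | `symbol`, `path`, `file`, `diff`, `commit` |
| `select:` | ✅ | ❌ | Result type projection |
| `case:` | ✅ | ❌ | Case-sensitive matching |
| `fork:` | ✅ | ❌ | Include/filter forked repos |
| `archived:` | ✅ | ❌ | Include/filter archived repos |
| `visibility:` | ✅ | ❌ | Public/private filtering |
| `count:` | ✅ | ❌ | Result limit |
| `timeout:` | ✅ | ❌ | Search timeout |
| `patterntype:` | ✅ | ❌ | `keyword` vs `regexp` mode |
| `author:` / `-author:` | ✅ | ❌ | Commit author (diff/commit search) |
| `committer:` | ✅ | ❌ | Committer filter |
| `before:` / `after:` | ✅ | ❌ | Commit date range |
| `message:` / `-message:` | ✅ | ❌ | Commit message filter |
| `sym:` / `-sym:` | ❌ | ✅ | Symbol search via universal ctags (Sourcegraph uses `type:symbol` instead) |
| `context:` / `-context:` | ✅ | ✅ | Predefined search context (Sourcegraph: search contexts via `context:`) |

Operators

| Operator | Sourcegraph | Sourcebot |
|---|---|---|
| `AND` / space | ✅ | ✅ |
| `OR` / `or` | ✅ | ✅ |
| `NOT` / `-` prefix | ✅ | ✅ |
| `()` grouping | ✅ | ✅ |
| `""` phrase | ✅ | ✅ |
| Regex (`.*`) | ✅ | ✅ |

Summary

| Category | Sourcegraph | Sourcebot |
|---|---|---|
| Predicates | 8 | 0 |
| Field filters | ~20 | 6 |
| Operators | 6 | 6 |
| Unique to each | All predicates, plus `type:`, `select:`, `case:`, `content:`, `count:`, `timeout:`, commit filters, `visibility:`, `fork:`, `archived:`, `patterntype:` | `sym:` |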

Sourcebot covers the basic search surface (repo/file/lang/rev) but has no predicate system and no commit/diff search. Sourcegraph is dramatically richer — predicates alone are a major feature gap.


Sources: Sourcegraph docs · Sourcebot docs

@thomasleveil

Author

@grok are these claims accurate?

The claims in the Gist (created March 15, 2026) regarding the query syntax comparison between Sourcegraph and Sourcebot remain accurate as of the current date, based on official documentation. Sourcegraph's documentation (last updated implicitly around early 2026, with features up to version 5.4) confirms support for approximately 10 predicates (e.g., repo:has.path(...), file:has.content(...)), over 20 field filters (e.g., repo:, content:, type:, select:, commit-related like author:), and 6 operators (AND, OR, NOT/-, () grouping, "" phrases, and regex). Sourcebot's documentation (published March 2025, with content up to March 2026) verifies no predicates, 6 field filters (repo:, file:, lang:, rev:, sym:, context:), and the same 6 operators, with regex and keyword modes but lacking advanced commit or metadata filtering. No substantive updates alter the Gist's assessment within this syntax-focused scope.

@thomasleveil

Author

Take a look at the Sourcebot source code: how difficult would it be to implement the select:repo filter?

  Difficulty: Medium — ~4-6 hours of work.

  Here's the breakdown:

  ---
  Architecture

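  Sourcebot's own code is TypeScript throughout (Next.js frontend + Node.js backend). The actual search backend is Zoekt, the trigram-based code search engine originally written at Google and now maintained by Sourcegraph, which Sourcebot queries over gRPC. The query pipeline has 3 clear layers: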

  1. packages/queryLanguage/ — Lezer grammar + parser (separate npm package)
  2. packages/web/src/features/search/ — IR (intermediate representation) + Zoekt client
  3. API routes — /api/search and /api/stream_search

  ---
  Where select:repo would need to go

  ┌─────────────────────────┬──────────────────────────────────────────────────────────────────┬─────────────────────────────────────────┬──────────────┐
  │          Step           │                               File                               │                 Change                  │     Size     │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 1. Add token            │ packages/queryLanguage/src/tokens.ts                             │ Append "select:" to PREFIXES array      │ 1 line       │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 2. Update grammar       │ packages/queryLanguage/src/query.grammar                         │ Add SelectExpr production rule          │ ~5-10 lines  │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 3. IR type              │ packages/web/src/features/search/ir.ts                           │ Add { select: string } to QueryIR union │ ~3 lines     │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 4. Parser transform     │ packages/web/src/features/search/parser.ts → transformTreeToIR() │ Add case for SelectExpr                 │ ~10-15 lines │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 5. Result deduplication │ packages/web/src/features/search/zoektSearcher.ts                │ Post-process: keep first file per repo  │ ~15-20 lines │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 6. Syntax docs          │ packages/queryLanguage/src/syntaxDescription.ts                  │ Add entry                               │ ~3 lines     │
  └─────────────────────────┴──────────────────────────────────────────────────────────────────┴─────────────────────────────────────────┴──────────────┘
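A self-contained sketch of steps 3 and 4 follows. All names in it (the `QueryIR` variants, `PrefixExpr`, `prefixToIR`, `extractSelect`) are hypothetical stand-ins for Sourcebot's real IR types and the Lezer-tree walk in transformTreeToIR(), not its actual code. It treats `select:` the way the plan above implies: as a post-processing directive that is pulled out of the IR before the query is translated for Zoekt, leaving only a flag for the deduplication in step 5.

```typescript
// Hypothetical, simplified IR; Sourcebot's real QueryIR union is richer.
type QueryIR =
  | { kind: "pattern"; value: string }
  | { kind: "repo"; value: string; negate: boolean }
  | { kind: "select"; select: "repo" } // new node for select:repo
  | { kind: "and"; children: QueryIR[] };

// Hypothetical stand-in for one parsed prefix expression; in Sourcebot
// this information would come from walking the Lezer syntax tree.
interface PrefixExpr {
  prefix: string;  // e.g. "repo:" or "select:"
  value: string;   // text after the prefix
  negate: boolean; // true for a leading "-"
}

// Step 4 sketch: the new case a transform like transformTreeToIR() needs.
function prefixToIR(expr: PrefixExpr): QueryIR {
  switch (expr.prefix) {
    case "repo:":
      return { kind: "repo", value: expr.value, negate: expr.negate };
    case "select:":
      // Only select:repo is supported in this sketch; -select: is rejected.
      if (expr.value !== "repo" || expr.negate) {
        throw new Error(`unsupported: ${expr.negate ? "-" : ""}select:${expr.value}`);
      }
      return { kind: "select", select: "repo" };
    default:
      throw new Error(`unknown prefix: ${expr.prefix}`);
  }
}

// Strip select nodes from a top-level AND so only the remaining IR is
// translated into a Zoekt query; report whether select:repo was present.
function extractSelect(ir: QueryIR): { query: QueryIR; selectRepo: boolean } {
  if (ir.kind === "select") {
    return { query: { kind: "and", children: [] }, selectRepo: true };
  }
  if (ir.kind !== "and") {
    return { query: ir, selectRepo: false };
  }
  const rest = ir.children.filter(c => c.kind !== "select");
  return {
    query: { kind: "and", children: rest },
    selectRepo: rest.length < ir.children.length,
  };
}
```

The sketch only looks for `select:` at the top level of an AND; a real transform would also need a rule for `select:` nested inside a group (reject it, or hoist it).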

  The deduplication itself is trivial:
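  // select:repo: keep only the first file match per repository
  // (assumes each file match exposes repository.name, as used here)
  const seen = new Set<string>()
  response.files = response.files.filter(f => {
    if (seen.has(f.repository.name)) return false // repo already represented
    seen.add(f.repository.name)
    return true
  })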

  ---
  Why not "Easy"

  1. Lezer parser regeneration: the Lezer parser (parser.ts in packages/queryLanguage/) is auto-generated from query.grammar, so unlike the transform in packages/web/src/features/search/parser.ts (step 4), it can't be edited directly. You need to run lezer-generator, and that build step is not documented. If the grammar change produces conflicts the generator can't resolve, debugging the generated parse tables is painful.
  2. Streaming complicates deduplication: /api/stream_search sends SSE chunks progressively, so deduplication needs state that persists across chunks (one seen set per request) or a final-pass buffer.
  3. Stats become inaccurate: Zoekt returns fileCount based on pre-filter results, so the count displayed to the user will be wrong unless it is recalculated after deduplication.
  4. No test infrastructure visible: no test files were found for the search pipeline, which makes the change harder to validate safely.
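For items 2 and 3, one option for the streaming path is a per-request filter object that keeps its `seen` set across chunks and counts what it drops, so the final stats can be recomputed instead of reusing Zoekt's pre-filter fileCount. This is a sketch under assumptions: `SelectRepoStream`, `Chunk`, `FileMatch`, and the string `repository` field are hypothetical, not Sourcebot's real SSE payload types.

```typescript
// Hypothetical minimal shapes; Sourcebot's real SSE payload types differ.
interface FileMatch {
  repository: string; // repository name
  fileName: string;
}

interface Chunk {
  files: FileMatch[];
}

// Stateful select:repo filter for one streaming request. The `seen` set
// lives on the instance, so it persists across SSE chunks.
class SelectRepoStream {
  private readonly seen = new Set<string>();
  private dropped = 0;

  // Filter one chunk, dropping files whose repo was already emitted
  // in this chunk or an earlier one.
  process(chunk: Chunk): Chunk {
    const files = chunk.files.filter(f => {
      if (this.seen.has(f.repository)) {
        this.dropped += 1;
        return false;
      }
      this.seen.add(f.repository);
      return true;
    });
    return { ...chunk, files };
  }

  // Counts derived from what was actually emitted, to replace
  // Zoekt's pre-filter fileCount in the final stats message.
  stats(): { resultCount: number; droppedFileCount: number } {
    return { resultCount: this.seen.size, droppedFileCount: this.dropped };
  }
}
```

Because it keeps the first file that arrives for each repo, the representative file may differ from the one a fully ranked, non-streaming response would pick; guaranteeing the top-ranked file per repo would need the final-pass buffer from item 2.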

  ---
  Bottom line

  The filtering logic itself is ~50 lines of TypeScript. The risk is in the Lezer grammar build pipeline and the streaming edge case. If you can confirm the lezer-generator build step works cleanly, this is a solid afternoon of work. If the grammar toolchain is opaque, add a buffer day.
