thomasleveil/sourcegraph-vs-sourcebot.md

Created March 15, 2026 00:55


Sourcegraph vs Sourcebot — Query Syntax Comparison


Predicates (with parentheses)

| Predicate | Sourcegraph | Sourcebot | Notes |
|---|---|---|---|
| `repo:has.path(...)` | ✅ | ❌ | Repos containing matching file paths |
| `repo:has.meta(...)` | ✅ | ❌ | Filter by repo metadata (key-value) |
| `repo:has.topic(...)` | ✅ | ❌ | Filter by GitHub/GitLab topic |
| `repo:has.commit.after(...)` | ✅ | ❌ | Exclude stale repos (no commits after a given date) |
| `rev:at.time(...)` | ✅ | ❌ | Search repo at a point in time |
| `file:has.content(...)` | ✅ | ❌ | Files whose content matches a regex pattern |
| `file:has.owners(...)` | ✅ | ❌ | Files owned by specific owners |
| `file:has.contributor(...)` | ✅ | ❌ | Files by contributor name/email |
`file:has.contributor(...)`	✅	❌	Files by contributor name/email

Sourcebot has no predicates. Zero. The entire predicate system is Sourcegraph-only.

Field Filters

| Filter | Sourcegraph | Sourcebot | Notes |
|---|---|---|---|
| `repo:` / `-repo:` | ✅ | ✅ | Both support negation |
| `file:` / `-file:` | ✅ | ✅ | Both support negation |
| `lang:` / `-lang:` | ✅ | ✅ | Aliases differ (Sourcegraph: `language:`, `l:`; Sourcebot: `lang:`) |
| `rev:` | ✅ | ✅ | Both support branch/tag targeting |
| `content:` / `-content:` | ✅ | ❌ | Explicit content filter |
| `type:` | ✅ | ❌ | symbol, path, file, diff, commit |
| `select:` | ✅ | ❌ | Result type projection |
| `case:` | ✅ | ❌ | Case-sensitive matching |
| `fork:` | ✅ | ❌ | Include/filter forked repos |
| `archived:` | ✅ | ❌ | Include/filter archived repos |
| `visibility:` | ✅ | ❌ | Public/private filtering |
| `count:` | ✅ | ❌ | Result limit |
| `timeout:` | ✅ | ❌ | Search timeout |
| `patterntype:` | ✅ | ❌ | keyword vs regexp mode |
| `author:` / `-author:` | ✅ | ❌ | Commit author (diff/commit search) |
| `committer:` | ✅ | ❌ | Committer filter |
| `before:` / `after:` | ✅ | ❌ | Commit date range |
| `message:` / `-message:` | ✅ | ❌ | Commit message filter |
| `sym:` / `-sym:` | ❌ | ✅ | Symbol search via universal ctags |
| `context:` / `-context:` | ❌ | ✅ | Predefined search context |
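When porting Sourcegraph queries, the table above can be turned into a small lookup. The helper below is hypothetical (not part of either product): it scans a query for `field:` tokens and reports predicates and the fields the table marks as Sourcegraph-only. It is a heuristic sketch only; it ignores quoting and regex escapes and does not know about aliases such as `language:`.

```typescript
// Field filters the comparison table marks as Sourcegraph-only (aliases omitted).
const SOURCEGRAPH_ONLY_FIELDS = new Set<string>([
  "content", "type", "select", "case", "fork", "archived", "visibility",
  "count", "timeout", "patterntype", "author", "committer", "before",
  "after", "message",
]);

// Hypothetical helper: returns the filters in `query` that Sourcebot lacks
// according to the table. Predicates are reported as e.g. "repo:has.path".
// Heuristic only: quoted strings and regex escapes are not understood.
function unsupportedInSourcebot(query: string): string[] {
  const problems: string[] = [];
  // A field token: start or whitespace, optional "-", a name, ":", then a value.
  const fieldToken = /(?:^|\s)-?([a-zA-Z]+):(\S*)/g;
  let m: RegExpExecArray | null;
  while ((m = fieldToken.exec(query)) !== null) {
    const field = m[1].toLowerCase();
    const value = m[2];
    // Predicates look like has.path(...), at.time(...), has.commit.after(...).
    const predicate = /^([a-z]+(?:\.[a-z]+)+)\(/.exec(value);
    if (predicate) {
      problems.push(`${field}:${predicate[1]}`);
    } else if (SOURCEGRAPH_ONLY_FIELDS.has(field)) {
      problems.push(`${field}:`);
    }
  }
  return problems;
}

// Returns ["repo:has.path", "type:"]; lang: is supported by both.
console.log(unsupportedInSourcebot("repo:has.path(README) type:diff lang:go TODO"));
```

Matching on the `name.name(` shape rather than a fixed list means new predicates are still flagged, at the cost of occasionally flagging a literal value that happens to look like one.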

Operators

| Operator | Sourcegraph | Sourcebot |
|---|---|---|
| `AND` / space | ✅ | ✅ |
| `OR` / `or` | ✅ | ✅ |
| `NOT` / `-prefix` | ✅ | ✅ |
| `()` grouping | ✅ | ✅ |
| `""` phrase | ✅ | ✅ |
| Regex (`.*`) | ✅ | ✅ |

Summary

| Category | Sourcegraph | Sourcebot |
|---|---|---|
| Predicates | 8 | 0 |
| Field filters | ~20 | 6 |
| Operators | 6 | 6 |
| Unique to Sourcebot | — | `sym:`, `context:` |
| Unique to Sourcegraph | All predicates + `type:`, `select:`, `case:`, commit filters, `visibility:`, `fork:`, `archived:`, `patterntype:` | — |

Sourcebot covers the basic search surface (repo/file/lang/rev) but has no predicate system and no commit/diff search. Sourcegraph is dramatically richer — predicates alone are a major feature gap.

Sources: Sourcegraph docs · Sourcebot docs

thomasleveil commented Mar 15, 2026


@grok are these claims accurate?

The claims in the Gist (created March 15, 2026) comparing the query syntax of Sourcegraph and Sourcebot remain accurate as of the current date, based on the official documentation. Sourcegraph's documentation (apparently last updated in early 2026, covering features up to version 5.4) confirms support for approximately 10 predicates (e.g., `repo:has.path(...)`, `file:has.content(...)`), over 20 field filters (e.g., `repo:`, `content:`, `type:`, `select:`, and commit-related filters such as `author:`), and 6 operators (AND, OR, NOT/`-`, `()` grouping, `""` phrases, and regex). Sourcebot's documentation (published March 2025, with content up to March 2026) confirms no predicates, 6 field filters (`repo:`, `file:`, `lang:`, `rev:`, `sym:`, `context:`), and the same 6 operators, with regex and keyword modes but no advanced commit or metadata filtering. No substantive updates change the Gist's assessment within this syntax-focused scope.

thomasleveil commented Mar 15, 2026


Take a look at the Sourcebot source code: how difficult would it be to implement the `select:repo` filter?

  Difficulty: Medium — ~4-6 hours of work.

  Here's the breakdown:

  ---
  Architecture

  Sourcebot's own code is all TypeScript (Next.js frontend + Node.js backend); the actual search backend is Zoekt, the trigram-based code search engine originally developed at Google, which Sourcebot queries over gRPC. The query pipeline has 3 clear layers:

  1. packages/queryLanguage/ — Lezer grammar + parser (separate npm package)
  2. packages/web/src/features/search/ — IR (intermediate representation) + Zoekt client
  3. API routes — /api/search and /api/stream_search

  ---
  Where select:repo would need to go

  ┌─────────────────────────┬──────────────────────────────────────────────────────────────────┬─────────────────────────────────────────┬──────────────┐
  │          Step           │                               File                               │                 Change                  │     Size     │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 1. Add token            │ packages/queryLanguage/src/tokens.ts                             │ Append "select:" to PREFIXES array      │ 1 line       │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 2. Update grammar       │ packages/queryLanguage/src/query.grammar                         │ Add SelectExpr production rule          │ ~5-10 lines  │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 3. IR type              │ packages/web/src/features/search/ir.ts                           │ Add { select: string } to QueryIR union │ ~3 lines     │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 4. Parser transform     │ packages/web/src/features/search/parser.ts → transformTreeToIR() │ Add case for SelectExpr                 │ ~10-15 lines │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 5. Result deduplication │ packages/web/src/features/search/zoektSearcher.ts                │ Post-process: keep first file per repo  │ ~15-20 lines │
  ├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
  │ 6. Syntax docs          │ packages/queryLanguage/src/syntaxDescription.ts                  │ Add entry                               │ ~3 lines     │
  └─────────────────────────┴──────────────────────────────────────────────────────────────────┴─────────────────────────────────────────┴──────────────┘
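Steps 3 to 5 imply a split: `select:` lives in the IR, but the plan applies it as post-processing (step 5) rather than as part of the backend query, so the `select:` node has to be pulled out of the IR before the rest is translated for Zoekt. A minimal sketch of that split, assuming a simplified IR; the `QueryIR` shape and `extractSelect` are hypothetical, not Sourcebot's actual types:

```typescript
// Hypothetical, simplified IR; Sourcebot's real QueryIR union differs.
type QueryIR =
  | { kind: "pattern"; value: string }
  | { kind: "field"; field: string; value: string }
  | { kind: "select"; select: string }
  | { kind: "and"; children: QueryIR[] }
  | { kind: "or"; children: QueryIR[] };

interface SplitQuery {
  select?: string;       // e.g. "repo"; applied after the search returns
  query: QueryIR | null; // what is actually translated for the backend
}

// Pull AND-level select: nodes out of the IR. This sketch rejects select:
// under OR, and conflicting select: values, rather than guessing a meaning.
function extractSelect(ir: QueryIR): SplitQuery {
  if (ir.kind === "select") return { select: ir.select, query: null };
  if (ir.kind === "and") {
    let select: string | undefined;
    const rest: QueryIR[] = [];
    for (const child of ir.children) {
      const split = extractSelect(child);
      if (split.select !== undefined) {
        if (select !== undefined && select !== split.select) {
          throw new Error(`conflicting select: ${select} vs ${split.select}`);
        }
        select = split.select;
      }
      if (split.query) rest.push(split.query);
    }
    const query =
      rest.length === 0 ? null
      : rest.length === 1 ? rest[0]
      : { kind: "and" as const, children: rest };
    return { select, query };
  }
  if (ir.kind === "or" && containsSelect(ir)) {
    throw new Error("select: is not allowed inside an OR expression");
  }
  return { query: ir };
}

function containsSelect(ir: QueryIR): boolean {
  if (ir.kind === "select") return true;
  if (ir.kind === "and" || ir.kind === "or") return ir.children.some(containsSelect);
  return false;
}

// Usage: lang:go TODO select:repo
const ir: QueryIR = {
  kind: "and",
  children: [
    { kind: "field", field: "lang", value: "go" },
    { kind: "pattern", value: "TODO" },
    { kind: "select", select: "repo" },
  ],
};
const { select, query } = extractSelect(ir);
console.log(select, query?.kind); // prints: repo and
```

Keeping the split in one function means the Zoekt translation never sees a `select` node, and the search layer only needs the returned `select` value to decide whether to run the deduplication.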

  The deduplication itself is trivial:
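  ```typescript
  // select:repo: keep only the first file match seen for each repository
  const seen = new Set<string>()
  response.files = response.files.filter(f => {
    if (seen.has(f.repository.name)) return false // repo already represented
    seen.add(f.repository.name)
    return true
  })
  ```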

  ---
  Why not "Easy"

  1. Lezer parser regeneration — the Lezer parser.ts is auto-generated from query.grammar (not to be confused with packages/web/src/features/search/parser.ts from step 4, which holds the hand-written transformTreeToIR()). You can't edit the generated file directly; you need to run lezer-generator, and that build step is not documented. If the grammar change breaks the generator, debugging the encoded parse tables it emits is painful.
  2. Streaming complicates deduplication — /api/stream_search sends SSE chunks progressively, so deduplication needs a seen-repository set that persists across chunks, or a final-pass buffer.
  3. Stats become inaccurate — Zoekt returns fileCount based on pre-filter results; the count displayed to the user will be wrong without recalculating post-dedup.
  4. No test infrastructure visible — No test files found for the search pipeline; harder to validate safely.
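Points 2 and 3 can be handled together by keeping the seen-repository set outside the per-chunk handler, so it survives across SSE chunks, and by deriving the displayed count from what was actually emitted. A minimal sketch; the `FileMatch`/`Chunk` shapes and `createRepoSelector` are hypothetical, not Sourcebot's streaming API:

```typescript
// Hypothetical shapes; Sourcebot's real streaming payload differs.
interface FileMatch {
  repository: string;
  fileName: string;
}

interface Chunk {
  files: FileMatch[];
}

// Stateful select:repo filter for a stream of chunks. The `seen` set lives
// in the closure, so a repo emitted in an earlier chunk is dropped from
// every later chunk.
function createRepoSelector() {
  const seen = new Set<string>();
  let filesReceived = 0; // pre-dedup count of files received so far

  return {
    filterChunk(chunk: Chunk): Chunk {
      filesReceived += chunk.files.length;
      const files = chunk.files.filter((f) => {
        if (seen.has(f.repository)) return false;
        seen.add(f.repository);
        return true;
      });
      return { files };
    },
    // Counts to display once the stream ends: one result per repository.
    stats() {
      return { repoCount: seen.size, filesReceived };
    },
  };
}

// Usage: two chunks that share repo "a"
const selector = createRepoSelector();
const out1 = selector.filterChunk({ files: [
  { repository: "a", fileName: "x.ts" },
  { repository: "a", fileName: "y.ts" },
  { repository: "b", fileName: "z.ts" },
] });
const out2 = selector.filterChunk({ files: [
  { repository: "a", fileName: "w.ts" },
  { repository: "c", fileName: "v.ts" },
] });
console.log(out1.files.length, out2.files.length, selector.stats());
// prints: 2 1 { repoCount: 3, filesReceived: 5 }
```

Buffering the whole stream and running the one-shot filter at the end is the simpler alternative from point 2, but it gives up progressive results.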

  ---
  Bottom line

  The filtering logic itself is ~50 lines of TypeScript. The risk is in the Lezer grammar build pipeline and the streaming edge case. If you can confirm the lezer-generator build step works cleanly, this is a solid afternoon of work. If the grammar toolchain is opaque, add a buffer day.
