Sourcebot covers the basic search surface (repo/file/lang/rev) but has no predicate system and no commit/diff search. Sourcegraph is dramatically richer — predicates alone are a major feature gap.
Take a look at the Sourcebot source code: how difficult would it be to implement the select:repo filter?
Difficulty: Medium — ~4-6 hours of work.
Here's the breakdown:
---
Architecture
Sourcebot is written entirely in TypeScript (Next.js frontend + Node.js backend) and uses Zoekt, the code search engine originally developed at Google, as the actual search backend, reached over gRPC. The query pipeline has 3 clear layers:
1. packages/queryLanguage/ — Lezer grammar + parser (separate npm package)
2. packages/web/src/features/search/ — IR (intermediate representation) + Zoekt client
3. API routes — /api/search and /api/stream_search
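To make layer 1 concrete, here is a toy sketch of prefix-based tokenization. It is not Sourcebot's Lezer grammar: the `PREFIXES` contents, the `Term` shape, and `tokenize` are hypothetical, with "select:" included as step 1 below would add it.

```typescript
// Toy stand-in for layer 1; the real parser is generated from a Lezer grammar.
// Hypothetical prefix list; "select:" is the new entry.
const PREFIXES = ["repo:", "file:", "lang:", "rev:", "sym:", "context:", "select:"] as const;

type Term =
  | { kind: "field"; prefix: string; value: string }
  | { kind: "content"; value: string };

// Split on whitespace and classify each token by its field prefix, if any.
function tokenize(query: string): Term[] {
  return query
    .split(/\s+/)
    .filter((tok) => tok.length > 0)
    .map((tok): Term => {
      const prefix = PREFIXES.find((p) => tok.startsWith(p));
      return prefix
        ? { kind: "field", prefix, value: tok.slice(prefix.length) }
        : { kind: "content", value: tok };
    });
}
```

For example, `tokenize("repo:sourcebot select:repo useState")` yields two field terms and one content term. A real grammar also has to handle quoting, negation, and parentheses, which is exactly why Sourcebot uses Lezer rather than string splitting.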
---
Where select:repo would need to go
┌─────────────────────────┬──────────────────────────────────────────────────────────────────┬─────────────────────────────────────────┬──────────────┐
│ Step │ File │ Change │ Size │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 1. Add token │ packages/queryLanguage/src/tokens.ts │ Append "select:" to PREFIXES array │ 1 line │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 2. Update grammar │ packages/queryLanguage/src/query.grammar │ Add SelectExpr production rule │ ~5-10 lines │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 3. IR type │ packages/web/src/features/search/ir.ts │ Add { select: string } to QueryIR union │ ~3 lines │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 4. Parser transform │ packages/web/src/features/search/parser.ts → transformTreeToIR() │ Add case for SelectExpr │ ~10-15 lines │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 5. Result deduplication │ packages/web/src/features/search/zoektSearcher.ts │ Post-process: keep first file per repo │ ~15-20 lines │
├─────────────────────────┼──────────────────────────────────────────────────────────────────┼─────────────────────────────────────────┼──────────────┤
│ 6. Syntax docs │ packages/queryLanguage/src/syntaxDescription.ts │ Add entry │ ~3 lines │
└─────────────────────────┴──────────────────────────────────────────────────────────────────┴─────────────────────────────────────────┴──────────────┘
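Steps 3 and 4 might look roughly like this. The IR node shapes, `FieldNode`, `fieldToIR`, and `extractSelect` are all hypothetical; Sourcebot's real `QueryIR` in ir.ts will differ. The idea is to pull `select:` out of the tree as a query-level option so the rest can go to the backend unchanged while the select value drives the post-processing in step 5.

```typescript
// Hypothetical IR nodes; Sourcebot's real QueryIR in ir.ts will differ.
type QueryIR =
  | { type: "content"; pattern: string }
  | { type: "repo"; pattern: string }
  | { type: "and"; children: QueryIR[] }
  | { type: "select"; value: "repo" };

// Hypothetical shape of a field expression handed over by the grammar layer.
interface FieldNode {
  prefix: string;
  value: string;
}

// The new case in the transform: only select:repo is accepted for now.
function fieldToIR(node: FieldNode): QueryIR {
  switch (node.prefix) {
    case "repo:":
      return { type: "repo", pattern: node.value };
    case "select:":
      if (node.value !== "repo") {
        throw new Error(`unsupported select value: ${node.value}`);
      }
      return { type: "select", value: "repo" };
    default:
      throw new Error(`unknown prefix: ${node.prefix}`);
  }
}

// Lift select out of a top-level AND; the remaining tree goes to the backend.
function extractSelect(ir: QueryIR): { query: QueryIR; select?: "repo" } {
  if (ir.type === "select") {
    // A bare select: term; an empty AND stands for "no constraints" here.
    return { query: { type: "and", children: [] }, select: ir.value };
  }
  if (ir.type !== "and") return { query: ir };
  let select: "repo" | undefined;
  const children: QueryIR[] = [];
  for (const child of ir.children) {
    if (child.type === "select") select = child.value;
    else children.push(child);
  }
  return { query: { type: "and", children }, select };
}
```

This only looks at the top-level AND; a real implementation would also need to decide what `select:` means under OR or NOT, or reject it there.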
The deduplication itself is trivial:
// select:repo: keep only the first file match per repository
const seen = new Set<string>()
response.files = response.files.filter(f => {
  if (seen.has(f.repository.name)) return false // repo already represented
  seen.add(f.repository.name)
  return true
})
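A self-contained, non-mutating variant of the snippet above that also recomputes the count, so the UI doesn't show the pre-deduplication number (point 3 below). The `FileMatch` and `SearchResponse` shapes, including `stats.fileCount`, are assumptions standing in for Sourcebot's actual response types, and `selectRepo` is a hypothetical name.

```typescript
// Assumed shapes; Sourcebot's real response types will differ.
interface FileMatch {
  repository: { name: string };
  fileName: string;
}

interface SearchResponse {
  files: FileMatch[];
  stats: { fileCount: number };
}

// Keep the first file per repository (in the backend's ranking order)
// and recompute the count from the deduplicated list.
function selectRepo(response: SearchResponse): SearchResponse {
  const seen = new Set<string>();
  const files = response.files.filter((f) => {
    if (seen.has(f.repository.name)) return false;
    seen.add(f.repository.name);
    return true;
  });
  return { ...response, files, stats: { ...response.stats, fileCount: files.length } };
}
```

Returning a new object instead of reassigning `response.files` keeps the raw backend response available if it's needed elsewhere (for example, for logging).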
---
Why not "Easy"
1. Lezer parser regeneration: the parser.ts in packages/queryLanguage is auto-generated from query.grammar (unlike the hand-written packages/web/src/features/search/parser.ts from step 4). You can't edit it directly; you need to run lezer-generator, and that build step isn't documented. If the grammar change breaks the generator, debugging the generated LR parse tables is painful.
2. Streaming complicates deduplication: /api/stream_search sends SSE chunks progressively, so the set of already-seen repositories has to persist across chunks for the whole request (or results must be buffered for a final pass).
3. Stats become inaccurate: Zoekt's fileCount reflects the results before deduplication, so the count shown to the user will be wrong unless it is recalculated afterward.
4. No visible test infrastructure: no test files were found for the search pipeline, which makes the change harder to validate safely.
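For the streaming route, points 2 and 3 can both be handled by a filter whose state lives for the whole request rather than a single chunk. This is a sketch under assumptions: `createRepoSelector` and the chunk shape are hypothetical, and it does not model how Sourcebot actually encodes its SSE events.

```typescript
// Hypothetical chunk shape; the real SSE payload will differ.
interface FileMatch {
  repository: { name: string };
  fileName: string;
}

interface StreamChunk {
  files: FileMatch[];
}

// Create one selector per request: the seen-set persists across chunks, so a
// repository that appeared in an earlier chunk is dropped from later ones.
function createRepoSelector() {
  const seen = new Set<string>();
  return {
    filterChunk(chunk: StreamChunk): StreamChunk {
      const files = chunk.files.filter((f) => {
        if (seen.has(f.repository.name)) return false;
        seen.add(f.repository.name);
        return true;
      });
      return { ...chunk, files };
    },
    // Post-dedup count so far; final once the stream has ended.
    get repoCount(): number {
      return seen.size;
    },
  };
}
```

Chunks that come back empty can simply be skipped, and since the count is only final after the last chunk, it belongs in whatever closing message the stream sends.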
---
Bottom line
The filtering logic itself is ~50 lines of TypeScript. The risk is in the Lezer grammar build pipeline and the streaming edge case. If you can confirm the lezer-generator build step works cleanly, this is a solid afternoon of work. If the grammar toolchain is opaque, add a buffer day.
@grok, are these claims accurate?
The claims in the Gist (created March 15, 2026) regarding query syntax comparison between Sourcegraph and Sourcebot remain accurate as of the current date, based on official documentation. Sourcegraph's documentation (apparently last updated in early 2026, covering features up to version 5.4) confirms support for approximately 10 predicates (e.g., repo:has.path(...), file:has.content(...)), over 20 field filters (e.g., repo:, content:, type:, select:, commit-related like author:), and 6 operators (AND, OR, NOT, -, (), for regex). Sourcebot's documentation (published March 2025, with content up to March 2026) verifies no predicates, 6 field filters (repo:, file:, lang:, rev:, sym:, context:), and the same 6 operators, with regex and keyword modes but lacking advanced commit or metadata filtering. No substantive updates alter the Gist's assessment within this syntax-focused scope.