QMD Embedding Model Upgrade Report (2026-03-20)
label: after
timestamp: 20260320T065406Z
host: artbird.taile9945f.ts.net
query_count: 13
error_count: 3
queries:
- category: keyword
label: click group commands
endpoint: search
latency_ms: 383
result_count: 5
results:
- docid: '#9e9280'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/commands-and-groups.md
score: 0.88
title: Basic Commands, Groups, Context
- docid: '#4dc584'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/extending-click.md
score: 0.88
title: Extending Click
- docid: '#3aba7c'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/exceptions.md
score: 0.88
title: Exception Handling and Exit Codes
- docid: '#3c8836'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/quickstart.md
score: 0.88
title: Quickstart
- docid: '#377941'
path: qmd://content-sessions:-Users-mike-code-arthack/e6bd479f-14ed-4f0f-aa9d-92bee99deaf0/agents/a3286b0.md
score: 0.88
title: User
- category: keyword
label: systemd unit file
endpoint: search
latency_ms: 374
result_count: 5
results:
- docid: '#8ee7d4'
path: qmd://content-sessions:-Users-mike-code-arthack/8b328745-0002-4b53-a8d7-f9c05d4972ed/agents/a296336d03badd7f2.md
score: 0.93
title: User
- docid: '#db6bd1'
path: qmd://content-sessions:-Users-mike-code-arthack/785634ee-6622-4566-9fb5-781465b8b2d4/agents/a71ddb717e255ff16.md
score: 0.93
title: User
- docid: '#7cea35'
path: qmd://content-sessions:-Users-mike-code-arthack/b56ff826-a0d6-4f39-a752-31f34b862102/agents/a245e0530f5ba5718.md
score: 0.93
title: User
- docid: '#e9c091'
path: qmd://content-sessions:-Users-mike-code-arthack/cd2c72e5-945b-4d21-975d-ae64593c99ed/agents/a16d4b28b98981dc5.md
score: 0.93
title: User
- docid: '#d2f772'
path: qmd://content-sessions:-Users-mike-code-arthack/a11ede63-4b0e-4ebe-9270-9d617d83d763/agents/abb25daffefbdccf7.md
score: 0.93
title: User
- category: keyword
label: pydantic model validation
endpoint: search
latency_ms: 409
result_count: 5
results:
- docid: '#af6d62'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/alias.md
score: 0.93
title: '`AliasPath` and `AliasChoices`'
- docid: '#afd025'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/errors/usage-errors.md
score: 0.93
title: Class not fully defined {#class-not-fully-defined}
- docid: '#03e970'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/validators.md
score: 0.93
title: Field validators
- docid: '#211166'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/dataclasses.md
score: 0.93
title: Dataclass config
- docid: '#67e386'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/models.md
score: 0.93
title: Basic model usage
- category: semantic
label: auth token handling
endpoint: search
latency_ms: 433
result_count: 5
results:
- docid: '#2e0675'
path: qmd://content-topics:geminicli/content/documents/github-com-google-gemini-gemini-cli/docs/core/remote-agents.md
score: 0.94
title: Remote Subagents (experimental)
- docid: '#5d884a'
path: qmd://content-sessions:-Users-mike-code-arthack/b72a4aac-9d49-4c68-8ddd-c7a7f4acd945/agents/a3e2afee013e96225.md
score: 0.93
title: User
- docid: '#4833e2'
path: qmd://content-topics:geminicli/content/documents/github-com-google-gemini-gemini-cli/docs/tools/mcp-server.md
score: 0.93
title: MCP servers with the Gemini CLI
- docid: '#809ae4'
path: qmd://content-topics:perplexity/content/documents/docs-perplexity-ai-docs-admin-api-key-management.md
score: 0.93
title: API Key Management
- docid: '#7eeda2'
path: qmd://content-topics:swiftpm/content/documents/github-com-swiftlang-swift-package-manager/documentation/packageregistry/packageregistryusage.md
score: 0.93
title: Package Registry Usage
- category: semantic
label: async error handling
endpoint: search
latency_ms: 549
result_count: 5
results:
- docid: '#671bb5'
path: qmd://content-sessions:-Users-mike-code-arthack/23688efc-22b7-4f12-adf6-813cae70f90b/agents/a93d4f4fd5c817698.md
score: 0.89
title: User
- docid: '#fdc30f'
path: qmd://content-sessions:-Users-mike-code-arthack/4bba43c4-7436-4f38-ab54-a983409f185d/agents/a43c0ba.md
score: 0.88
title: User
- docid: '#86579b'
path: qmd://content-sessions:-Users-mike-code-arthack/51dffc22-004f-4989-b958-c3b2da4d3890/agents/ac4714a74a5c7bc8c.md
score: 0.88
title: User
- docid: '#5b700f'
path: qmd://content-sessions:-Users-mike-experiment-linux-browser-farm/58840549-2606-4b0a-9415-d4159c63de43/agents/af174e0.md
score: 0.88
title: User
- docid: '#fccfa0'
path: qmd://content-sessions:-Users-mike-code-arthack/c6bed1e2-1edc-472c-bfef-432f861bda30/agents/a14e447.md
score: 0.88
title: User
- category: semantic
label: systemd deploy workflows
endpoint: search
latency_ms: 387
result_count: 5
results:
- docid: '#e9c091'
path: qmd://content-sessions:-Users-mike-code-arthack/cd2c72e5-945b-4d21-975d-ae64593c99ed/agents/a16d4b28b98981dc5.md
score: 0.97
title: User
- docid: '#8ee7d4'
path: qmd://content-sessions:-Users-mike-code-arthack/8b328745-0002-4b53-a8d7-f9c05d4972ed/agents/a296336d03badd7f2.md
score: 0.97
title: User
- docid: '#dd420f'
path: qmd://content-sessions:-Users-mike-code-arthack/c07d644e-d5eb-4515-8d6f-57e4a2d9901c/agents/a4edb9495187eb6bf.md
score: 0.97
title: User
- docid: '#5ee9ce'
path: qmd://content-sessions:-Users-mike-code-arthack/399e942b-76cb-458f-97fd-4f21e600d4d6/agents/a365dbbde4527858c.md
score: 0.97
title: User
- docid: '#8095a0'
path: qmd://content-sessions:-Users-mike-code-arthack/09c271cc-416e-49ef-92a2-7cabf8a15dec/agents/a0beb7a08e10afd0f.md
score: 0.97
title: User
- category: cross-collection
label: YAML config loading
endpoint: search
latency_ms: 392
result_count: 5
results:
- docid: '#3ee637'
path: qmd://content-sessions:-Users-mike-code-arthack/0227bb8d-63c3-423b-b76a-27dd0aa6b5ce/agents/a6290bf.md
score: 0.9
title: User
- docid: '#371a38'
path: qmd://content-sessions:-Users-mike-code/f25d21f2-53fd-4a80-9f5a-7a26b8376166/agents/a0edc63a1f3beb9eb.md
score: 0.9
title: User
- docid: '#dc3032'
path: qmd://content-sessions:-Users-mike-code-arthack/29e1d66f-2312-4ce6-bff9-f5a94e0f69f3/agents/a99d01e.md
score: 0.9
title: User
- docid: '#c62bd7'
path: qmd://content-sessions:-Users-mike-code/f25d21f2-53fd-4a80-9f5a-7a26b8376166/agents/adc817874c3d37a3d.md
score: 0.9
title: User
- docid: '#468d3c'
path: qmd://content-sessions:-Users-mike-code-arthack-marketplace/3ef9b94b-628d-4279-b9e6-7e738d5421ca/agents/a5438b3.md
score: 0.9
title: User
- category: cross-collection
label: notifications
endpoint: search
latency_ms: 409
result_count: 5
results:
- docid: '#102b5b'
path: qmd://content-sessions:-Users-mike-code-arthack/3194d75e-108e-4116-b374-eb1c0be98cd2/agents/ada9e6a8f9a1220da.md
score: 0.94
title: User
- docid: '#083a0f'
path: qmd://content-sessions:-Users-mike-code-arthack/f413d12f-1c13-476e-ae0c-905fe3f493c0.md
score: 0.94
title: User
- docid: '#d5e6bd'
path: qmd://content-sessions:-Users-mike-code-arthack/9bb479ad-f5f9-4eed-b436-6da4a88edde1/agents/a4387b7813f07f6b3.md
score: 0.94
title: User
- docid: '#7b49f9'
path: qmd://content-sessions:-Users-mike-code-arthack/d1125869-b842-473b-9d5f-d82c014f8fce/agents/ac2865394df533977.md
score: 0.94
title: User
- docid: '#999515'
path: qmd://content-sessions:-Users-mike-code-arthack/eb4cc13e-5825-47fb-98a7-5f71b94e1767/agents/aa715ca981729ab6b.md
score: 0.94
title: User
- category: long-context
label: CLI architecture
endpoint: search
latency_ms: 714
result_count: 1
results:
- docid: '#aaabb9'
path: qmd://content-sessions:-Users-mike-code-voice-of-arthacker/0c85ed6d-6b28-4418-916c-9a7dda33ce68/agents/a5440a6.md
score: 0.97
title: User
- category: long-context
label: daemon architecture
endpoint: search
latency_ms: 492
result_count: 0
results: []
- category: structured
label: 'hyde: Telegram bot security'
endpoint: query
error: 'HTTP Error 502: Bad Gateway'
latency_ms: 1376
- category: structured
label: 'expand: vector embeddings'
endpoint: query
error: 'HTTP Error 502: Bad Gateway'
latency_ms: 1497
- category: structured
label: 'expand: file watching'
endpoint: query
error: 'HTTP Error 502: Bad Gateway'
latency_ms: 1673
label: before
timestamp: 20260320T030240Z
host: artbird.taile9945f.ts.net
query_count: 13
error_count: 0
queries:
- category: keyword
label: click group commands
endpoint: search
latency_ms: 396
result_count: 5
results:
- docid: '#9e9280'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/commands-and-groups.md
score: 0.88
title: Basic Commands, Groups, Context
- docid: '#4dc584'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/extending-click.md
score: 0.88
title: Extending Click
- docid: '#3aba7c'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/exceptions.md
score: 0.88
title: Exception Handling and Exit Codes
- docid: '#3c8836'
path: qmd://content-topics:click/content/documents/github-com-pallets-click/docs/quickstart.md
score: 0.88
title: Quickstart
- docid: '#377941'
path: qmd://content-sessions:-Users-mike-code-arthack/e6bd479f-14ed-4f0f-aa9d-92bee99deaf0/agents/a3286b0.md
score: 0.88
title: User
- category: keyword
label: systemd unit file
endpoint: search
latency_ms: 413
result_count: 5
results:
- docid: '#8ee7d4'
path: qmd://content-sessions:-Users-mike-code-arthack/8b328745-0002-4b53-a8d7-f9c05d4972ed/agents/a296336d03badd7f2.md
score: 0.93
title: User
- docid: '#db6bd1'
path: qmd://content-sessions:-Users-mike-code-arthack/785634ee-6622-4566-9fb5-781465b8b2d4/agents/a71ddb717e255ff16.md
score: 0.93
title: User
- docid: '#7cea35'
path: qmd://content-sessions:-Users-mike-code-arthack/b56ff826-a0d6-4f39-a752-31f34b862102/agents/a245e0530f5ba5718.md
score: 0.93
title: User
- docid: '#e9c091'
path: qmd://content-sessions:-Users-mike-code-arthack/cd2c72e5-945b-4d21-975d-ae64593c99ed/agents/a16d4b28b98981dc5.md
score: 0.93
title: User
- docid: '#d2f772'
path: qmd://content-sessions:-Users-mike-code-arthack/a11ede63-4b0e-4ebe-9270-9d617d83d763/agents/abb25daffefbdccf7.md
score: 0.93
title: User
- category: keyword
label: pydantic model validation
endpoint: search
latency_ms: 357
result_count: 5
results:
- docid: '#af6d62'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/alias.md
score: 0.93
title: '`AliasPath` and `AliasChoices`'
- docid: '#afd025'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/errors/usage-errors.md
score: 0.93
title: Class not fully defined {#class-not-fully-defined}
- docid: '#03e970'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/validators.md
score: 0.93
title: Field validators
- docid: '#211166'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/dataclasses.md
score: 0.93
title: Dataclass config
- docid: '#67e386'
path: qmd://content-topics:pydantic/content/documents/github-com-pydantic-pydantic/docs/concepts/models.md
score: 0.93
title: Basic model usage
- category: semantic
label: auth token handling
endpoint: search
latency_ms: 392
result_count: 5
results:
- docid: '#2e0675'
path: qmd://content-topics:geminicli/content/documents/github-com-google-gemini-gemini-cli/docs/core/remote-agents.md
score: 0.94
title: Remote Subagents (experimental)
- docid: '#5d884a'
path: qmd://content-sessions:-Users-mike-code-arthack/b72a4aac-9d49-4c68-8ddd-c7a7f4acd945/agents/a3e2afee013e96225.md
score: 0.93
title: User
- docid: '#4833e2'
path: qmd://content-topics:geminicli/content/documents/github-com-google-gemini-gemini-cli/docs/tools/mcp-server.md
score: 0.93
title: MCP servers with the Gemini CLI
- docid: '#809ae4'
path: qmd://content-topics:perplexity/content/documents/docs-perplexity-ai-docs-admin-api-key-management.md
score: 0.93
title: API Key Management
- docid: '#7eeda2'
path: qmd://content-topics:swiftpm/content/documents/github-com-swiftlang-swift-package-manager/documentation/packageregistry/packageregistryusage.md
score: 0.93
title: Package Registry Usage
- category: semantic
label: async error handling
endpoint: search
latency_ms: 576
result_count: 5
results:
- docid: '#671bb5'
path: qmd://content-sessions:-Users-mike-code-arthack/23688efc-22b7-4f12-adf6-813cae70f90b/agents/a93d4f4fd5c817698.md
score: 0.89
title: User
- docid: '#fdc30f'
path: qmd://content-sessions:-Users-mike-code-arthack/4bba43c4-7436-4f38-ab54-a983409f185d/agents/a43c0ba.md
score: 0.88
title: User
- docid: '#86579b'
path: qmd://content-sessions:-Users-mike-code-arthack/51dffc22-004f-4989-b958-c3b2da4d3890/agents/ac4714a74a5c7bc8c.md
score: 0.88
title: User
- docid: '#5b700f'
path: qmd://content-sessions:-Users-mike-experiment-linux-browser-farm/58840549-2606-4b0a-9415-d4159c63de43/agents/af174e0.md
score: 0.88
title: User
- docid: '#fccfa0'
path: qmd://content-sessions:-Users-mike-code-arthack/c6bed1e2-1edc-472c-bfef-432f861bda30/agents/a14e447.md
score: 0.88
title: User
- category: semantic
label: systemd deploy workflows
endpoint: search
latency_ms: 375
result_count: 5
results:
- docid: '#e9c091'
path: qmd://content-sessions:-Users-mike-code-arthack/cd2c72e5-945b-4d21-975d-ae64593c99ed/agents/a16d4b28b98981dc5.md
score: 0.97
title: User
- docid: '#8ee7d4'
path: qmd://content-sessions:-Users-mike-code-arthack/8b328745-0002-4b53-a8d7-f9c05d4972ed/agents/a296336d03badd7f2.md
score: 0.97
title: User
- docid: '#dd420f'
path: qmd://content-sessions:-Users-mike-code-arthack/c07d644e-d5eb-4515-8d6f-57e4a2d9901c/agents/a4edb9495187eb6bf.md
score: 0.97
title: User
- docid: '#5ee9ce'
path: qmd://content-sessions:-Users-mike-code-arthack/399e942b-76cb-458f-97fd-4f21e600d4d6/agents/a365dbbde4527858c.md
score: 0.97
title: User
- docid: '#8095a0'
path: qmd://content-sessions:-Users-mike-code-arthack/09c271cc-416e-49ef-92a2-7cabf8a15dec/agents/a0beb7a08e10afd0f.md
score: 0.97
title: User
- category: cross-collection
label: YAML config loading
endpoint: search
latency_ms: 374
result_count: 5
results:
- docid: '#3ee637'
path: qmd://content-sessions:-Users-mike-code-arthack/0227bb8d-63c3-423b-b76a-27dd0aa6b5ce/agents/a6290bf.md
score: 0.9
title: User
- docid: '#371a38'
path: qmd://content-sessions:-Users-mike-code/f25d21f2-53fd-4a80-9f5a-7a26b8376166/agents/a0edc63a1f3beb9eb.md
score: 0.9
title: User
- docid: '#dc3032'
path: qmd://content-sessions:-Users-mike-code-arthack/29e1d66f-2312-4ce6-bff9-f5a94e0f69f3/agents/a99d01e.md
score: 0.9
title: User
- docid: '#c62bd7'
path: qmd://content-sessions:-Users-mike-code/f25d21f2-53fd-4a80-9f5a-7a26b8376166/agents/adc817874c3d37a3d.md
score: 0.9
title: User
- docid: '#468d3c'
path: qmd://content-sessions:-Users-mike-code-arthack-marketplace/3ef9b94b-628d-4279-b9e6-7e738d5421ca/agents/a5438b3.md
score: 0.9
title: User
- category: cross-collection
label: notifications
endpoint: search
latency_ms: 361
result_count: 5
results:
- docid: '#102b5b'
path: qmd://content-sessions:-Users-mike-code-arthack/3194d75e-108e-4116-b374-eb1c0be98cd2/agents/ada9e6a8f9a1220da.md
score: 0.94
title: User
- docid: '#083a0f'
path: qmd://content-sessions:-Users-mike-code-arthack/f413d12f-1c13-476e-ae0c-905fe3f493c0.md
score: 0.94
title: User
- docid: '#d5e6bd'
path: qmd://content-sessions:-Users-mike-code-arthack/9bb479ad-f5f9-4eed-b436-6da4a88edde1/agents/a4387b7813f07f6b3.md
score: 0.94
title: User
- docid: '#7b49f9'
path: qmd://content-sessions:-Users-mike-code-arthack/d1125869-b842-473b-9d5f-d82c014f8fce/agents/ac2865394df533977.md
score: 0.94
title: User
- docid: '#999515'
path: qmd://content-sessions:-Users-mike-code-arthack/eb4cc13e-5825-47fb-98a7-5f71b94e1767/agents/aa715ca981729ab6b.md
score: 0.94
title: User
- category: long-context
label: CLI architecture
endpoint: search
latency_ms: 703
result_count: 1
results:
- docid: '#aaabb9'
path: qmd://content-sessions:-Users-mike-code-voice-of-arthacker/0c85ed6d-6b28-4418-916c-9a7dda33ce68/agents/a5440a6.md
score: 0.97
title: User
- category: long-context
label: daemon architecture
endpoint: search
latency_ms: 484
result_count: 0
results: []
- category: structured
label: 'hyde: Telegram bot security'
endpoint: query
latency_ms: 5632
result_count: 5
results:
- docid: '#c0f33e'
path: qmd://content-sessions:-Users-mike-code-arthack/179d2f95-9131-44e2-837d-e3a0996331ab/agents/a2e9d62f0ed14278e.md
score: 0.93
title: User
- docid: '#c7fa4a'
path: qmd://content-sessions:-Users-mike-code-arthack/2a4814f7-8ada-4542-8f24-c84ab0b27180/agents/a96d693889707c85d.md
score: 0.56
title: User
- docid: '#5e20e7'
path: qmd://content-sessions:-Users-mike-code-arthack/39db4ded-59aa-4815-83f1-a41f4487b213/agents/a8578482c586ef923.md
score: 0.47
title: User
- docid: '#2c87e1'
path: qmd://content-sessions:-Users-mike-code-arthack/2a4814f7-8ada-4542-8f24-c84ab0b27180/agents/a2ebcdcd9353f3885.md
score: 0.47
title: User
- docid: '#6f460e'
path: qmd://content-sessions:-Users-mike-code-arthack/1719c35c-cc98-421a-b505-672c970277d1/agents/ac73f90293f09e2e5.md
score: 0.46
title: User
- category: structured
label: 'expand: vector embeddings'
endpoint: query
latency_ms: 12830
result_count: 5
results:
- docid: '#44f26a'
path: qmd://content-topics:haystack/content/documents/github-com-deepset-ai-haystack/docs-website/docs/pipeline-components/retrievers/weaviateembeddingretriever.md
score: 0.9
title: WeaviateEmbeddingRetriever
- docid: '#6ddcb7'
path: qmd://content-topics:perplexity/content/documents/docs-perplexity-ai-docs-embeddings-best-practices.md
score: 0.54
title: Best Practices
- docid: '#2f8b36'
path: qmd://content-topics:kit/content/documents/github-com-cased-kit/docs/src/content/docs/core-concepts/semantic-search.md
score: 0.47
title: How it works
- docid: '#996861'
path: qmd://content-topics:haystack/content/documents/github-com-deepset-ai-haystack/docs-website/docs/pipeline-components/retrievers/pgvectorembeddingretriever.md
score: 0.47
title: PgvectorEmbeddingRetriever
- docid: '#51df84'
path: qmd://content-sessions:-Users-mike-code-arthack-marketplace/ca94aa72-91c9-4f0d-91a3-538215572419/agents/ae5e162.md
score: 0.45
title: User
- category: structured
label: 'expand: file watching'
endpoint: query
latency_ms: 11404
result_count: 5
results:
- docid: '#9b5302'
path: qmd://content-sessions:-Users-mike-code-arthack/0f2e9cf3-8d80-4456-b7d6-d4fa12824a46/agents/a9d62c3e11196239c.md
score: 0.9
title: User
- docid: '#52eaed'
path: qmd://content-sessions:-Users-mike-code-docs/2b229db5-49ae-440c-9f56-a316c6d57d73.md
score: 0.53
title: User
- docid: '#9cf879'
path: qmd://content-sessions:-Users-mike-code-arthack/0685d4ff-965e-4a27-836e-4695c39ea59d.md
score: 0.39
title: User
- docid: '#9addae'
path: qmd://content-sessions:-Users-mike-code-arthack/5e81024b-d851-41fe-a022-dca7994b0f7b/agents/ae460c0.md
score: 0.38
title: User
- docid: '#8ed70c'
path: qmd://content-sessions:-Users-mike-code-docs/1e0256c4-12d5-44d5-a283-6e5f8aa24dc3.md
score: 0.35
title: User

QMD Embedding Model Upgrade Report

Date: 2026-03-20
Duration: ~4 hours (03:00 - 06:55 UTC)
Host: artbird (RTX 3070, 8GB VRAM, i7-9700)

Goal

Upgrade QMD's embedding model from embeddinggemma-300M (2K context, 768-dim) to Qwen3-Embedding-0.6B (32K context, 1024-dim) to improve search quality, and measure impact with a before/after benchmark.

What Was Done

Code Changes (2 commits)

291cf0c8 feat: add benchmark-search subcommand and upgrade QMD embedding model

  • Created apps/qmdctl/qmdctl/run_benchmark_search.py: 13 test queries across 5 categories (keyword, semantic, cross-collection, long-context, structured), with YAML output and comparison mode
  • Wired into cli.py and pyproject.toml as qmdctl-benchmark-search
  • Updated UNIT_QMDCTL systemd unit with QMD_EMBED_MODEL and QMD_EXPAND_CONTEXT_SIZE=4096 env vars
  • Added QMD embedding model to sandcastles.md
  • Updated CLAUDE.md with benchmark-search subcommand

4423e081 fix: restart services when systemd unit files change

  • Provisioner now restarts running services when their unit file was rewritten (previously only started stopped services)
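
The restart rule from 4423e081 can be sketched as a small decision function. This is a minimal illustration, not the provisioner's actual code: `plan_service_action` and its return values are hypothetical names, and it assumes the caller already knows whether the unit file was rewritten and whether the service is currently active.

```python
def plan_service_action(unit_changed: bool, is_active: bool) -> str:
    """Return the systemctl verb to apply, or "none" (hypothetical helper)."""
    if not is_active:
        # Behavior before the fix: stopped services are simply started.
        return "start"
    if unit_changed:
        # Behavior added by the fix: a running service whose unit file was
        # rewritten is restarted so it picks up the new definition
        # (for example, newly added environment variables).
        return "restart"
    # Running service, unchanged unit file: nothing to do.
    return "none"


# Enumerate all four combinations of (unit_changed, is_active).
actions = {
    (changed, active): plan_service_action(changed, active)
    for changed in (False, True)
    for active in (False, True)
}
```

On a real host, systemd also needs `systemctl daemon-reload` after a unit file changes, before the restart, so that the new definition is actually loaded.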

Deployment Timeline

| Time (UTC) | Event |
|---|---|
| 03:02 | Verified GPU health, all 6 services running, embeddinggemma-300M active |
| 03:02 | Ran "before" benchmark: all 13 queries successful |
| 03:06 | Committed and pushed. Syncthing synced to artbird ~45s later |
| 03:06 | Reinstalled qmdctl package, ran provision-service, restarted qmdctl |
| 03:08 | Started qmd embed -f: Qwen3 model downloading (639MB, 7s) |
| 03:09 | CUDA binary rebuild triggered (node-llama-cpp, ~10 min) |
| 03:20 | Embed started: GPU at 94%, 148W, ~1000 vectors/min |
| 03:50 | First OOM crash at 21,216 vectors: CUDA OOM on large chunks |
| 03:55 | Discovered orphan embed processes holding VRAM |
| 04:07 | MCP server holding 7GB VRAM, blocking embed restarts |
| 04:08 | Killed MCP server, freed VRAM, restarted embed |
| 04:10 | Embed running again, steady progress |
| 05:00 | Another OOM crash at 51,200 vectors |
| 05:00 | Restarted embed, which continued from where it left off |
| 05:20 | OOM crash at 81,504 vectors, restarted again |
| 05:30 | Set up auto-restart monitoring loop |
| 06:15 | OOM crash at 142,656 vectors, restarted for final batch |
| 06:48 | Embedding complete: 144,781 vectors from 25,260 documents |
| 06:50 | Restarted MCP server |
| 06:54 | Ran "after" benchmark |

Embedding Stats

| Metric | Value |
|---|---|
| Documents indexed | 25,260 |
| Vectors embedded | 144,781 (~5.7 chunks/doc avg) |
| Total embed time | ~2.5 hours (across multiple restarts) |
| Throughput | ~1,000 vectors/min at GPU 100% |
| VRAM usage | ~7,100 MiB (model + compute buffers) |
| GPU utilization | 94-100% sustained |
| OOM crashes | 4 (each recovered by restart) |
| Model file size | 639 MB (Q8_0 quantization) |
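
The derived figures in the stats can be cross-checked with a few lines of arithmetic. This is only a sanity check on the reported numbers; the variable names are illustrative, and the throughput is the approximate steady-state rate quoted above, so the time estimate excludes crash and restart overhead.

```python
documents = 25_260
vectors = 144_781
throughput_per_min = 1_000  # approximate steady-state rate from the report

# Average number of chunks (vectors) produced per document.
chunks_per_doc = vectors / documents

# Time spent embedding if the GPU ran at the steady-state rate throughout.
embed_hours = vectors / throughput_per_min / 60

print(f"{chunks_per_doc:.1f} chunks/doc, {embed_hours:.1f} h of pure embed time")
# prints: 5.7 chunks/doc, 2.4 h of pure embed time
```

The result is consistent with the reported ~5.7 chunks/doc and ~2.5 hours of total embed time.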

Benchmark Comparison

Search queries (BM25, /api/search): Identical results before and after. Expected — the /api/search endpoint uses BM25 text search, not vector embeddings. The embedding model change does not affect these queries.

| Query | Before Score | After Score | Top-5 Overlap |
|---|---|---|---|
| click group commands | 0.88 | 0.88 | 5/5 |
| systemd unit file | 0.93 | 0.93 | 5/5 |
| pydantic model validation | 0.93 | 0.93 | 5/5 |
| auth token handling | 0.94 | 0.94 | 5/5 |
| async error handling | 0.89 | 0.89 | 5/5 |
| systemd deploy workflows | 0.97 | 0.97 | 5/5 |
| YAML config loading | 0.90 | 0.90 | 5/5 |
| notifications | 0.94 | 0.94 | 5/5 |
| CLI architecture | 0.97 | 0.97 | 1/1 |
| daemon architecture | 0.00 | 0.00 | 0/0 |
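
The overlap column can be reproduced from the docids in the two benchmark files. The sketch below is one plausible way to compute it, not necessarily what run_benchmark_search.py does: `top_k_overlap` is a hypothetical helper, and it assumes overlap is counted as the number of shared docids among the top k of each run, ignoring rank order.

```python
def top_k_overlap(before: list[str], after: list[str], k: int = 5) -> str:
    """Format overlap as 'shared/possible' for the top-k docids of two runs."""
    before_top, after_top = before[:k], after[:k]
    # Order is ignored: a docid counts if it appears in both top-k lists.
    shared = len(set(before_top) & set(after_top))
    # The denominator is the larger of the two result counts, capped at k.
    possible = max(len(before_top), len(after_top))
    return f"{shared}/{possible}"


# docids taken from the "click group commands" results in both benchmark files
click_before = ["#9e9280", "#4dc584", "#3aba7c", "#3c8836", "#377941"]
click_after = ["#9e9280", "#4dc584", "#3aba7c", "#3c8836", "#377941"]

print(top_k_overlap(click_before, click_after))  # 5/5
print(top_k_overlap(["#aaabb9"], ["#aaabb9"]))   # 1/1 (CLI architecture)
print(top_k_overlap([], []))                     # 0/0 (daemon architecture)
```

Because the comparison is set-based, it would not flag a reordering within the top 5; a rank-aware metric would be needed to catch that.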

Structured queries (hyde/expand, /api/query): All failed with 502 after the upgrade. Root cause: dimension mismatch (see below).

Blocking Issue: Dimension Mismatch in QMD

The Qwen3-Embedding model upgrade is only partially working. Embeddings were stored successfully (1024-dim), but vector search at query time fails with:

Dimension mismatch for query vector for the "embedding" column.
Expected 1024 dimensions but received 768.

Root cause: QMD's store.js hardcodes DEFAULT_EMBED_MODEL = "embeddinggemma" for vector lookups. When the MCP server generates query embeddings, it uses this default model name, which resolves to the old 768-dim embeddinggemma model, not the 1024-dim Qwen3 model that was used for indexing.

The QMD_EMBED_MODEL env var is correctly read by:

  • llm.js — for downloading and loading the model file
  • embed CLI command — for indexing
  • formatQueryForEmbedding() / formatDocForEmbedding() — for prompt formatting

But it is NOT read by:

  • store.js line 121: searchVector → always passes DEFAULT_EMBED_MODEL = "embeddinggemma"
  • store.js structured search code → same hardcoded model name
  • index.js line 121: createStore() → passes DEFAULT_EMBED_MODEL to searchVec()

Impact: /api/search (BM25 text) works fine. /api/query (hybrid with vectors) crashes on every request.

This is a QMD upstream bug. The fix needs store.js and index.js to read process.env.QMD_EMBED_MODEL (or derive the model name from the env var) when performing vector searches, not just at embed time.
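
The shape of the fix can be illustrated in a language-neutral way. The sketch below is written in Python purely for illustration; QMD itself is JavaScript and this is not its code. `resolve_embed_model` and `check_query_dimension` are hypothetical helpers, and the model identifier in the usage comment is a placeholder, since the report does not state the exact value of QMD_EMBED_MODEL.

```python
import os

DEFAULT_EMBED_MODEL = "embeddinggemma"  # default name quoted in the report


def resolve_embed_model(env=None) -> str:
    """Prefer QMD_EMBED_MODEL when set and non-empty, else fall back to the default."""
    env = os.environ if env is None else env
    return env.get("QMD_EMBED_MODEL") or DEFAULT_EMBED_MODEL


def check_query_dimension(query_vector, index_dim: int) -> None:
    """Fail early with a clear message when query and index dimensions differ."""
    if len(query_vector) != index_dim:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, "
            f"index expects {index_dim}"
        )


# With no override, query time falls back to the default model:
#   resolve_embed_model({})  returns "embeddinggemma"
# With the env var set (placeholder value), query time uses the same model as indexing:
#   resolve_embed_model({"QMD_EMBED_MODEL": "placeholder-qwen3-model-id"})
```

The point of the sketch is that the same resolution function would be called at both embed time and query time, so the two cannot drift apart; the dimension check turns a silent mismatch into an explicit error.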

Open Items

Must fix (QMD upstream)

  • Dimension mismatch bug: store.js must respect QMD_EMBED_MODEL for query-time vector lookups, not just embed-time indexing

Nice to have

  • Model keep-alive: QMD's MCP server sets disposeModelsOnInactivity: true with a 5-min timeout. Models unload from VRAM after idle. No env var to override.
  • Chrome GPU usage: 5-7 browserctl Chrome instances use ~140-200 MiB of RTX 3070 VRAM. artbird's i7-9700 iGPU (UHD 630) is BIOS-disabled. The --disable-gpu flag would move rendering to software.
  • OOM resilience: qmd embed crashes on CUDA OOM for large chunks but doesn't retry. Each restart picks up where it left off (idempotent), but manual restarts are needed. A retry loop or smaller batch size would help.
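
A retry loop of the kind suggested for OOM resilience could look like the sketch below. It is a minimal illustration, not the monitoring loop that was actually used on artbird: `run_until_success` is a hypothetical helper, the attempt limit and delay are arbitrary, and the embed command is abstracted behind a callable so the example runs without a GPU. It relies on the report's observation that each restart resumes where the previous run stopped.

```python
import time
from typing import Callable


def run_until_success(
    run_once: Callable[[], int],
    max_attempts: int = 10,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once until it returns exit code 0; return the attempts used.

    Raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once() == 0:
            return attempt
        if attempt < max_attempts:
            sleep(delay_s)  # pause before the next attempt
    raise RuntimeError(f"command still failing after {max_attempts} attempts")


# Simulated run: three crashes, then success (stand-in for a real embed command).
exit_codes = iter([1, 1, 1, 0])
attempts = run_until_success(lambda: next(exit_codes), sleep=lambda _s: None)
print(attempts)  # 4
```

With a real command, `run_once` could be something like `lambda: subprocess.run(cmd).returncode`, where `cmd` is the embed invocation. Capping the attempts matters: a chunk that always exceeds VRAM would otherwise restart forever.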

Files Changed

apps/qmdctl/qmdctl/run_benchmark_search.py  (new)
apps/qmdctl/qmdctl/cli.py                   (add benchmark-search command)
apps/qmdctl/pyproject.toml                   (add script entry)
apps/qmdctl/CLAUDE.md                        (add subcommand docs)
apps/qmdctl/qmdctl/run_provision_service.py  (env vars + restart fix)
sandcastles.md                               (add QMD model entry)

Benchmark Data Files

  • Before: /tmp/qmd-benchmarks/qmd-benchmark-before-20260320T030240Z.yaml
  • After: /tmp/qmd-benchmarks/qmd-benchmark-after-20260320T065406Z.yaml