Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save escherize/675a58fb2f840bd1df738dc20468ef54 to your computer and use it in GitHub Desktop.

Select an option

Save escherize/675a58fb2f840bd1df738dc20468ef54 to your computer and use it in GitHub Desktop.
PR #75333 Review: Replace H2 with SQLite for Sample Data

PR #75333 Review: Replace H2 with SQLite for Sample Data

PR: metabase/metabase#75333 Author: @bgrabow Date reviewed: 2026-06-15 Scope: 47 files changed, +1479/-356 Labels: no-backport, PR-Env, .Team/Graphy

Summary

SQLite becomes a core driver for the bundled Sample Database. H2 is removed as the sample DB engine; E2E tests still use H2 via MB_SAMPLE_DATABASE_ENGINE=h2. Major changes:

  • SQLite driver moved from modules/drivers/sqlite/ into core (src/metabase/driver/sqlite.clj)
  • New sqlite-jdbc dependency in deps.edn
  • New metabase.sample-data.example-content namespace recreates examples on upgrade with id remapping
  • New app DB custom migration for the downgrade case
  • API layer: PUT /api/database/:id rejects sample DB edits with 400
  • Frontend: isDbModifiable extended for is_sample; creatable? flag on engine options hides SQLite from hosted engine picker
  • Sample content EDN updated: engine h2->sqlite, schema PUBLIC->nil for all tables
  • Enterprise sandbox test validates doc walkthrough against real SQLite file
  • SQLite/H2 parity test compares row-by-row across all 9 tables

Review Methodology

5 parallel subagents reviewed the PR:

Agent Focus Duration
ClojureStyleReview Style, conventions, docs, naming 2m24s
TSReview TypeScript strictness, any, types, i18n 1m24s
DriversAndSyncReview SQLite driver, sync, creatable?, read-only mode 3m25s
PlatformReview API, migrations, startup flow, sample data lifecycle 2m8s
GeneralQA Logic errors, security, test coverage 2m50s

Findings

Important Issues (4)

Issue 1: Missing :added metadata on new creatable? multimethod

  • src/metabase/driver.clj:~1300
  • The new creatable? defmulti carries only {:arglists '([driver context])}. Every other public multimethod in the file has a version annotation (:added "0.41.0", "0.44.0", etc.). A new public driver API surface should be annotated with {:added "0.63.0"} so driver authors and tooling can track its introduction.

Issue 2: Missing driver-changelog entry for creatable?

  • docs/developers-guide/driver-changelog.md
  • The creatable? multimethod is a public driver API addition affecting engine selection for all drivers. Every other public multimethod has an entry here. Should either add it under a ## Metabase 0.63.0 section or annotate {:changelog-test/ignore true} if intentionally excluded.

Issue 3: Sync-failure recovery in replace-sample-database! leaves inconsistent, untested state

  • src/metabase/sample_data/impl.clj:158-168
  • If sync/sync-database! throws, the catch block only logs and continues to recreate-example-content! on the partially-synced database. If no tables exist, field-id-maps returns empty maps, creating orphaned rows (nil table_ids, nil collection_ids). The DB row with :is_sample true remains half-baked. The test mocks sync to succeed unconditionally — this failure mode is untested. Next startup recovers, but intermediate state is inconsistent. Fix: Skip recreate-example-content! when sync fails, or add a test for this failure mode.

Issue 4: JAR-fallback path removed without replacement test

  • test/metabase/sample_data/impl_test.clj
  • Old code read H2 sample DB from inside the JAR (zip: prefix) when plugins dir was unwritable. Three tests covered this. SQLite requires a real file on disk — the fallback is gone. If extraction throws, the sample DB won't load and there's no recovery test for unwritable plugins dir.

Minor Issues (2)

Issue 5: No startup log when sample DB deleted but content disabled

  • src/metabase/core/core.clj:~217
  • When MB_LOAD_SAMPLE_CONTENT=false and the bundled engine changed, the old H2 sample DB is deleted unconditionally but no new content is created (since load-sample-content? returns false). No log message explains why the sample data disappeared. Fix: Add a startup log telling operators to set MB_LOAD_SAMPLE_CONTENT=true to recreate sample data.

Issue 6: isDbModifiable double-negative is correct but hard to maintain

  • frontend/src/metabase/common/utils/database.ts:9
  • return !(database?.id != null && (database.is_attached_dwh || database.is_sample)) has survived one expansion but is nearing the readability cliff.

Confirmed Correct (no issues found)

Area Files Verdict
Read-only open mode sqlite.clj:107-112 True kernel-level SQLITE_OPEN_READONLY, not advisory. Connection cannot be upgraded to read-write.
SQLite FDW hardening sqlite.clj:112 limit_attached 0 correctly disables ATTACH DATABASE security surface.
*allow-testing-sqlite-connections* bindings settings.clj + 5 call sites All required sites covered (sync, refingerprint, schema refresh, unhide/resync, test connection). No missing sites.
creatable? / available? separation driver.clj, util.clj Clean semantic split: available? = usable for internal ops; creatable? = shown in engine picker.
Engine visibility logic engine.ts:28 isSelected || ((!isSuperseded || isSuperseding) && isCreatable) correct for all cases.
Upgrade/downgrade migration Both tests H2→SQLite (impl_test.clj:99) and SQLite→H2 (custom_migrations_test.clj:2938) both covered. Downgrade test is particularly thorough (MySQL 9.7 cascade workaround verified).
Example content recreation example_content.clj Two-phase card insertion resolves source_card_id circular deps; per-segment collection location remapping avoids double-substitution; full transaction rollback.
H2→SQLite converter dev/.../sample_database_convert.clj CLOB, date coercion, FK ordering all correct. No SQL injection vector.
Sandbox enterprise test enterprise/.../sample_database_test.clj Faithfully mirrors the doc walkthrough end-to-end against real bundled SQLite file.
API sample DB guard warehouses_rest/api.clj PUT-only guard is sufficient (no PATCH/DELETE for databases). 400 is the correct status code.
TS/JS code All frontend files No new any or typing issues. Pre-existing any marks not introduced by this PR.
Module linter config .clj-kondo/config/modules/config.edn Matches actual namespace usage.
supported-in-environment? removal util.clj Only caller was available-drivers; replaced by creatable? at the engine-list level.
Path regex change impl_test.clj:52 Dropped ^ anchor is innocuous — re-matches implicitly anchors at both ends.

Architecture Decisions

Read-only enforcement strategy

SQLite sample DB connections use kernel-level SQLITE_OPEN_READONLY via SQLiteConfig.setReadOnly(true). This is enforced at the OS level, not advisory — the connection handle rejects writes, and cannot be upgraded to read-write after open. The Niro tool disclosure demonstrated a bypass (creating a second non-sample DB aliasing the same file with a writable connection), but this was acknowledged as UX protection, not a security boundary — writes vanish on restart when the bundled artifact is re-extracted.

creatable? vs available?

A multimethod creatable? was added to driver.clj, accepting a context map ({:hosted? <boolean>}). SQLite implements creatable? as (not hosted?). This is clean separation from available?:

  • available? controls whether the driver is registered and usable for internal operations
  • creatable? controls whether it appears in the "add database" engine picker

The old approach filtered SQLite out of available-drivers entirely via supported-in-environment? in util.clj, which broke internal use on hosted. The new approach always includes SQLite in available-drivers and lets the UI layer enforce the "not creatable" restriction.

H2→SQLite upgrade vs SQLite→H2 downgrade asymmetry

  • Upgrade (H2→SQLite): handled at startup in metabase.sample-data.impl/replace-sample-database!. Deletes the old H2 sample DB, cascading to its cards/dashcards; prunes emptied dashboards; removes old Example collections; creates new SQLite DB; syncs; recreates example content. Runs outside Liquibase — it's reactive startup logic triggered by engine mismatch.
  • Downgrade (SQLite→H2): handled in a Liquibase custom migration (MigrateAwayFromSqliteSampleDatabaseOnDowngrade). Deletes the SQLite sample DB bottom-up to avoid MySQL 9.7 cascade bugs. Does NOT restore an H2 sample DB — the older H2 version has its own startup logic that re-extracts the H2 sample DB as part of fresh-install seeding.

Example content id remapping

recreate-example-content! in example_content.clj inserts EDN-stored rows into populated tables, remapping every id reference:

  • Simple FK columns via entity_id lookup (UNIQUE constraint guarantees match)
  • Database/Table/Field/Card ids embedded in card and dashcard query blobs via serdes export walkers
  • Collection location paths remapped per-segment to avoid double-substitution
  • Two-phase card insertion resolves circular source_card_id references

Test Coverage Assessment

Strengths:

  • Upgrade (H2→SQLite) and downgrade (SQLite→H2) both have dedicated tests with realistic fixtures
  • Parity test compares all 9 tables row-by-row with type normalization
  • Sandbox enterprise test runs against the real bundled SQLite file
  • API rejection test (400 + test-endpoint bypass) present
  • Engine visibility test covers both hidden and selected cases

Gaps:

  • Sync-failure recovery in replace-sample-database! is untested
  • Unwritable plugins dir failure mode has no test (was removed with H2 fallback)
  • No test for circular engine-change cycle (H2→SQLite→H2→SQLite)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment