@aaronschachter
Created April 10, 2026 20:19
CMSEXT-1850: Investigation of 'Called with no collections for database' warning during publish

CMSEXT-1850: "Called with no collections for database" Investigation

Ticket: https://webflow.atlassian.net/browse/CMSEXT-1850
Branch: CMSEXT-1850-investigation
Last updated: 2026-04-10

Summary

During site publish, the Single-Tenant PG storage layer logs a warning "Called with no collections for database" when insertMany is called with an empty array. This happens when upstream batch filtering removes all items in a batch (archived items, unpublished drafts).

Key Finding: PG-Only

The warning is only logged in the Single-Tenant PG storage layer:

  • entrypoints/server/lib/logic/cms/internal/storageLayers/storageLayerSingleTenantPg/CMSInsertOperations/index.ts:142
  • The other storage layers (storageLayerDefault, storageLayerLocalization, storageLayerExtData) all have insertMany methods but none log a warning when called with an empty array. The same empty-batch scenario likely occurs on those layers too, just silently.

Root Cause (from initial investigation, 2026-04-09)

The warning is misleading. It fires when the publish pipeline calls insertMany([]) after upstream filtering has legitimately removed every item in a batch. There is no guard in prepareItemsForPublish (lines 532-537) to skip the call when insertions is empty.

Code path

  1. publishUtil.ts:196-216 - batch callback filters out archived/draft items
  2. publishUtil.ts:483-538 - prepareItemsForPublish groups by collection, filters nulls
  3. publishUtil.ts:532-537 - calls insertMany(insertions) with no empty check
  4. CMSInsertOperations/index.ts:134-147 - groupByCollectionTable returns [], warning fires
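The four steps above can be modelled in a few lines. This is a hypothetical, simplified sketch. The function names mirror the code path, but StagedItem, CMSInsert, the status field and all signatures are assumptions. The real code is also async and talks to PG. The sketch shows how a batch whose items are all archived or draft reaches insertMany as [] and triggers the warning.

```typescript
// Hypothetical, simplified model of the first publish path.
// Names mirror the doc; every type and signature here is an assumption.

type ItemStatus = "publishable" | "archived" | "draft";

interface StagedItem {
  id: string;
  collectionId: string;
  status: ItemStatus;
}

interface CMSInsert {
  collectionId: string;
  itemIds: string[];
}

const loggedWarnings: string[] = [];

// Append a value to the array stored under `key`, creating it if needed.
function pushTo(map: Map<string, string[]>, key: string, value: string): void {
  const list = map.get(key);
  if (list) list.push(value);
  else map.set(key, [value]);
}

// Step 1: the batch callback drops archived and draft items.
function filterBatch(batch: StagedItem[]): StagedItem[] {
  return batch.filter((item) => item.status === "publishable");
}

// Step 4 stand-in: group insertions by collection table.
function groupByCollectionTable(insertions: CMSInsert[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const insertion of insertions) {
    for (const id of insertion.itemIds) pushTo(groups, insertion.collectionId, id);
  }
  return groups;
}

// Step 4 stand-in: the Single-Tenant PG insertMany warns when nothing was grouped.
function insertMany(insertions: CMSInsert[]): number {
  const groups = groupByCollectionTable(insertions);
  if (groups.size === 0) {
    loggedWarnings.push("Called with no collections for database");
    return 0;
  }
  let written = 0;
  for (const ids of groups.values()) written += ids.length;
  return written;
}

// Steps 2-3: group surviving items by collection, then call insertMany with no empty check.
function prepareItemsForPublish(items: StagedItem[]): number {
  const byCollection = new Map<string, string[]>();
  for (const item of items) pushTo(byCollection, item.collectionId, item.id);
  const insertions: CMSInsert[] = Array.from(byCollection, ([collectionId, itemIds]) => ({
    collectionId,
    itemIds,
  }));
  return insertMany(insertions); // insertions may be [] here
}
```

Feeding it a batch that contains only archived and draft items writes nothing and logs one warning. A batch with at least one publishable item writes normally.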

Second publish path

publishUtil.ts:255-273 (preserved draft changes) re-publishes items from the live publication. Batches with no matching draftChangesCmsItemKeys also produce publishCMSItems([]).
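The same pattern on the second path can be sketched as follows. This is hypothetical: LiveItem, its key field and the publishCMSItems callback shape are assumptions, not the real signatures in publishUtil.ts. Each live batch is reduced to the items whose keys are in draftChangesCmsItemKeys. Nothing stops an empty result from being published.

```typescript
// Hypothetical sketch of the preserved-draft-changes path (publishUtil.ts:255-273).
// LiveItem, the key field, and the publishCMSItems callback shape are assumptions.

interface LiveItem {
  key: string; // assumed to be the value tracked in draftChangesCmsItemKeys
}

function republishPreservedDraftChanges(
  liveBatches: LiveItem[][],
  draftChangesCmsItemKeys: Set<string>,
  publishCMSItems: (items: LiveItem[]) => void,
): void {
  for (const batch of liveBatches) {
    // Keep only live items whose drafts were preserved.
    const matching = batch.filter((item) => draftChangesCmsItemKeys.has(item.key));
    // No length check: a batch with no matching keys calls publishCMSItems([]).
    publishCMSItems(matching);
  }
}
```

With two live batches and a single preserved key in the first one, publishCMSItems is called twice. The second call receives an empty array.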

Scale

2026-04-10 (Datadog, last 24h)

  • 1,862 warnings across 170 distinct databases / 165 distinct sites
  • Publish strategy is mixed: some warnings come from site publishes, others have an empty strategy value (single-page publish)
  • A few databases show siteId: undefined

Top 20 databases by warning count (24h)

database_id site_id count
69cd59b46b485182422b1176 69cd59b46b485182422b1138 405
69c3f5d79caeca73e326a5a9 69a8527778df147812f39db3 206
621e95f9ac30687a56e4297e 621e6f1effebfe03881da9bd 148
646b7081dd2c8b67db60998e undefined 78
61f7c8145fe6f6e022a84b3c 61f7c8145fe6f608faa84b36 44
69d8c639450cf83749ea8c85 69d8c639450cf83749ea8c70 43
69c6d53f53427db6234ea1dd 69c6d53d53427db6234ea14a 39
67e372dce27edb342559fda0 undefined 38
69b38357dd91b49325a066e4 69b38357dd91b49325a066d8 34
69d52f8695dbb0307c76cc7c 69ca18142eb637b38f123ea5 33
69d75a6e85228ce4377c110d 69cb90973f8f948b3f21c100 31
68a2128d673dce7e5435764e 68a2128d673dce7e54357641 30
68a2128d673dce7e5435764e undefined 30
69d4c401f8743426267fe506 69d4c401f8743426267fe4ee 29
699e89dd0eea6c0a95d8171e 699d3f5e9f3bd19155432990 28
699878abe2fb02ec2cea78f6 699878abe2fb02ec2cea78f4 20
69c296dd569bd9a00927a2d5 69c27b915c44f87f356b4ba9 20
6851647deb0233ad4acfd2bc 6851647deb0233ad4acfd250 18
69ca92da3c9e0a203dd90501 69ad10cec98a6c22015b8155 17
69d4339e4a1f8875ffa84cbe 67ca0a88589e2fffa9a2f421 16

2026-04-09 (OpenSearch, last 24h)

  • 193 distinct database/site combinations affected
  • Top offender was 69cb8a098ee44d1eae714e30 (253 warnings/day), which is no longer in the top 20 today
  • GenieAI database 69a78a937b0c47d9c1b78580 (originally called out in ticket) had zero warnings

Observations

  • The top offender shifts day to day, suggesting it correlates with publish frequency rather than a fixed data problem.
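  • Most database IDs start with 69, suggesting recently created databases. ObjectIDs encode their creation time in the first 4 bytes, and a 69 prefix corresponds to roughly late October 2025 through early May 2026. These are likely newly migrated PG sites.
  • Some databases have siteId: undefined; these are worth investigating.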
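The "recently created" reading relies on ObjectIDs embedding a timestamp. The following is a minimal sketch for decoding it. It assumes only the standard ObjectID layout: 12 bytes, the first 4 being big-endian Unix seconds. The helper name objectIdCreatedAt is illustrative.

```typescript
// A MongoDB ObjectID is 12 bytes (24 hex chars); its first 4 bytes are a
// big-endian Unix timestamp in seconds recording when the ID was generated.
function objectIdCreatedAt(objectId: string): Date {
  if (!/^[0-9a-f]{24}$/i.test(objectId)) {
    throw new Error(`not a 24-character hex ObjectID: ${objectId}`);
  }
  const seconds = parseInt(objectId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Database IDs from this investigation:
console.log(objectIdCreatedAt("69cd59b46b485182422b1176").toISOString().slice(0, 10)); // 2026-04-01
console.log(objectIdCreatedAt("621e95f9ac30687a56e4297e").toISOString().slice(0, 10)); // 2022-03-01
```

Both dates line up with the deep dives below: the april-2026-airbyte site's database dates to April 2026, and BioRender's database to early 2022.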

Deep Dives

Database 69cd59b46b485182422b1176 (top offender, 405 warnings/24h)

  • Site shortname: april-2026-airbyte
  • Site ID: 69cd59b46b485182422b1138
  • Dynamo admin: https://webflow.com/admin/dynamo/69cd59b46b485182422b1176
  • Snapshot: webflow-app-admin-789c45d7c4-nrcxs/405dcde0-61e6-4239-a4c5-55b3f210a3df.v1-lite
  • Collections: 55
  • Storage layer: singleTenantPgSelfServe
  • Publications (April 10): 16 publishes, sometimes minutes apart
  • Total published databases: 18

The site name suggests this is an Airbyte (data integration tool) connected site, which likely explains the high publish frequency (automated/API-driven publishes).

Publish metrics (from Datadog):

  • Every publish reports numCollections: 55, numDocuments: 45,405
  • Exactly 27 warnings per publish, every time (27 x 15 publishes in the 24h window = 405 total)
  • All 27 warnings fire at the same timestamp, same host (all from the first publish path)
  • Same publisher (632620133009595dc427d92f), same session, publishing every ~5-20 minutes

Analysis:

  • 45,405 items streamed in batches of 500 = ~91 batches (minimum)
  • numDocuments already counts archived/draft items, which are only dropped later in the batch callback (see the note on numDocuments under the local PG query results)
  • 27 batches end up entirely archived/draft after filtering, producing publishCMSItems([])
  • The consistency (always exactly 27) suggests a stable population of archived/draft items that cluster together in natural (insertion) order
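Whether clustering produces a stable empty-batch count can be checked by simulation. This is a minimal sketch. It assumes batches are cut every 500 items from one continuous stream in _id order, which simplifies batchReadableStream, and it reduces each item to a publishable flag. The helpers countEmptyBatches and fromRuns are hypothetical.

```typescript
// Count batches in which every item would be filtered out by the batch callback.
// Assumption: batches are cut every `batchSize` items from a single stream.
function countEmptyBatches(publishable: readonly boolean[], batchSize = 500): number {
  let empty = 0;
  for (let start = 0; start < publishable.length; start += batchSize) {
    const batch = publishable.slice(start, start + batchSize);
    if (!batch.some((isPublishable) => isPublishable)) empty += 1;
  }
  return empty;
}

// Build a stream from runs of [publishable?, count].
function fromRuns(runs: ReadonlyArray<[boolean, number]>): boolean[] {
  return runs.flatMap(([value, count]) => new Array<boolean>(count).fill(value));
}

// A 1,200-item unpublishable run aligned to a batch boundary empties 2 batches...
console.log(countEmptyBatches(fromRuns([[true, 1000], [false, 1200], [true, 300]]))); // 2
// ...but the same run shifted by 100 items empties only 1.
console.log(countEmptyBatches(fromRuns([[true, 100], [false, 1200], [true, 200]]))); // 1
```

As long as the data and its _id order do not change between publishes, the simulated count is identical every time. That is consistent with the constant 27 per publish.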

Local PG query results (snapshot of production data):

Note: numDocuments: 45,405 in publish logs is the total number of items streamed (including archived/draft), not the number of items actually written to the publication. See batchReadableStream (line 444), which counts all items read from the stream before filtering.

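Metric | Count | %
Total staged items | 45,405 | 100%
Archived | 32,946 | 72.6%
Draft (never published) | 481 | 1.1%
Publishable | 12,243 | 27.0%
Unpublishable (archived + draft) | 33,427 | 73.6%
(Publishable + Unpublishable = 45,670, which is 265 more than the total. About 265 items therefore appear to be both archived and never-published drafts, and are counted in both rows.)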
  • Collections with 0 publishable items: 3 (plus 1 empty collection)
  • Two collections dominate and together account for ~39K of the 45K items:

Collection 69d93f192ff103a9b73c7c71 (Airbyte source-to-destination connector pairs):

Status | Count | %
Total | 23,152 | 100%
Archived | 19,630 | 84.8%
Draft (never published) | 101 | 0.4%
Publishable | 3,522 | 15.2%

Collection 69d93f192ff103a9b73c7df3 (also connector pairs):

Status | Count | %
Total | 16,234 | 100%
Archived | 12,852 | 79.2%
Draft (never published) | 1 | 0.0%
Publishable | 3,382 | 20.8%

Why exactly 27 empty batches?

Items are read per collection via keyset pagination (getAllIterable line 234, batches of 100 by _id), then yielded into batchReadableStream's 500-item batches. When there are long consecutive runs of unpublishable (archived or draft) items in _id order, entire 500-item batches end up empty after filtering.

Consecutive runs of unpublishable items (queried from local PG snapshot):

  • Collection ...7c71: one run of 9,075 consecutive unpublishable items = ~18 empty batches
  • Collection ...7df3: one run of 5,383 consecutive unpublishable items = ~10 empty batches
  • Plus 2 fully-unpublishable collections (44 and 25 items) = ~1 empty batch each

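18 + 10 + 1 + 1 = ~30 estimated, close to the observed 27. The per-run figures are upper bounds. A run of N consecutive unpublishable items contains at most floor(N / 500) fully empty 500-item batches. Depending on where the run starts relative to the batch boundaries, it may contain one fewer.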
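The per-run estimates above follow from a simple bound. This is a minimal sketch. It assumes batches are cut every 500 items from offset 0 of whatever stream the run sits in: one continuous stream, or each collection if batches restart at collection boundaries (unconfirmed here). A run starting at offset start with length N fully contains batch k exactly when k·500 ≥ start and (k+1)·500 ≤ start + N. The function names are hypothetical.

```typescript
// Full batches lying entirely inside a run of unpublishable items.
// Batch k covers offsets [k * batchSize, (k + 1) * batchSize).
function emptyBatchesInRun(start: number, length: number, batchSize = 500): number {
  const firstFull = Math.ceil(start / batchSize); // first batch starting at or after the run start
  const pastLastFull = Math.floor((start + length) / batchSize); // one past the last batch ending inside the run
  return Math.max(0, pastLastFull - firstFull);
}

// Upper bound when the start offset is unknown (reached when the run is batch-aligned).
function maxEmptyBatchesInRun(length: number, batchSize = 500): number {
  return Math.floor(length / batchSize);
}

// Runs measured in the local PG snapshots (Airbyte, then BioRender):
for (const run of [9075, 5383, 15553, 3995]) {
  console.log(run, maxEmptyBatchesInRun(run)); // 18, 10, 31, 7
}
```

Taken on its own, a 44-item run cannot fill a 500-item batch (maxEmptyBatchesInRun(44) is 0). The ~1 empty batch counted for each small collection therefore implies one of two things: batches restart per collection, or those items extend a neighbouring unpublishable run.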

Conclusion for this database: The warning is not a bug. This is an Airbyte-connected site that bulk-imports data and archives heavily. The publish pipeline reads all staged items (including archived) from PG, transfers them over the wire, converts them to CMSItem objects, and then discards archived/draft ones in the batch callback. For this database, that means 33K items are read from PG and thrown away every publish.

Remaining questions:

  • Is the high publish frequency driven by the Airbyte integration or manual?

Database 621e95f9ac30687a56e4297e (3rd highest, 148 warnings/24h)

  • Site shortname: biorender-marketing-site
  • Site ID: 621e6f1effebfe03881da9bd
  • Dynamo admin: https://webflow.com/admin/dynamo/621e95f9ac30687a56e4297e
  • Snapshot: webflow-app-admin-58df7dcf4-d59vw/7e35c801-c1f3-4daf-a1ac-950071bf0ff0.v1-lite
  • Collections: 33
  • Storage layer: singleTenantPgSelfServe
  • Items streamed per publish: 50,000 (round number, possibly hitting a limit)
  • Warnings per publish: 37
  • Publish frequency: ~daily (7 publishes in 7 days), always from the Designer
  • Database created: early 2022 (older, established site)

Local PG query results (snapshot of production data):

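Metric | Count | %
Total staged items | 50,000 | 100%
Archived | 2,271 | 4.5%
Draft (never published) | 19,659 | 39.3%
Publishable | 29,217 | 58.4%
Unpublishable (archived + draft) | 21,930 | 43.9%
(Publishable + Unpublishable = 51,147, which is 1,147 more than the total. About 1,147 items therefore appear to be both archived and never-published drafts.)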
  • Collections with 0 publishable items: 1 (plus 1 empty collection)
  • Dominant collection: ...ea200 (29,295 items, 19,548 draft-never-published / 66.7%). Contains scientific illustration templates (BioRender's core product). Most drafts were never published.

Consecutive runs of unpublishable items:

  • One run of 15,553 = ~31 empty batches
  • One run of 3,995 = ~7 empty batches (3,995 / 500 = 7.99, so at most 7 full batches)
  • Estimated ~38, close to observed 37

Conclusion: Different profile from the Airbyte site. BioRender's warnings are driven by draft-never-published items (not archived), concentrated in a single large templates collection. Same root cause: large runs of unpublishable items in _id order producing empty batches after filtering. Confirms the pattern is not site-specific.

Why archived items aren't filtered at the query level

The publish query (publishUtil.ts:126-138) intentionally has no filter, which causes getAllIterable to use keyset pagination instead of the streams path (getAllIterable.ts:111-129). The comment on publishUtil.ts:202-204 explains this choice.

Keyset pagination (WHERE _id > lastId ORDER BY _id LIMIT 100) fetches items in small batches using the primary key index. No long-lived database cursor is held open.
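The keyset loop can be sketched in memory. This is not getAllIterable's actual code: Row, FetchPage and inMemoryFetchPage are hypothetical stand-ins for the PG query. Each page is an independent query of the form WHERE _id > lastId ORDER BY _id LIMIT 100, and the only state carried between pages is the last _id seen.

```typescript
interface Row {
  _id: string;
}

// Stand-in for one PG round trip:
//   SELECT * FROM <table> WHERE _id > $lastId ORDER BY _id LIMIT $pageSize
type FetchPage<T extends Row> = (lastId: string | null, pageSize: number) => T[];

function* keysetPaginate<T extends Row>(fetchPage: FetchPage<T>, pageSize = 100): Generator<T[]> {
  let lastId: string | null = null;
  for (;;) {
    const page = fetchPage(lastId, pageSize);
    if (page.length === 0) return;
    yield page;
    if (page.length < pageSize) return; // short page: no more rows
    lastId = page[page.length - 1]._id; // resume strictly after the last key seen
  }
}

// Toy fetchPage over an in-memory table, ordered by _id like the index scan.
function inMemoryFetchPage<T extends Row>(rows: readonly T[]): FetchPage<T> {
  const sorted = [...rows].sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : 0));
  return (lastId, pageSize) =>
    sorted.filter((row) => lastId === null || row._id > lastId).slice(0, pageSize);
}

const table = Array.from({ length: 250 }, (_, i) => ({ _id: String(i).padStart(3, "0") }));
const pageSizes = Array.from(keysetPaginate(inMemoryFetchPage(table))).map((page) => page.length);
console.log(pageSizes); // [ 100, 100, 50 ]
```

Because every page restarts from a key rather than from an open cursor, a slow consumer between pages never holds a database resource open. This is the property that avoids the timeouts seen on the streams path.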

Streams path (query.stream()) holds a database cursor open for the entire read. For large databases this caused connection timeouts (see Omavi's NOVA publish timeout investigation in #triage-hosting-infrastructure, March 2026).

Adding a CMS query filter (e.g., _archived = false) currently triggers the streams path via hasCMSQueryFilter. However, the filter could potentially be added directly to the keyset pagination query (line 248 of getAllIterable.ts) without going through the CMS query filter path, keeping the benefits of keyset pagination while skipping archived items at the PG level.

Possible Approaches

Approach 1: Early return on empty batch

Add a guard in prepareItemsForPublish (publishUtil.ts:532-537) to skip the StorageLayerFactory.create + insertMany call when insertions is empty.

// Before insertMany call:
if (!insertions.length) {
  return [];
}

Pros: Simple one-liner. Eliminates the misleading warning and avoids unnecessary StorageLayerFactory + insertMany overhead.

Cons: The archived/draft items are still read from PG, transferred over the wire, and converted to CMSItem objects before being discarded in the batch callback. For the Airbyte site, that's ~33K items read and thrown away every publish. This fixes the symptom (noisy warning) but not the underlying inefficiency.

Where do we know it's empty? Inside the function returned by prepareItemsForPublish (line 483), after grouping by collection and filtering nulls (line 530). If insertions (the array of CMSInsert objects) has length 0, there's nothing to insert.
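A slightly fuller sketch of the Approach 1 guard shows what the early return skips. It is hypothetical. StorageLayer, CMSInsert and the createStorageLayer callback are simplified stand-ins for StorageLayerFactory.create and the real insertMany. They are synchronous here for brevity, and the function returns a written-item count rather than the real return value.

```typescript
// Hypothetical sketch of Approach 1. StorageLayer, CMSInsert and the factory
// callback are simplified, synchronous stand-ins; only the early-return shape matters.

interface CMSInsert {
  collectionId: string;
  itemIds: string[];
}

interface StorageLayer {
  insertMany(insertions: CMSInsert[]): number;
}

function writeInsertions(insertions: CMSInsert[], createStorageLayer: () => StorageLayer): number {
  if (!insertions.length) {
    // Every item in the batch was filtered out: skip the factory and insertMany entirely.
    return 0;
  }
  const storageLayer = createStorageLayer();
  return storageLayer.insertMany(insertions);
}
```

Placing the check before the factory call, rather than inside insertMany, means an empty batch costs nothing beyond the length test.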

Approach 2: Filter archived/draft items at the PG query level

Exclude unpublishable items from the query so they never leave the database. The publish query (publishUtil.ts:126-128) currently has no filter, which allows getAllIterable to use keyset pagination. Two sub-options:

2a) Add filter directly in keyset pagination query builder

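In getAllIterable.ts, the keyset pagination path builds a query per collection (line 248). Add .where('f__archived', '!=', true) to the paginatedQuery builder without going through the CMS query filter system. Only _archived is filtered this way; see the draft caveat under Cons. This keeps keyset pagination and avoids triggering the streams path. If f__archived can be NULL, note that f__archived != true evaluates to NULL for those rows, so PG drops them; f__archived IS DISTINCT FROM true keeps them.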

Pros: Archived items never leave PG. For the Airbyte site, this skips ~33K rows per publish. No performance concern on the query side since PG would still walk the _id index and filter rows (it already fetches all columns from heap anyway). No standalone index on f__archived exists, but one isn't needed.

Cons: The filter is added outside the normal CMS query filter system, so it's a special case in the pagination code. The _draft filtering is also more nuanced: draft items that have been published need to be tracked for draftChangesCmsItemKeys, not skipped entirely. So only _archived can be safely filtered at the query level. For the Airbyte site, draft-only items are a tiny fraction (481 out of 45K), so this is acceptable there. For BioRender, however, the empty batches are driven by draft-never-published items (19,659 of 50,000), so 2a alone would likely not remove that site's warnings.
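To make the 2a page query and its NULL handling concrete, the following hypothetical builder emits raw SQL text with $n placeholders. It is not the getAllIterable code, and the table name is illustrative. The doc's proposal uses .where('f__archived', '!=', true). This sketch swaps in PostgreSQL's IS DISTINCT FROM, which only matters if f__archived is nullable; that is not established here.

```typescript
// Hypothetical sketch of the Approach 2a page query. Column names (_id, f__archived)
// follow the doc; the builder, its signature and the table name are illustrative.

interface PageQuery {
  text: string;
  values: unknown[];
}

// `table` is interpolated directly, so it must come from trusted code, never user input.
function buildKeysetPageQuery(
  table: string,
  lastId: string | null,
  pageSize: number,
  excludeArchived: boolean,
): PageQuery {
  const conditions: string[] = [];
  const values: unknown[] = [];
  if (lastId !== null) {
    values.push(lastId);
    conditions.push(`_id > $${values.length}`);
  }
  if (excludeArchived) {
    // Keeps rows where f__archived is NULL or false; `f__archived != true` would drop NULLs.
    conditions.push("f__archived IS DISTINCT FROM true");
  }
  values.push(pageSize);
  const where = conditions.length ? ` WHERE ${conditions.join(" AND ")}` : "";
  return { text: `SELECT * FROM ${table}${where} ORDER BY _id LIMIT $${values.length}`, values };
}
```

Resuming from the last returned _id stays correct with the filter in place. Archived rows between two returned rows are simply never emitted, and PG still walks the _id index to skip them.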

2b) Adjust hasCMSQueryFilter to allow simple filters through to keyset

Modify hasCMSQueryFilter so that certain "simple" filters (like boolean equality on indexed columns) don't trigger the streams path. Then add _archived = false to the CMS query in publishUtil.ts.

Pros: Uses the standard CMS query filter system. More generalizable.

Cons: More complex change. hasCMSQueryFilter is used across the codebase, so changes there have broader impact. Would need careful testing.
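The following is a rough, hypothetical illustration of what "simple filters stay on keyset" could mean for 2b. The FilterNode shape and the KEYSET_SAFE_FIELDS allow-list are invented for this sketch. The real CMS query filter structure and the hasCMSQueryFilter signature are not shown in this doc.

```typescript
// Hypothetical sketch of Approach 2b. FilterNode and KEYSET_SAFE_FIELDS are invented;
// the real CMS query filter shape and hasCMSQueryFilter signature differ.

type FilterNode =
  | { op: "eq"; field: string; value: boolean | string | number }
  | { op: "and" | "or"; children: FilterNode[] }
  | { op: "contains"; field: string; value: string };

// Assumed allow-list of fields whose boolean equality may stay on keyset pagination.
const KEYSET_SAFE_FIELDS: ReadonlySet<string> = new Set(["_archived"]);

// true -> keep keyset pagination; false -> fall back to the streams path.
function isKeysetCompatible(filter: FilterNode | undefined): boolean {
  if (filter === undefined) return true; // no filter: today's publish query
  switch (filter.op) {
    case "eq":
      return typeof filter.value === "boolean" && KEYSET_SAFE_FIELDS.has(filter.field);
    case "and":
      return filter.children.every((child) => isKeysetCompatible(child));
    default:
      return false; // "or", "contains", and anything unrecognised
  }
}
```

An AND of safe equalities is allowed because each conjunct can be appended to the keyset WHERE clause. OR and everything else is conservatively rejected, which keeps the blast radius of changing a shared helper small.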

Recommendation

Do both: Approach 1 as an immediate cleanup (eliminates the warning, tiny PR), then Approach 2a as a follow-up for the performance win. Approach 2a only filters _archived at the query level; the draft/publishedOn logic stays in the batch callback since it feeds draftChangesCmsItemKeys.

Dynamo Admin Improvements

Running list of improvements to the /admin/dynamo pages discovered during this investigation. See also existing tickets: CMSEXT-1888, CMSEXT-1824, CMSEXT-566, CMSEXT-481, CMSEXT-1786, CMSEXT-1630.

  • Copy JSON button on published databases list (and other JSON data views) so data can be easily shared/pasted
  • Show fast counts per collection when listing collections in a database (CMSEXT-1630)
  • Indicate items with draft changes in items list
  • Indicate items scheduled to publish in items list (requires query on scheduledSIP)