CMSEXT-1850: "Called with no collections for database" Investigation

Ticket: https://webflow.atlassian.net/browse/CMSEXT-1850
Branch: CMSEXT-1850-investigation
Last updated: 2026-04-10

Summary

During site publish, the Single-Tenant PG storage layer logs a warning "Called with no collections for database" when insertMany is called with an empty array. This happens when upstream batch filtering removes all items in a batch (archived items, unpublished drafts).

Key Finding: PG-Only

The warning is only logged in the Single-Tenant PG storage layer:

entrypoints/server/lib/logic/cms/internal/storageLayers/storageLayerSingleTenantPg/CMSInsertOperations/index.ts:142
The other storage layers (storageLayerDefault, storageLayerLocalization, storageLayerExtData) all have insertMany methods but none log a warning when called with an empty array. The same empty-batch scenario likely occurs on those layers too, just silently.

Root Cause (from initial investigation, 2026-04-09)

The warning is misleading. It fires when the publish pipeline calls insertMany([]) after upstream filtering legitimately removes every item in a batch. There is no guard in prepareItemsForPublish (lines 532-537) to skip the call when insertions is empty.

Code path

publishUtil.ts:196-216 - batch callback filters out archived/draft items
publishUtil.ts:483-538 - prepareItemsForPublish groups by collection, filters nulls
publishUtil.ts:532-537 - calls insertMany(insertions) with no empty check
CMSInsertOperations/index.ts:134-147 - groupByCollectionTable returns [], warning fires
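
The code path above can be modelled in a few lines. This is a minimal sketch, not the real implementation: StagedItem, isPublishable, groupInsertions and the stub insertMany are hypothetical stand-ins for the batch callback, prepareItemsForPublish and the PG storage layer, and the publishability rule (not archived, not a never-published draft) is a simplification of what the summary describes.

```typescript
// Hypothetical, simplified model of the publish path. All names are illustrative.
interface StagedItem {
  id: string;
  collectionId: string;
  archived: boolean;
  draft: boolean;
  publishedOn: string | null; // timestamp of the last publish, if any
}

interface CollectionInsert {
  collectionId: string;
  items: StagedItem[];
}

const warnings: string[] = [];

// Stand-in for the batch callback: drop archived items and never-published drafts.
function isPublishable(item: StagedItem): boolean {
  const neverPublishedDraft = item.draft && item.publishedOn === null;
  return !item.archived && !neverPublishedDraft;
}

// Stand-in for the grouping step: one insert per collection.
function groupInsertions(items: StagedItem[]): CollectionInsert[] {
  const byCollection: Record<string, StagedItem[]> = {};
  for (const item of items) {
    const group = byCollection[item.collectionId] || [];
    group.push(item);
    byCollection[item.collectionId] = group;
  }
  return Object.keys(byCollection).map((collectionId) => ({
    collectionId,
    items: byCollection[collectionId],
  }));
}

// Stand-in for the PG storage layer: warns when there is nothing to insert.
function insertMany(insertions: CollectionInsert[]): number {
  if (insertions.length === 0) {
    warnings.push('Called with no collections for database');
    return 0;
  }
  return insertions.reduce((total, insert) => total + insert.items.length, 0);
}

// A batch in which every item is archived or a never-published draft.
const batch: StagedItem[] = [
  { id: 'a', collectionId: 'c1', archived: true, draft: false, publishedOn: null },
  { id: 'b', collectionId: 'c1', archived: false, draft: true, publishedOn: null },
];

const insertions = groupInsertions(batch.filter(isPublishable));
const inserted = insertMany(insertions); // no guard, so the stub receives []
```

In this model the batch is valid input, the filter does its job, and the warning still fires, which is why it reads as misleading rather than as a data problem.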

Second publish path

publishUtil.ts:255-273 (preserved draft changes) re-publishes items from the live publication. Batches with no matching draftChangesCmsItemKeys also produce publishCMSItems([]).

Scale

2026-04-10 (Datadog, last 24h)

1,862 warnings across 170 distinct databases / 165 distinct sites
Publish strategy is mixed: some warnings are logged with strategy site, others with an empty strategy (single-page publish)
A few databases show siteId: undefined

Top 20 databases by warning count (24h)

database_id	site_id	count
69cd59b46b485182422b1176	69cd59b46b485182422b1138	405
69c3f5d79caeca73e326a5a9	69a8527778df147812f39db3	206
621e95f9ac30687a56e4297e	621e6f1effebfe03881da9bd	148
646b7081dd2c8b67db60998e	undefined	78
61f7c8145fe6f6e022a84b3c	61f7c8145fe6f608faa84b36	44
69d8c639450cf83749ea8c85	69d8c639450cf83749ea8c70	43
69c6d53f53427db6234ea1dd	69c6d53d53427db6234ea14a	39
67e372dce27edb342559fda0	undefined	38
69b38357dd91b49325a066e4	69b38357dd91b49325a066d8	34
69d52f8695dbb0307c76cc7c	69ca18142eb637b38f123ea5	33
69d75a6e85228ce4377c110d	69cb90973f8f948b3f21c100	31
68a2128d673dce7e5435764e	68a2128d673dce7e54357641	30
68a2128d673dce7e5435764e	undefined	30
69d4c401f8743426267fe506	69d4c401f8743426267fe4ee	29
699e89dd0eea6c0a95d8171e	699d3f5e9f3bd19155432990	28
699878abe2fb02ec2cea78f6	699878abe2fb02ec2cea78f4	20
69c296dd569bd9a00927a2d5	69c27b915c44f87f356b4ba9	20
6851647deb0233ad4acfd2bc	6851647deb0233ad4acfd250	18
69ca92da3c9e0a203dd90501	69ad10cec98a6c22015b8155	17
69d4339e4a1f8875ffa84cbe	67ca0a88589e2fffa9a2f421	16

2026-04-09 (OpenSearch, last 24h)

193 distinct database/site combinations affected
Top offender was 69cb8a098ee44d1eae714e30 (253 warnings/day), which is no longer in the top 20 today
GenieAI database 69a78a937b0c47d9c1b78580 (originally called out in ticket) had zero warnings

Observations

The top offender shifts day to day, suggesting it correlates with publish frequency rather than a fixed data problem.
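Most database IDs start with 69. The first 4 bytes of an ObjectID are a creation timestamp, and a 69 prefix corresponds to creation dates from late October 2025 onward, so these are recently created databases. These are likely newly migrated PG sites.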
Some databases have siteId: undefined, worth investigating.
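
The ObjectID observation can be checked directly. The sketch below decodes the creation time from the leading 8 hex characters of an ID, which hold a big-endian Unix timestamp in seconds; objectIdCreatedAt is a hypothetical helper name, not something from the codebase.

```typescript
// Decode the creation time embedded in a 24-character hex ObjectID.
// The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
function objectIdCreatedAt(objectId: string): Date {
  if (!/^[0-9a-f]{24}$/i.test(objectId)) {
    throw new Error(`Not a 24-character hex ObjectID: ${objectId}`);
  }
  const seconds = parseInt(objectId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Top offender in the 2026-04-10 table.
const topOffenderCreated = objectIdCreatedAt('69cd59b46b485182422b1176').toISOString();
// -> "2026-04-01T17:45:24.000Z"

// Third-highest database in the same table.
const thirdHighestCreated = objectIdCreatedAt('621e95f9ac30687a56e4297e').toISOString();
// -> "2022-03-01T21:54:01.000Z"
```

The two decoded dates line up with the deep dives: the top offender's database was created in April 2026, and the third-highest one dates from early 2022.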

Deep Dives

Database `69cd59b46b485182422b1176` (top offender, 405 warnings/24h)

Site shortname: april-2026-airbyte
Site ID: 69cd59b46b485182422b1138
Dynamo admin: https://webflow.com/admin/dynamo/69cd59b46b485182422b1176
Snapshot: webflow-app-admin-789c45d7c4-nrcxs/405dcde0-61e6-4239-a4c5-55b3f210a3df.v1-lite
Collections: 55
Storage layer: singleTenantPgSelfServe
Publications (April 10): 16 publishes, sometimes minutes apart
Total published databases: 18

The site name suggests this is an Airbyte (data integration tool) connected site, which likely explains the high publish frequency (automated/API-driven publishes).

Publish metrics (from Datadog):

Every publish reports numCollections: 55, numDocuments: 45,405
Exactly 27 warnings per publish, every time (27 x 15 publishes = 405 warnings in the 24h window)
All 27 warnings fire at the same timestamp, same host (all from the first publish path)
Same publisher (632620133009595dc427d92f), same session, publishing every ~5-20 minutes

Analysis:

45,405 items streamed in batches of 500 = 91 batches (90 full batches plus one partial)
The 45,405 figure already includes archived/draft items, since the stream counts items before they are filtered out, so the number of items actually written to the publication is lower
27 batches end up entirely archived/draft after filtering, producing publishCMSItems([])
The consistency (always exactly 27) suggests a stable population of archived/draft items that cluster together in natural (insertion) order

Local PG query results (snapshot of production data):

Note: numDocuments: 45,405 in publish logs is the total items streamed (including archived/draft), not items actually written to the publication. See batchReadableStream line 444 which counts all items read from the stream before filtering.

Metric	Count	%
Total staged items	45,405	100%
Archived	32,946	72.5%
Draft (never published)	481	1.1%
Publishable	12,243	27.0%
Unpublishable (archived + draft)	33,427	73.6%

Collections with 0 publishable items: 3 (plus 1 empty collection)
Two collections dominate and together account for ~39K of the 45K items:

Collection 69d93f192ff103a9b73c7c71 (Airbyte source-to-destination connector pairs):

Status	Count	%
Total	23,152	100%
Archived	19,630	84.8%
Draft (never published)	101	0.4%
Publishable	3,522	15.2%

Collection 69d93f192ff103a9b73c7df3 (also connector pairs):

Status	Count	%
Total	16,234	100%
Archived	12,852	79.1%
Draft (never published)	1	0.0%
Publishable	3,382	20.8%

Why exactly 27 empty batches?

Items are read per-collection via keyset pagination (getAllIterable line 234, batches of 100 by _id), then yielded into batchReadableStream's 500-item batches. When there are long consecutive runs of archived items (by _id order), entire 500-item batches end up fully archived after filtering.

Consecutive runs of unpublishable items (queried from local PG snapshot):

Collection ...7c71: one run of 9,075 consecutive unpublishable items = ~18 empty batches
Collection ...7df3: one run of 5,383 consecutive unpublishable items = ~10 empty batches
Plus 2 fully-unpublishable collections (44 and 25 items) = ~1 empty batch each

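18 + 10 + 1 = ~29 estimated, close to the observed 27. The difference is attributed to batch boundary alignment between the 100-item keyset pages and 500-item publish batches: a run only empties the batches it covers completely, so a run that starts mid-batch yields one fewer empty batch than its length divided by 500 suggests.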
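
The alignment effect can be made concrete with a small sketch. It assumes the items flow through one continuous stream that is cut into fixed 500-item batches, and that a batch is "empty" only when every item in it is unpublishable; emptyBatchBounds, streamWithRun and countEmptyBatches are hypothetical helpers written for this note.

```typescript
const BATCH_SIZE = 500;

// Bounds on how many batches a run of unpublishable items can fully cover.
// max: the run starts exactly on a batch boundary.
// min: the run starts one item past a boundary.
function emptyBatchBounds(runLength: number, batchSize: number): { min: number; max: number } {
  const max = Math.floor(runLength / batchSize);
  const min = Math.max(0, Math.floor((runLength + 1) / batchSize) - 1);
  return { min, max };
}

// Build a stream of publishable flags: `offset` publishable items, then a run of
// unpublishable items, then `tail` publishable items.
function streamWithRun(offset: number, runLength: number, tail: number): boolean[] {
  const flags: boolean[] = [];
  for (let i = 0; i < offset; i++) flags.push(true);
  for (let i = 0; i < runLength; i++) flags.push(false);
  for (let i = 0; i < tail; i++) flags.push(true);
  return flags;
}

// Count batches that contain no publishable item at all.
function countEmptyBatches(publishable: boolean[], batchSize: number): number {
  let empty = 0;
  for (let start = 0; start < publishable.length; start += batchSize) {
    const end = Math.min(start + batchSize, publishable.length);
    let anyPublishable = false;
    for (let i = start; i < end; i++) {
      if (publishable[i]) {
        anyPublishable = true;
        break;
      }
    }
    if (!anyPublishable) empty += 1;
  }
  return empty;
}

const run7c71 = emptyBatchBounds(9075, BATCH_SIZE); // { min: 17, max: 18 }
const run7df3 = emptyBatchBounds(5383, BATCH_SIZE); // { min: 9, max: 10 }

const aligned = countEmptyBatches(streamWithRun(0, 9075, 1000), BATCH_SIZE); // 18
const shifted = countEmptyBatches(streamWithRun(1, 9075, 1000), BATCH_SIZE); // 17
```

Under these assumptions the two large runs alone account for between 26 and 28 empty batches, before counting the small fully-unpublishable collections, which brackets the observed 27.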

Conclusion for this database: The warning is not a bug. This is an Airbyte-connected site that bulk-imports data and archives heavily. The publish pipeline reads all staged items (including archived) from PG, transfers them over the wire, converts them to CMSItem objects, and then discards archived/draft ones in the batch callback. For this database, that means 33K items are read from PG and thrown away every publish.

Remaining questions:

Is the high publish frequency driven by the Airbyte integration or by manual publishes?

Database `621e95f9ac30687a56e4297e` (3rd highest, 148 warnings/24h)

Site shortname: biorender-marketing-site
Site ID: 621e6f1effebfe03881da9bd
Dynamo admin: https://webflow.com/admin/dynamo/621e95f9ac30687a56e4297e
Snapshot: webflow-app-admin-58df7dcf4-d59vw/7e35c801-c1f3-4daf-a1ac-950071bf0ff0.v1-lite
Collections: 33
Storage layer: singleTenantPgSelfServe
Items streamed per publish: 50,000 (round number, possibly hitting a limit)
Warnings per publish: 37
Publish frequency: ~daily (7 publishes in 7 days), always from the Designer
Database created: early 2022 (older, established site)

Local PG query results (snapshot of production data):

Metric	Count	%
Total staged items	50,000	100%
Archived	2,271	4.5%
Draft (never published)	19,659	39.3%
Publishable	29,217	58.4%
Unpublishable (archived + draft)	21,930	43.9%

Collections with 0 publishable items: 1 (plus 1 empty collection)
Dominant collection: ...ea200 (29,295 items, 19,548 draft-never-published / 66.7%). Contains scientific illustration templates (BioRender's core product). Most drafts were never published.

Consecutive runs of unpublishable items:

One run of 15,553 = ~31 empty batches
One run of 3,995 = ~8 empty batches
Estimated ~39, close to observed 37
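
The run lengths above came from queries against the local PG snapshot, which are not reproduced here. As an illustration of the same measurement, the sketch below finds consecutive runs in memory, assuming the items have already been ordered by _id and reduced to a publishable flag; unpublishableRuns is a hypothetical helper, not the query that was actually used.

```typescript
interface Run {
  start: number; // index of the first unpublishable item in the run
  length: number; // number of consecutive unpublishable items
}

// Find consecutive runs of unpublishable items, longest first.
function unpublishableRuns(publishable: boolean[]): Run[] {
  const runs: Run[] = [];
  let start = -1; // -1 means "not currently inside a run"
  for (let i = 0; i < publishable.length; i++) {
    if (!publishable[i]) {
      if (start === -1) start = i;
    } else if (start !== -1) {
      runs.push({ start, length: i - start });
      start = -1;
    }
  }
  // Close a run that extends to the end of the input.
  if (start !== -1) runs.push({ start, length: publishable.length - start });
  return runs.sort((a, b) => b.length - a.length);
}

// Toy input: publishable, 3 unpublishable, publishable, 2 unpublishable.
const runs = unpublishableRuns([true, false, false, false, true, false, false]);
// -> [{ start: 1, length: 3 }, { start: 5, length: 2 }]
```

Only runs of at least 500 items can empty a publish batch, so in practice the long tail of short runs can be ignored.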

Conclusion: Different profile from the Airbyte site. BioRender's warnings are driven by draft-never-published items (not archived), concentrated in a single large templates collection. Same root cause: large runs of unpublishable items in _id order producing empty batches after filtering. Confirms the pattern is not site-specific.

Why archived items aren't filtered at the query level

The publish query (publishUtil.ts:126-138) intentionally has no filter, which causes getAllIterable to use keyset pagination instead of the streams path (getAllIterable.ts:111-129). The comment on publishUtil.ts:202-204 explains this choice.

Keyset pagination (WHERE _id > lastId ORDER BY _id LIMIT 100) fetches items in small batches using the primary key index. No long-lived database cursor is held open.

Streams path (query.stream()) holds a database cursor open for the entire read. For large databases this caused connection timeouts (see Omavi's NOVA publish timeout investigation in #triage-hosting-infrastructure, March 2026).

Adding a CMS query filter (e.g., _archived = false) currently triggers the streams path via hasCMSQueryFilter. However, the filter could potentially be added directly to the keyset pagination query (line 248 of getAllIterable.ts) without going through the CMS query filter path, keeping the benefits of keyset pagination while skipping archived items at the PG level.
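
For readers unfamiliar with the pattern, here is a minimal in-memory sketch of keyset pagination. It is not the getAllIterable implementation: Row, fetchPage and readAllPages are hypothetical names, and a sorted array stands in for the table and its primary key index.

```typescript
interface Row {
  _id: string;
}

// Stand-in for: SELECT ... WHERE _id > lastId ORDER BY _id LIMIT pageSize
// `sortedRows` must already be ordered by _id, as the index would provide.
function fetchPage(sortedRows: Row[], lastId: string | null, pageSize: number): Row[] {
  const page: Row[] = [];
  for (const row of sortedRows) {
    if (lastId !== null && row._id <= lastId) continue;
    page.push(row);
    if (page.length === pageSize) break;
  }
  return page;
}

// Each page is an independent short query. The only state carried between
// queries is the last _id seen, so no database cursor has to stay open.
function readAllPages(sortedRows: Row[], pageSize: number): Row[][] {
  const pages: Row[][] = [];
  let lastId: string | null = null;
  while (true) {
    const page = fetchPage(sortedRows, lastId, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    lastId = page[page.length - 1]._id;
  }
  return pages;
}

// 250 rows with zero-padded ids so that string order matches numeric order.
const rows: Row[] = [];
for (let i = 1; i <= 250; i++) {
  rows.push({ _id: ('00' + i).slice(-3) });
}

const pageSizes = readAllPages(rows, 100).map((page) => page.length);
// -> [100, 100, 50]
```

The design point is that the continuation state is a value (the last _id), not a server-side resource, which is what avoids the long-lived cursor described for the streams path.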

Possible Approaches

Approach 1: Early return on empty batch

Add a guard in prepareItemsForPublish (publishUtil.ts:532-537) to skip the StorageLayerFactory.create + insertMany call when insertions is empty.

// Before insertMany call:
if (!insertions.length) {
  return [];
}

Pros: Simple guard, a few lines. Eliminates the misleading warning and avoids unnecessary StorageLayerFactory + insertMany overhead.

Cons: The archived/draft items are still read from PG, transferred over the wire, and converted to CMSItem objects before being discarded in the batch callback. For the Airbyte site, that's ~33K items read and thrown away every publish. This fixes the symptom (noisy warning) but not the underlying inefficiency.

Where do we know it's empty? Inside the function returned by prepareItemsForPublish (line 483), after grouping by collection and filtering nulls (line 530). If insertions (the array of CMSInsert objects) has length 0, there's nothing to insert.
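
A sketch of how the guard's effect could be verified in isolation. Insertion, StorageLayer and publishInsertions are hypothetical stand-ins (the real StorageLayerFactory and insertMany signatures are not shown in this document), and the call is kept synchronous for brevity even though the real one is presumably asynchronous.

```typescript
// Hypothetical shapes, for illustration only.
interface Insertion {
  collectionId: string;
  itemIds: string[];
}

interface StorageLayer {
  insertMany(insertions: Insertion[]): string[]; // returns inserted item ids
}

// Guarded publish step: skip creating the storage layer when there is nothing to insert.
function publishInsertions(createStorageLayer: () => StorageLayer, insertions: Insertion[]): string[] {
  if (insertions.length === 0) {
    return [];
  }
  return createStorageLayer().insertMany(insertions);
}

// Spy factory that counts how often a storage layer is created.
let factoryCalls = 0;
const createSpyLayer = (): StorageLayer => {
  factoryCalls += 1;
  return {
    insertMany: (insertions) => {
      const ids: string[] = [];
      for (const insertion of insertions) {
        for (const id of insertion.itemIds) ids.push(id);
      }
      return ids;
    },
  };
};

const emptyResult = publishInsertions(createSpyLayer, []);
const callsAfterEmpty = factoryCalls; // 0: the factory was never invoked

const fullResult = publishInsertions(createSpyLayer, [{ collectionId: 'c1', itemIds: ['a', 'b'] }]);
const callsAfterFull = factoryCalls; // 1
```

Asserting on the factory call count, rather than on the absence of a log line, also covers the other storage layers, which receive the same empty call silently.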

Approach 2: Filter archived/draft items at the PG query level

Exclude unpublishable items from the query so they never leave the database. The publish query (publishUtil.ts:126-128) currently has no filter, which allows getAllIterable to use keyset pagination. Two sub-options:

2a) Add filter directly in keyset pagination query builder

In getAllIterable.ts, the keyset pagination path builds a query per collection (line 248). Add .where('f__archived', '!=', true) (and the draft condition) to the paginatedQuery builder, without going through the CMS query filter system. This keeps keyset pagination and avoids triggering the streams path.

Pros: Archived items never leave PG. For the Airbyte site, this skips ~33K rows per publish. No performance concern on the query side since PG would still walk the _id index and filter rows (it already fetches all columns from heap anyway). No standalone index on f__archived exists, but one isn't needed.

Cons: The filter is added outside the normal CMS query filter system, so it's a special case in the pagination code. Also, the _draft filtering is more nuanced (draft items that have been published need to be tracked for draftChangesCmsItemKeys, not skipped entirely), so only _archived can be safely filtered at the query level. For the Airbyte site this is acceptable, since draft-only items are a tiny fraction (481 out of 45K). It helps less for BioRender, where draft-never-published items (19,659 out of 50,000) rather than archived items (2,271) drive the empty batches.
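
To illustrate option 2a, the sketch below adds the archived condition to an in-memory model of the per-page keyset query. It is not the getAllIterable code: Row, fetchNonArchivedPage and readNonArchivedPages are hypothetical names. One caveat is worth checking before implementing: in PG, f__archived != true evaluates to NULL for rows where the column is NULL, so such rows would be dropped along with archived ones, whereas f__archived IS NOT TRUE keeps them. Whether f__archived is nullable is not established in this investigation, so the model below assumes it might be and keeps NULL rows.

```typescript
interface Row {
  _id: string;
  f__archived: boolean | null; // nullable here as an assumption, see the caveat above
}

// Stand-in for:
//   SELECT ... WHERE _id > lastId AND f__archived IS NOT TRUE ORDER BY _id LIMIT pageSize
function fetchNonArchivedPage(sortedRows: Row[], lastId: string | null, pageSize: number): Row[] {
  const page: Row[] = [];
  for (const row of sortedRows) {
    if (lastId !== null && row._id <= lastId) continue;
    if (row.f__archived === true) continue; // filtered "in the database", never transferred
    page.push(row);
    if (page.length === pageSize) break;
  }
  return page;
}

// The cursor still advances by _id, using the last row that was actually returned.
function readNonArchivedPages(sortedRows: Row[], pageSize: number): Row[][] {
  const pages: Row[][] = [];
  let lastId: string | null = null;
  while (true) {
    const page = fetchNonArchivedPage(sortedRows, lastId, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    lastId = page[page.length - 1]._id;
  }
  return pages;
}

// Ten rows: 03..08 archived (a consecutive run), 09 has a NULL flag.
const rows: Row[] = [];
for (let i = 1; i <= 10; i++) {
  const archived = i >= 3 && i <= 8 ? true : i === 9 ? null : false;
  rows.push({ _id: ('0' + i).slice(-2), f__archived: archived });
}

const pageIds = readNonArchivedPages(rows, 2).map((page) => page.map((row) => row._id));
// -> [["01", "02"], ["09", "10"]]
```

In this model the archived run never produces a page of its own, so there is nothing left for the batch callback to filter down to an empty batch.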

2b) Adjust hasCMSQueryFilter to allow simple filters through to keyset

Modify hasCMSQueryFilter so that certain "simple" filters (like boolean equality on indexed columns) don't trigger the streams path. Then add _archived = false to the CMS query in publishUtil.ts.

Pros: Uses the standard CMS query filter system. More generalizable.

Cons: More complex change. hasCMSQueryFilter is used across the codebase, so changes there have broader impact. Would need careful testing.

Recommendation

Do both: Approach 1 as an immediate cleanup (eliminates the warning, tiny PR), then Approach 2a as a follow-up for the performance win. Approach 2a only filters _archived at the query level; the draft/publishedOn logic stays in the batch callback since it feeds draftChangesCmsItemKeys.

Dynamo Admin Improvements

Running list of improvements to the /admin/dynamo pages discovered during this investigation. See also existing tickets: CMSEXT-1888, CMSEXT-1824, CMSEXT-566, CMSEXT-481, CMSEXT-1786, CMSEXT-1630.

Copy JSON button on published databases list (and other JSON data views) so data can be easily shared/pasted
Show fast counts per collection when listing collections in a database (CMSEXT-1630)
Indicate items with draft changes in items list
Indicate items scheduled to publish in items list (requires query on scheduledSIP)
