Ticket: https://webflow.atlassian.net/browse/CMSEXT-1850
Branch: CMSEXT-1850-investigation
Last updated: 2026-04-10
During site publish, the Single-Tenant PG storage layer logs a warning "Called with no collections for database" when insertMany is called with an empty array. This happens when upstream batch filtering removes all items in a batch (archived items, unpublished drafts).
The warning is only logged in the Single-Tenant PG storage layer:
- `entrypoints/server/lib/logic/cms/internal/storageLayers/storageLayerSingleTenantPg/CMSInsertOperations/index.ts:142`
- The other storage layers (`storageLayerDefault`, `storageLayerLocalization`, `storageLayerExtData`) all have `insertMany` methods but none log a warning when called with an empty array. The same empty-batch scenario likely occurs on those layers too, just silently.
The warning is misleading. It fires when the publish pipeline calls insertMany([]) after upstream filtering legitimately removes entire batches. There is no guard in prepareItemsForPublish (line 532-537) to skip the call when insertions is empty.
- `publishUtil.ts:196-216` - batch callback filters out archived/draft items
- `publishUtil.ts:483-538` - `prepareItemsForPublish` groups by collection, filters nulls
- `publishUtil.ts:532-537` - calls `insertMany(insertions)` with no empty check
- `CMSInsertOperations/index.ts:134-147` - `groupByCollectionTable` returns `[]`, warning fires
publishUtil.ts:255-273 (preserved draft changes) re-publishes items from the live publication. Batches with no matching draftChangesCmsItemKeys also produce publishCMSItems([]).
- 1,862 warnings across 170 distinct databases / 165 distinct sites
- Publish strategy is mixed: some `site`, some empty (single-page publish)
- A few databases show `siteId: undefined`
| database_id | site_id | count |
|---|---|---|
| 69cd59b46b485182422b1176 | 69cd59b46b485182422b1138 | 405 |
| 69c3f5d79caeca73e326a5a9 | 69a8527778df147812f39db3 | 206 |
| 621e95f9ac30687a56e4297e | 621e6f1effebfe03881da9bd | 148 |
| 646b7081dd2c8b67db60998e | undefined | 78 |
| 61f7c8145fe6f6e022a84b3c | 61f7c8145fe6f608faa84b36 | 44 |
| 69d8c639450cf83749ea8c85 | 69d8c639450cf83749ea8c70 | 43 |
| 69c6d53f53427db6234ea1dd | 69c6d53d53427db6234ea14a | 39 |
| 67e372dce27edb342559fda0 | undefined | 38 |
| 69b38357dd91b49325a066e4 | 69b38357dd91b49325a066d8 | 34 |
| 69d52f8695dbb0307c76cc7c | 69ca18142eb637b38f123ea5 | 33 |
| 69d75a6e85228ce4377c110d | 69cb90973f8f948b3f21c100 | 31 |
| 68a2128d673dce7e5435764e | 68a2128d673dce7e54357641 | 30 |
| 68a2128d673dce7e5435764e | undefined | 30 |
| 69d4c401f8743426267fe506 | 69d4c401f8743426267fe4ee | 29 |
| 699e89dd0eea6c0a95d8171e | 699d3f5e9f3bd19155432990 | 28 |
| 699878abe2fb02ec2cea78f6 | 699878abe2fb02ec2cea78f4 | 20 |
| 69c296dd569bd9a00927a2d5 | 69c27b915c44f87f356b4ba9 | 20 |
| 6851647deb0233ad4acfd2bc | 6851647deb0233ad4acfd250 | 18 |
| 69ca92da3c9e0a203dd90501 | 69ad10cec98a6c22015b8155 | 17 |
| 69d4339e4a1f8875ffa84cbe | 67ca0a88589e2fffa9a2f421 | 16 |
- 193 distinct database/site combinations affected
- Top offender was `69cb8a098ee44d1eae714e30` (253 warnings/day), which is no longer in the top 20 today
- GenieAI database `69a78a937b0c47d9c1b78580` (originally called out in ticket) had zero warnings
- The top offender shifts day to day, suggesting it correlates with publish frequency rather than a fixed data problem.
- Most database IDs start with `69`, suggesting recently created databases (ObjectID timestamps). These are likely newly migrated PG sites.
- Some databases have `siteId: undefined`, worth investigating.
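The ObjectID observation can be checked directly, since the first 4 bytes of an ObjectID encode its creation time in seconds since the Unix epoch. A minimal sketch follows; `objectIdToDate` is a hypothetical helper written for this note, not something from the codebase.

```typescript
// Decode the creation time embedded in a MongoDB-style ObjectID.
// The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
function objectIdToDate(id: string): Date {
  if (!/^[0-9a-f]{24}$/i.test(id)) {
    throw new Error(`Not a 24-character hex ObjectID: ${id}`);
  }
  const seconds = parseInt(id.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// Database IDs taken from the table above.
const recentDb = objectIdToDate('69cd59b46b485182422b1176');
const olderDb = objectIdToDate('621e95f9ac30687a56e4297e');

console.log(recentDb.toISOString().slice(0, 7)); // 2026-04
console.log(olderDb.toISOString().slice(0, 7)); // 2022-03
```

The timestamp only says when the ID was generated. Whether that corresponds to a PG migration is still an inference.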
- Site shortname: `april-2026-airbyte`
- Site ID: `69cd59b46b485182422b1138`
- Dynamo admin: https://webflow.com/admin/dynamo/69cd59b46b485182422b1176
- Snapshot: `webflow-app-admin-789c45d7c4-nrcxs/405dcde0-61e6-4239-a4c5-55b3f210a3df.v1-lite`
- Collections: 55
- Storage layer: `singleTenantPgSelfServe`
- Publications (April 10): 16 publishes, sometimes minutes apart
- Total published databases: 18
The site name suggests this is an Airbyte (data integration tool) connected site, which likely explains the high publish frequency (automated/API-driven publishes).
Publish metrics (from Datadog):
- Every publish reports `numCollections: 55, numDocuments: 45,405`
- Exactly 27 warnings per publish, every time (27 x 15 publishes = 405 total/day)
- All 27 warnings fire at the same timestamp, same host (all from the first publish path)
- Same publisher (`632620133009595dc427d92f`), same session, publishing every ~5-20 minutes
Analysis:
- 45,405 items streamed in batches of 500 = ~91 batches
- The 45,405 figure is the total read from the stream, so it already includes the archived/draft items that get filtered out afterwards
- 27 batches end up entirely archived/draft after filtering, producing `publishCMSItems([])`
- The consistency (always exactly 27) suggests a stable population of archived/draft items that cluster together in natural (insertion) order
Local PG query results (snapshot of production data):
Note: numDocuments: 45,405 in publish logs is the total items streamed (including archived/draft), not items actually written to the publication. See batchReadableStream line 444 which counts all items read from the stream before filtering.
| Metric | Count | % |
|---|---|---|
| Total staged items | 45,405 | 100% |
| Archived | 32,946 | 72.5% |
| Draft (never published) | 481 | 1.1% |
| Publishable | 12,243 | 27.0% |
| Unpublishable (archived + draft) | 33,427 | 73.6% |
- Collections with 0 publishable items: 3 (plus 1 empty collection)
- Two collections dominate and together account for ~39K of the 45K items:
Collection 69d93f192ff103a9b73c7c71 (Airbyte source-to-destination connector pairs):
| Status | Count | % |
|---|---|---|
| Total | 23,152 | 100% |
| Archived | 19,630 | 84.8% |
| Draft (never published) | 101 | 0.4% |
| Publishable | 3,522 | 15.2% |
Collection 69d93f192ff103a9b73c7df3 (also connector pairs):
| Status | Count | % |
|---|---|---|
| Total | 16,234 | 100% |
| Archived | 12,852 | 79.2% |
| Draft (never published) | 1 | 0.0% |
| Publishable | 3,382 | 20.8% |
Why exactly 27 empty batches?
Items are read per-collection via keyset pagination (getAllIterable line 234, batches of 100 by _id), then yielded into batchReadableStream's 500-item batches. When there are long consecutive runs of archived items (by _id order), entire 500-item batches end up fully archived after filtering.
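The mechanism can be illustrated with a small simulation. This is a sketch on synthetic data, not the real pipeline: `StagedItem`, `toBatches` and `countEmptyBatches` are hypothetical names, and the real `batchReadableStream` may flush batches differently (for example at collection boundaries).

```typescript
interface StagedItem {
  id: number; // stands in for _id ordering
  archived: boolean;
}

// Split a list into fixed-size batches, preserving order.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Count batches that have nothing left once archived items are filtered out.
function countEmptyBatches(items: StagedItem[], batchSize: number): number {
  return toBatches(items, batchSize)
    .map((batch) => batch.filter((item) => !item.archived))
    .filter((publishable) => publishable.length === 0).length;
}

// Synthetic data: 200 publishable items, a run of 1,200 archived items,
// then 100 publishable items.
const stagedItems: StagedItem[] = [];
for (let id = 0; id < 1500; id++) {
  stagedItems.push({ id, archived: id >= 200 && id < 1400 });
}

console.log(countEmptyBatches(stagedItems, 500)); // 1
```

Only the middle batch (ids 500 to 999) is fully archived here. A 1,200-item run can cover at most two whole batches, and where the run starts relative to a batch boundary decides how many it actually covers.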
Consecutive runs of unpublishable items (queried from local PG snapshot):
- Collection `...7c71`: one run of 9,075 consecutive unpublishable items = ~18 empty batches
- Collection `...7df3`: one run of 5,383 consecutive unpublishable items = ~10 empty batches
- Plus 2 fully-unpublishable collections (44 and 25 items) = ~1 empty batch each
18 + 10 + 1 = ~29 estimated, close to the observed 27 (difference due to batch boundary alignment between the 100-item keyset pages and 500-item publish batches).
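The alignment effect can be bounded with a little arithmetic. The sketch below assumes a simplified model in which each long run sits inside one continuous sequence of 500-item batches. `emptyBatchBounds` is a hypothetical helper, and the two small fully-unpublishable collections are not modeled.

```typescript
interface Bounds {
  min: number;
  max: number;
}

// For a run of `runLength` consecutive unpublishable items, return the fewest
// and most fixed-size batches that can fall entirely inside the run, depending
// on where the run starts relative to a batch boundary.
function emptyBatchBounds(runLength: number, batchSize: number): Bounds {
  // Best case: the run starts exactly on a batch boundary.
  const max = Math.floor(runLength / batchSize);
  // Worst case: the run starts one item after a boundary.
  const min = Math.max(0, Math.floor((runLength + 1) / batchSize) - 1);
  return { min, max };
}

// Run lengths for the two large collections listed above.
const runLengths = [9075, 5383];
const totalBounds = runLengths
  .map((run) => emptyBatchBounds(run, 500))
  .reduce((acc, b) => ({ min: acc.min + b.min, max: acc.max + b.max }), { min: 0, max: 0 });

console.log(totalBounds); // { min: 26, max: 28 }
```

Under this model the two long runs alone account for 26 to 28 empty batches, the same range as the observed 27.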
Conclusion for this database: The warning is not a bug. This is an Airbyte-connected site that bulk-imports data and archives heavily. The publish pipeline reads all staged items (including archived) from PG, transfers them over the wire, converts them to CMSItem objects, and then discards archived/draft ones in the batch callback. For this database, that means 33K items are read from PG and thrown away every publish.
Remaining questions:
- Is the high publish frequency driven by the Airbyte integration or manual?
- Site shortname: `biorender-marketing-site`
- Site ID: `621e6f1effebfe03881da9bd`
- Dynamo admin: https://webflow.com/admin/dynamo/621e95f9ac30687a56e4297e
- Snapshot: `webflow-app-admin-58df7dcf4-d59vw/7e35c801-c1f3-4daf-a1ac-950071bf0ff0.v1-lite`
- Collections: 33
- Storage layer: `singleTenantPgSelfServe`
- Items streamed per publish: 50,000 (round number, possibly hitting a limit)
- Warnings per publish: 37
- Publish frequency: ~daily (7 publishes in 7 days), always from the Designer
- Database created: early 2022 (older, established site)
Local PG query results (snapshot of production data):
| Metric | Count | % |
|---|---|---|
| Total staged items | 50,000 | 100% |
| Archived | 2,271 | 4.5% |
| Draft (never published) | 19,659 | 39.3% |
| Publishable | 29,217 | 58.4% |
| Unpublishable (archived + draft) | 21,930 | 43.9% |
- Collections with 0 publishable items: 1 (plus 1 empty collection)
- Dominant collection: `...ea200` (29,295 items, 19,548 draft-never-published / 66.7%). Contains scientific illustration templates (BioRender's core product). Most drafts were never published.
Consecutive runs of unpublishable items:
- One run of 15,553 = ~31 empty batches
- One run of 3,995 = ~8 empty batches
- Estimated ~39, close to observed 37
Conclusion: Different profile from the Airbyte site. BioRender's warnings are driven by draft-never-published items (not archived), concentrated in a single large templates collection. Same root cause: large runs of unpublishable items in _id order producing empty batches after filtering. Confirms the pattern is not site-specific.
The publish query (publishUtil.ts:126-138) intentionally has no filter, which causes getAllIterable to use keyset pagination instead of the streams path (getAllIterable.ts:111-129). The comment on publishUtil.ts:202-204 explains this choice.
Keyset pagination (WHERE _id > lastId ORDER BY _id LIMIT 100) fetches items in small batches using the primary key index. No long-lived database cursor is held open.
Streams path (query.stream()) holds a database cursor open for the entire read. For large databases this caused connection timeouts (see Omavi's NOVA publish timeout investigation in #triage-hosting-infrastructure, March 2026).
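To make the keyset pattern concrete, here is a minimal in-memory sketch. `Row`, `fetchPage` and `readAllPages` are hypothetical stand-ins. The real `getAllIterable` issues SQL against PG, whereas this version filters an array that is assumed to be sorted by id.

```typescript
interface Row {
  id: number; // stands in for _id
}

// Stand-in for: WHERE _id > lastId ORDER BY _id LIMIT pageSize
// `table` must already be sorted by id, as a primary key index would be.
function fetchPage(table: Row[], lastId: number, pageSize: number): Row[] {
  return table.filter((row) => row.id > lastId).slice(0, pageSize);
}

// Read the whole table as a series of short, independent page queries.
function readAllPages(table: Row[], pageSize: number): Row[][] {
  const pages: Row[][] = [];
  let lastId = Number.NEGATIVE_INFINITY;
  for (;;) {
    const page = fetchPage(table, lastId, pageSize);
    if (page.length === 0) break;
    pages.push(page);
    lastId = page[page.length - 1].id; // next page starts after the last row seen
  }
  return pages;
}

const demoTable: Row[] = Array.from({ length: 250 }, (_, i) => ({ id: i + 1 }));
const demoPages = readAllPages(demoTable, 100);

console.log(demoPages.map((p) => p.length)); // [ 100, 100, 50 ]
```

Each call to `fetchPage` is a separate short query, so nothing has to stay open between pages. The only state carried forward is `lastId`.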
Adding a CMS query filter (e.g., _archived = false) currently triggers the streams path via hasCMSQueryFilter. However, the filter could potentially be added directly to the keyset pagination query (line 248 of getAllIterable.ts) without going through the CMS query filter path, keeping the benefits of keyset pagination while skipping archived items at the PG level.
Approach 1: Add a guard in prepareItemsForPublish (publishUtil.ts:532-537) to skip the StorageLayerFactory.create + insertMany call when insertions is empty.

    // Before insertMany call:
    if (!insertions.length) {
      return [];
    }

Pros: Simple one-liner. Eliminates the misleading warning and avoids unnecessary StorageLayerFactory + insertMany overhead.
Cons: The archived/draft items are still read from PG, transferred over the wire, and converted to CMSItem objects before being discarded in the batch callback. For the Airbyte site, that's ~33K items read and thrown away every publish. This fixes the symptom (noisy warning) but not the underlying inefficiency.
Where do we know it's empty? Inside the function returned by prepareItemsForPublish (line 483), after grouping by collection and filtering nulls (line 530). If insertions (the array of CMSInsert objects) has length 0, there's nothing to insert.
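A self-contained sketch of the guard's effect is shown below. Everything here is a simplified, synchronous stand-in: `CMSInsertSketch`, `FakeStorageLayer` and `publishInsertions` are hypothetical and do not mirror the real signatures in `publishUtil.ts` or the storage layers.

```typescript
// Simplified stand-in for a CMSInsert object (shape is an assumption).
interface CMSInsertSketch {
  collectionId: string;
  items: unknown[];
}

// Fake storage layer that counts calls so the effect of the guard is visible.
class FakeStorageLayer {
  calls = 0;

  insertMany(insertions: CMSInsertSketch[]): string[] {
    this.calls += 1;
    return insertions.map((insertion) => insertion.collectionId);
  }
}

// Guarded publish step: return early instead of calling insertMany([]).
function publishInsertions(storage: FakeStorageLayer, insertions: CMSInsertSketch[]): string[] {
  if (insertions.length === 0) {
    return [];
  }
  return storage.insertMany(insertions);
}

const fakeStorage = new FakeStorageLayer();
const emptyResult = publishInsertions(fakeStorage, []);
const nonEmptyResult = publishInsertions(fakeStorage, [{ collectionId: 'c1', items: [{}] }]);

console.log(fakeStorage.calls); // 1
```

Only the non-empty batch reaches the storage layer, so a warning like the one in the Single-Tenant PG layer would have nothing to fire on.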
Approach 2: Exclude unpublishable items from the query so they never leave the database. The publish query (publishUtil.ts:126-128) currently has no filter, which allows getAllIterable to use keyset pagination. Two sub-options:
2a) Add filter directly in keyset pagination query builder
In getAllIterable.ts, the keyset pagination path builds a query per collection (line 248). Add .where('f__archived', '!=', true) (and the draft condition) to the paginatedQuery builder, without going through the CMS query filter system. This keeps keyset pagination and avoids triggering the streams path.
Pros: Archived items never leave PG. For the Airbyte site, this skips ~33K rows per publish. No performance concern on the query side since PG would still walk the _id index and filter rows (it already fetches all columns from heap anyway). No standalone index on f__archived exists, but one isn't needed.
Cons: The filter is added outside the normal CMS query filter system, so it's a special case in the pagination code. Also, the _draft filtering is more nuanced (draft items that have been published need to be tracked for draftChangesCmsItemKeys, not skipped entirely), so only _archived can be safely filtered at the query level. Draft-only items are a tiny fraction (481 out of 45K for this site) so this is acceptable.
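The idea behind 2a can be sketched in memory: the archived predicate sits in the page query itself, and the cursor still advances by the last returned row. `ItemRow`, `fetchFilteredPage` and `countRowsTransferred` are hypothetical names, and the data is synthetic.

```typescript
interface ItemRow {
  id: number; // stands in for _id
  archived: boolean; // stands in for f__archived
}

// Stand-in for:
//   WHERE _id > lastId [AND f__archived != true] ORDER BY _id LIMIT pageSize
// `skipArchived` toggles the extra predicate. `table` must be sorted by id.
function fetchFilteredPage(
  table: ItemRow[],
  lastId: number,
  pageSize: number,
  skipArchived: boolean,
): ItemRow[] {
  return table
    .filter((row) => row.id > lastId && (!skipArchived || !row.archived))
    .slice(0, pageSize);
}

// Page through the table and count how many rows leave the "database".
function countRowsTransferred(table: ItemRow[], pageSize: number, skipArchived: boolean): number {
  let lastId = -1;
  let transferred = 0;
  for (;;) {
    const page = fetchFilteredPage(table, lastId, pageSize, skipArchived);
    if (page.length === 0) break;
    transferred += page.length;
    lastId = page[page.length - 1].id; // cursor follows the last returned row
  }
  return transferred;
}

// Synthetic table: 1,000 rows, three out of every four archived.
const itemRows: ItemRow[] = Array.from({ length: 1000 }, (_, id) => ({
  id,
  archived: id % 4 !== 0,
}));

console.log(countRowsTransferred(itemRows, 100, false)); // 1000
console.log(countRowsTransferred(itemRows, 100, true)); // 250
```

One caveat if the column is nullable: in SQL, `f__archived != true` evaluates to NULL for rows where the column is NULL, so those rows would be dropped as well. `f__archived IS NOT TRUE` keeps them. Whether `f__archived` can be NULL is not established in this investigation.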
2b) Adjust hasCMSQueryFilter to allow simple filters through to keyset
Modify hasCMSQueryFilter so that certain "simple" filters (like boolean equality on indexed columns) don't trigger the streams path. Then add _archived = false to the CMS query in publishUtil.ts.
Pros: Uses the standard CMS query filter system. More generalizable.
Cons: More complex change. hasCMSQueryFilter is used across the codebase, so changes there have broader impact. Would need careful testing.
Do both: Approach 1 as an immediate cleanup (eliminates the warning, tiny PR), then Approach 2a as a follow-up for the performance win. Approach 2a only filters _archived at the query level; the draft/publishedOn logic stays in the batch callback since it feeds draftChangesCmsItemKeys.
Running list of improvements to the /admin/dynamo pages discovered during this investigation. See also existing tickets: CMSEXT-1888, CMSEXT-1824, CMSEXT-566, CMSEXT-481, CMSEXT-1786, CMSEXT-1630.
- Copy JSON button on published databases list (and other JSON data views) so data can be easily shared/pasted
- Show fast counts per collection when listing collections in a database (CMSEXT-1630)
- Indicate items with draft changes in items list
- Indicate items scheduled to publish in items list (requires query on `scheduledSIP`)