Changes requested · requirements · 2026-03-19
Well-structured epic with clear scope boundaries, validated user need (6/6 interviews), explicit IN/OUT lists, and sensible phasing. Five blockers around data model readiness, a risky dependency choice, and unaddressed PII policy need resolution before child tasks begin.
- [Data Model Impact] Epic depends on #74 (Work, Edition, UserCopy), which isn't implemented — the current codebase only has placeholder `collections`/`items` tables. Any import code written now will need destructive rework when #74 lands. Fix: Hard-gate this epic: no child task starts until #74 is merged and schema changes #15–#19 plus the `WANT_TO_READ` status are present in `@bib/db`. ⚠️ Needs Leo's input
- [Data Model Impact] Five schema changes (#15–#19) are listed as dependencies but have no consolidated migration plan. Applying them piecemeal risks partial states (e.g. `import_metadata` exists but `SentimentTag.source` doesn't). Fix: Define a single atomic Drizzle migration covering all five changes plus `WANT_TO_READ`. For #16 (`star_rating` integer → decimal), use a two-step approach: add a `star_rating_decimal` column, backfill, and drop the original in a follow-up migration. Verify SQLite `REAL` affinity works with Drizzle's type mapping. ⚠️ Needs Leo's input
- [Third-Party Dependencies] `fuzzyset.js` is proposed for fuzzy matching but has been effectively unmaintained since 2020, has a bus factor of 1, ships no TypeScript types, and sits in the critical matching path (it determines which books land in a user's library). Fix: Evaluate `fuse.js` (actively maintained, typed) or implement Jaro–Winkler/trigram similarity directly — the thresholds are already defined, so a small pure-TypeScript implementation is realistic. Either way, wrap it behind a `BookMatcher` interface in `@bib/core` so the algorithm is swappable. ⚠️ Needs Leo's input
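The seam proposed in the Fix can be sketched in a few lines. A minimal sketch, assuming trigram similarity as the stand-in algorithm: everything except the `BookMatcher` name is illustrative — the candidate shape, the normalisation, and the 0.5 threshold (a placeholder for the thresholds already defined in the epic).

```typescript
// Sketch of the swappable matcher boundary. `BookMatcher` is the interface
// named in the review Fix; the trigram metric and candidate shape below are
// illustrative assumptions, not the epic's chosen algorithm.
interface MatchCandidate {
  id: string;
  title: string;
  author: string;
}

interface BookMatcher {
  /** Best candidate at or above the confidence threshold, or null. */
  bestMatch(
    query: string,
    candidates: MatchCandidate[]
  ): { candidate: MatchCandidate; score: number } | null;
}

function trigrams(s: string): Set<string> {
  const norm = `  ${s.toLowerCase().trim()} `; // pad so short strings still yield trigrams
  const grams = new Set<string>();
  for (let i = 0; i <= norm.length - 3; i++) grams.add(norm.slice(i, i + 3));
  return grams;
}

/** Jaccard similarity over character trigrams, in [0, 1]. */
function trigramSimilarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}

class TrigramBookMatcher implements BookMatcher {
  constructor(private readonly threshold = 0.5) {} // placeholder threshold

  bestMatch(query: string, candidates: MatchCandidate[]) {
    let best: { candidate: MatchCandidate; score: number } | null = null;
    for (const c of candidates) {
      const score = trigramSimilarity(query, `${c.title} ${c.author}`);
      if (score >= this.threshold && (!best || score > best.score)) {
        best = { candidate: c, score };
      }
    }
    return best;
  }
}
```

Swapping in `fuse.js` (or `fuzzyset.js`, if it survives evaluation) then only means writing another `BookMatcher` implementation; callers and tests stay unchanged.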
- [Security] Review text ingested from CSV is user-authored narrative that may contain sensitive content. There is no policy on whether it's classified as PII, stored encrypted, or excluded from AI enrichment pipelines. Fix: Classify `review_text` as PII. Confirm it is excluded from any Claude API enrichment calls (or require explicit user consent). Document the retention and access policy before implementation. ⚠️ Needs Leo's input
- [Security, Data Model, Cost Model] The `import_metadata` jsonb column stores unmapped CSV fields with no sanitisation, schema, size boundary, or expiry. An attacker can embed arbitrarily large payloads; at scale, unbounded jsonb across 450M rows creates significant storage cost with no feature value after initial reconciliation. Fix: Define a Zod schema in `@bib/shared` per source format. Enforce a maximum key count and a per-value byte limit. Add a 90-day TTL (purge after successful re-import or expiry). Strip values to primitives only — no nested objects.
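The constraints above are straightforward to express. A minimal pure-TypeScript sketch — the production version would be a Zod schema per source format in `@bib/shared`, and the specific limits here (50 keys, 1 KB per value) are placeholder numbers, not values from the epic:

```typescript
// Illustrative guard for unmapped CSV fields destined for `import_metadata`.
// Limits are placeholders; the real version would be a per-format Zod schema.
const MAX_KEYS = 50;
const MAX_VALUE_BYTES = 1024;

type ImportMetadata = Record<string, string | number | boolean>;

function sanitiseImportMetadata(raw: Record<string, unknown>): ImportMetadata {
  const out: ImportMetadata = {};
  let kept = 0;
  for (const [key, value] of Object.entries(raw)) {
    if (kept >= MAX_KEYS) break; // enforce max key count
    // Primitives only: nested objects/arrays are dropped, not serialised.
    if (
      typeof value !== "string" &&
      typeof value !== "number" &&
      typeof value !== "boolean"
    ) {
      continue;
    }
    // Enforce per-value byte limit (measured as UTF-8 bytes, not characters).
    if (
      typeof value === "string" &&
      new TextEncoder().encode(value).length > MAX_VALUE_BYTES
    ) {
      continue;
    }
    out[key] = value;
    kept++;
  }
  return out;
}
```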
- [Architectural Fit, Separation of Concerns, Portability] A background enrichment queue is referenced throughout, but no queue infrastructure exists in the current architecture, and SQS is named directly, coupling the requirements to AWS. Fix: Define an `EnrichmentQueue` interface in `@bib/core` (enqueue/ack/nack). Name it "enrichment queue" in the requirements, with SQS as the current implementation choice. Resolve the queue pattern as part of the #75/#98 dependency before Task 7. ⚠️ Needs Leo's input
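The port could look like the sketch below. Only the `EnrichmentQueue` name and the enqueue/ack/nack operations come from the Fix; the message shape, the added `receive` lease method (needed so ack/nack have a message in flight to act on), and the in-memory adapter are assumptions. SQS would be a second adapter behind the same interface.

```typescript
// Sketch of the queue port. The in-memory adapter is useful for tests;
// an SQS-backed adapter would implement the same interface.
interface EnrichmentMessage {
  id: string;
  body: string;
}

interface EnrichmentQueue {
  enqueue(body: string): Promise<string>; // returns the message id
  receive(): Promise<EnrichmentMessage | null>; // lease the next message
  ack(id: string): Promise<void>; // processing succeeded: drop the message
  nack(id: string): Promise<void>; // processing failed: make it visible again
}

class InMemoryEnrichmentQueue implements EnrichmentQueue {
  private ready: EnrichmentMessage[] = [];
  private inFlight = new Map<string, EnrichmentMessage>();
  private seq = 0;

  async enqueue(body: string): Promise<string> {
    const msg = { id: String(++this.seq), body };
    this.ready.push(msg);
    return msg.id;
  }

  async receive(): Promise<EnrichmentMessage | null> {
    const msg = this.ready.shift() ?? null;
    if (msg) this.inFlight.set(msg.id, msg);
    return msg;
  }

  async ack(id: string): Promise<void> {
    this.inFlight.delete(id);
  }

  async nack(id: string): Promise<void> {
    const msg = this.inFlight.get(id);
    if (msg) {
      this.inFlight.delete(id);
      this.ready.push(msg); // redeliver later
    }
  }
}
```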
- [Data Model Impact, Architectural Fit] `SentimentTag` is in the Intelligence layer (shared), but the `user_imported` source value blurs the boundary — one user's StoryGraph moods could pollute shared sentiment data if isolation isn't enforced. Fix: Either add a non-null `owner_id` FK to `SentimentTag` for `user_imported` rows and exclude them from shared queries, or move imported moods to `UserCopyTag` with `tag_type = 'imported_mood'` (cleaner layer separation). ⚠️ Needs Leo's input
- [Cost Model] The free tier stores ALL imported books (e.g. 2,000) with zero revenue. At 1.5M users × 300 books, `import_metadata` alone is ~1.35TB, and Neon bills on storage. Fix: Consider capping stored rows for the free tier, or stripping `import_metadata` after a grace period. At minimum, benchmark storage cost per free user against conversion-rate assumptions. ⚠️ Needs Leo's input
[Scope & YAGNI] BookBuddy parser is included because of one user's test dataset, not validated segment demand. Zero interviewees use BookBuddy. Fix: Defer BookBuddy to a follow-on epic. Use Kerrie's data converted to Biblii template format for testing. Revisit if users request it.
- [Scope & YAGNI] Re-import with per-field change detection (Task 8) is a full sub-feature built speculatively. There is no evidence users will iterate on CSV imports after the initial migration. Fix: Descope to dedup-only for v1 (skip existing books, append new ones). Add per-field diffing when users request it.
- [Separation of Concerns, Architectural Fit] There is no package mapping for the 6–8 child tasks. Without explicit assignment, individual tasks may independently decide where to place code, fragmenting logic across packages. Fix: Add a "Package mapping" section: parsers and matching in `@bib/core`, Zod schemas in `@bib/shared`, HTTP endpoints in `@bib/api`, repository implementations in `@bib/db`.
- [Third-Party Dependencies] No CSV parser library is specified. Node has no built-in CSV parser, and naive splitting breaks on quoted fields with embedded commas (common in book data). Fix: Decide on `csv-parse` or `papaparse` before implementation. Wrap it behind a `CsvReader` interface. Add this decision to the spike.
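To make the failure mode concrete: `line.split(",")` turns `"Mitchell, David"` into two fields. A minimal RFC 4180-style splitter for a single record is sketched below; it handles quoted fields and escaped double quotes, but deliberately not fields spanning multiple lines, which is one reason a maintained parser (`csv-parse` or `papaparse`) is still the right call.

```typescript
// Minimal single-record CSV field splitter (RFC 4180 quoting rules).
// Real parsers also handle multi-line quoted fields; this sketch does not.
function splitCsvLine(line: string): string[] {
  const fields: string[] = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // escaped quote ("") inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch; // commas in here stay part of the field
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}
```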
[Security] 10MB file limit is transport-level only. No validation of row count, column count, or per-field length — a file under 10MB can contain pathological input causing memory exhaustion. Fix: Enforce max row count (e.g. 10,000), max columns per row, and max per-field byte length in the parse layer.
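These structural limits are cheap to enforce per row during streaming parse. A sketch under stated assumptions: the 10,000-row cap comes from the Fix above, while the column and per-field byte limits are placeholder values.

```typescript
// Parse-layer limits, independent of the 10MB transport cap.
// MAX_ROWS is from the review Fix; the other limits are placeholders.
const MAX_ROWS = 10_000;
const MAX_COLUMNS = 100;
const MAX_FIELD_BYTES = 64 * 1024;

class CsvLimitError extends Error {}

/** Throws CsvLimitError if a parsed row violates any structural limit. */
function assertRowWithinLimits(row: string[], rowIndex: number): void {
  if (rowIndex >= MAX_ROWS) {
    throw new CsvLimitError(`row limit ${MAX_ROWS} exceeded`);
  }
  if (row.length > MAX_COLUMNS) {
    throw new CsvLimitError(`row ${rowIndex}: column limit ${MAX_COLUMNS} exceeded`);
  }
  for (const field of row) {
    // Measure UTF-8 bytes, not characters, since the cap protects memory.
    if (new TextEncoder().encode(field).length > MAX_FIELD_BYTES) {
      throw new CsvLimitError(`row ${rowIndex}: field exceeds ${MAX_FIELD_BYTES} bytes`);
    }
  }
}
```

Calling this per row lets the parser fail fast on pathological input instead of buffering the whole file first.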
- [Scalability] The 30-second Lambda timeout for a 2,000-book preview is tight, with no headroom for cold starts or slow Neon connections, and there is no recovery path for partial failures during the confirm phase. Fix: Benchmark at 500 and 2,000 rows. Add an `import_jobs` tracking row so confirm can be retried safely, skipping already-committed batches.
- [Data Model, Scalability] No composite indices are specified for the 450M-row `UserCopy` table. The free-tier 50-book query and import dedup both need `(owner_id, reading_status)`. Fix: Define a `CREATE INDEX` on `user_copies(owner_id, reading_status, collection_status)` in the migration. Add `EXPLAIN ANALYZE` baseline tests.
- [Data Model] The `WANT_TO_READ` vs `TBR` distinction is mentioned but not defined, and the import status mapping depends on it being resolved. Fix: Define the full `reading_status` enum before any parser maps CSV values to it.
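The shape of the decision, and why it blocks the parsers, can be made concrete. In this sketch only `WANT_TO_READ` and `TBR` appear in the epic; the remaining status values and the Goodreads-style shelf names are illustrative assumptions.

```typescript
// Hypothetical enum shape. Only WANT_TO_READ and TBR come from the epic;
// the other values and the shelf names below are assumptions.
type ReadingStatus = "WANT_TO_READ" | "TBR" | "READING" | "READ" | "DNF";

// A Goodreads-style parser cannot write this mapping until the
// WANT_TO_READ vs TBR distinction is defined.
function mapGoodreadsShelf(shelf: string): ReadingStatus | null {
  switch (shelf.toLowerCase()) {
    case "to-read":
      return "WANT_TO_READ"; // or "TBR" — exactly the open question
    case "currently-reading":
      return "READING";
    case "read":
      return "READ";
    default:
      return null; // unmapped shelves would land in import_metadata
  }
}
```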
- Fuzzy matching accuracy needs a golden-file test fixture (30–50 real title/author pairs with expected outcomes) to catch regressions → add to Task 5 acceptance criteria (Testability)
- ISBN checksum validation before catalogue lookup prevents lookup amplification → add an ISBN-10/13 validator to `@bib/shared` (Security)
- Half-star rating may only be needed for StoryGraph — scope it to format-specific mapping rather than a platform-wide change (Scope & YAGNI)
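The ISBN validator suggested above is small enough to sketch in full. The checksum rules here are the standard ones: ISBN-10 uses weights 10 down to 1 with an `X` check digit and must sum to a multiple of 11; ISBN-13 uses alternating 1/3 weights and must sum to a multiple of 10. The function name and placement are otherwise assumptions.

```typescript
// Standard ISBN-10 / ISBN-13 checksum validation, suitable for rejecting
// malformed identifiers before any catalogue lookup is attempted.
function isValidIsbn(raw: string): boolean {
  const isbn = raw.replace(/[-\s]/g, "").toUpperCase();

  if (/^\d{9}[\dX]$/.test(isbn)) {
    // ISBN-10: weighted sum (weights 10..1, X = 10) divisible by 11.
    let sum = 0;
    for (let i = 0; i < 10; i++) {
      const d = isbn[i] === "X" ? 10 : Number(isbn[i]);
      sum += d * (10 - i);
    }
    return sum % 11 === 0;
  }

  if (/^\d{13}$/.test(isbn)) {
    // ISBN-13: weights alternate 1, 3; sum divisible by 10.
    let sum = 0;
    for (let i = 0; i < 13; i++) {
      sum += Number(isbn[i]) * (i % 2 === 0 ? 1 : 3);
    }
    return sum % 10 === 0;
  }

  return false; // wrong length or non-digit characters
}
```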
- Encoding fallback can use the built-in `TextDecoder` with the `'iso-8859-1'` label — note this in the spike to prevent an unnecessary `iconv-lite` dependency (Third-Party Dependencies)
- Lambda GB-second cost per import should be benchmarked at target scale to validate margin (Cost Model)
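The encoding-fallback suggestion above can be sketched with built-ins only. `TextDecoder` with `fatal: true` throws on malformed UTF-8 instead of emitting U+FFFD replacement characters, which makes "try UTF-8, fall back to Latin-1" a few lines; the function name and return shape are illustrative.

```typescript
// UTF-8 first, ISO-8859-1 fallback, with no iconv-lite dependency.
// `fatal: true` makes the UTF-8 decoder throw on malformed byte sequences.
function decodeCsvBytes(
  bytes: Uint8Array
): { text: string; encoding: "utf-8" | "iso-8859-1" } {
  try {
    const text = new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return { text, encoding: "utf-8" };
  } catch {
    // Latin-1 maps every byte 0x00–0xFF to a code point, so this cannot fail.
    const text = new TextDecoder("iso-8859-1").decode(bytes);
    return { text, encoding: "iso-8859-1" };
  }
}
```

One caveat worth recording in the spike: per the WHATWG Encoding Standard, the `'iso-8859-1'` label actually selects the windows-1252 decoder, which is usually what legacy CSV exports want anyway.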
- The `pg_trgm` migration path is Postgres-specific — note that in-memory matching remains the portable baseline (Portability)
- Extract format detection as a standalone pure function, separate from parser classes, for independent testing (Testability)
- Centralise free-tier enforcement in a single query helper rather than scattering limits across endpoints (Scalability)
No lens returned a clean pass — all 10 surfaced at least suggestions. This is expected for a feature of this scope at the requirements stage.