Some thoughts on the OSCA books

Organization

The "OSCA" concept should be more of a federated collection of books than a centrally managed resource. This would open the books to contributions from the wider Bioconductor community, instead of restricting authorship to a handful of maintainers. Most importantly, each book can be contributed by the same developers who maintain the package(s) it uses. This avoids a major problem in book maintenance, where package updates unexpectedly change the results in the book: the developer making the update is also in a position to fix the book.

The flipside of many separate books is that readers carry a greater burden: they must assemble the disparate bits of knowledge into a functional workflow. For sufficiently modular steps, I think this is a satisfactory trade-off for maintainability. The archetypal example is the SingleR book, which has always been separate from the OSCA books; cell type annotation is clearly a distinct operation that can be plugged into any other workflow.

Validation

In the first iteration, the OSCA books were very concrete in their examples. They often referred to specific cluster numbers and genes in the code, results and text. This made for very clear explanations but required validation to ensure that the text matched the results. When a package was updated, this validation would often fail because the results changed, typically because the clusters were renumbered. This is probably the biggest pain point in book maintenance.

One solution is to auto-update the text based on the results, e.g., with inline code chunks that empirically report the ID of the relevant cluster. This makes the book builds more robust to package updates, though some validation will still be required. Another approach is to simply be less concrete when discussing the results in the text. This sacrifices some clarity for the reader in exchange for easier maintenance.
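In an R Markdown book, the first approach would be written as an inline chunk (`` `r ...` ``) inside the sentence itself. The following is a minimal Python sketch of the same idea, not code from the OSCA books. It assumes that a known marker gene identifies the cluster of interest. The function names `find_cluster_by_marker()`, `is_specific()` and `describe()` and the toy data are hypothetical. The text is derived from the results, so a renumbering of the clusters changes the reported ID instead of breaking the prose. The residual validation checks that the marker still singles out one cluster.

```python
from statistics import mean


def _group_by_cluster(expression, clusters, marker):
    """Collect the per-cell expression values of `marker` for each cluster label."""
    per_cluster = {}
    for value, label in zip(expression[marker], clusters):
        per_cluster.setdefault(label, []).append(value)
    return per_cluster


def find_cluster_by_marker(expression, clusters, marker):
    """Hypothetical helper: the cluster with the highest mean expression of `marker`."""
    per_cluster = _group_by_cluster(expression, clusters, marker)
    # Ties go to the first label encountered (dicts preserve insertion order).
    return max(per_cluster, key=lambda label: mean(per_cluster[label]))


def is_specific(expression, clusters, marker, min_ratio=2.0):
    """Residual validation: the top cluster must clearly exceed the runner-up."""
    per_cluster = _group_by_cluster(expression, clusters, marker)
    means = sorted((mean(values) for values in per_cluster.values()), reverse=True)
    if len(means) < 2:
        return True
    return means[0] > 0 and means[0] >= min_ratio * means[1]


def describe(expression, clusters, marker):
    """Generate the sentence from the results instead of hard-coding the cluster ID."""
    cluster = find_cluster_by_marker(expression, clusters, marker)
    return f"Cluster {cluster} shows the strongest {marker} expression."


# Toy data: the same cells, clustered before and after a (simulated) package update.
expression = {"CD3E": [5.0, 4.8, 0.1, 0.2, 0.0, 0.3]}
old_clusters = [2, 2, 1, 1, 3, 3]
new_clusters = [1, 1, 3, 3, 2, 2]  # same partition, different IDs

print(describe(expression, old_clusters, "CD3E"))  # Cluster 2 shows the strongest CD3E expression.
print(describe(expression, new_clusters, "CD3E"))  # Cluster 1 shows the strongest CD3E expression.
print(is_specific(expression, new_clusters, "CD3E"))  # True
```

The validation step remains necessary. If an update merges or splits the cluster of interest, the auto-generated sentence would still be grammatical but could be wrong. A specificity check like `is_specific()` is one way to catch that case.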

Recycling

The current implementation of the OSCA books uses a too-clever-by-half caching strategy, whereby objects can be retrieved from the knitr caches created by any of the books. The original idea was to re-use objects from different chapters or books so that we could save time by not repeating the calculations. In practice, this created very complicated dependencies within and between books.

The solution is to stop fiddling with the cache: just copy the code and re-run it in each chapter. With the new scrapper package, execution is fast enough that the time saved by re-using cached objects would not be noticeable.

LTLA/osca-thoughts.md