@mmulich
Last active November 6, 2015 15:51

Phase One (codename: Cement Shoes)

This phase will involve setting up the foundation for transforms.

Tasks

  • Each dev should set up an instance of flower, which will help visualize/debug task processing.

  • Initialize Sphinx documentation for cnx-epub & cnx-transforms. Document the purpose and basic functionality of the two packages. Later you can document the parts you are working on and touching. Everything else can progressively be added over time.

  • Move cnx-port to cnx-publishing-builds and rename it accordingly.

    • Initialize Sphinx documentation and document the package's purpose and basic functionality.
    • Modify cnx-publishing-builds to be an isolated component that we pull into publishing via the Configurator.include method (see also, Extending an existing Pyramid application). This will help to scope the exports implementation without mucking up the publishing scope.
    • Include cnx-publishing-builds within cnx-publishing.
    • Persist build information for later status retrieval.
  • Rewrite existing roadrunners logic within cnx-transforms.

    1. Pick a runner (e.g. roadrunners.legacy.make_completezip)
    2. Move its tests as well.
    3. Rewrite it in cnx-transforms. This involves removing references to pybit.BuildRequest, which is passed into each runner as build_request. If the runner requires access to the information within the build_request, you will now get this information from the epub (an instance of cnxepub.epub.EPUB) passed to all cnx-transforms tasks.
    4. Ensure the tests pass.
    5. Pick another runner and repeat until all have been translated.
  • Attach persisted transform build information to /contents/{ident-hash}/extras.

    1. Copy/Move the /contents/{ident-hash}/extras route and view from cnx-archive. (Slight modifications will be necessary to make this work with Pyramid. Check out the promised-land* branch(es) to find the already translated version of the logic.)
    2. Modify the response to include build status. This involves using Celery to inspect the task's status. Check whether the file is on the filesystem first, before checking for a build task.
    3. Ensure 100% test coverage for this content extras view, because it'll be hard to track errors related to any of these builds.
    4. Make sure the documentation clearly illustrates the logical paths builds take. There is going to be some strange sharing of information happening here. Please make sure to document it.
  • Produce internal EPUB files for export. (This replaces the legacy completezip and offlinezip formats.) (Please note that this should not require API or HTML file format changes to the cnx-epub package.)

    1. Create a new cnx-transforms task to make an internal EPUB (i.e. using cnxepub.make_epub).
    2. Check with Ross and Ed about the filename scheme for the artifact (aka file).
    3. Add format versioning to the cnx-epub format by adding a metadata file to the EPUB itself. (This will be used to update any exported files when changes to the format are made.) (Please speak with me (pumazi) about this while you are implementing it.)
    4. Document the current structure of cnx-epub's EPUB file hierarchy as well as each individual file format (excluding those that use the EPUB specification).
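
The format-versioning step above could look roughly like the following. This is a minimal sketch using only the Python standard library, not cnx-epub's API. The metadata file name (META-INF/cnx-format.json), the format_version key, and all three helper functions are assumptions made for illustration; the real name and layout are still to be agreed with pumazi.

```python
import io
import json
import zipfile

# Hypothetical location and version number; the real ones are undecided.
FORMAT_METADATA_PATH = "META-INF/cnx-format.json"
CURRENT_FORMAT_VERSION = 1


def write_format_version(epub_file, version=CURRENT_FORMAT_VERSION):
    """Append a small JSON metadata file to an existing EPUB (zip) archive."""
    payload = json.dumps({"format_version": version})
    with zipfile.ZipFile(epub_file, "a") as archive:
        archive.writestr(FORMAT_METADATA_PATH, payload)


def read_format_version(epub_file):
    """Return the recorded format version, or None if no metadata file exists."""
    with zipfile.ZipFile(epub_file, "r") as archive:
        try:
            raw = archive.read(FORMAT_METADATA_PATH)
        except KeyError:
            # Files exported before versioning was introduced land here.
            return None
    return json.loads(raw.decode("utf-8"))["format_version"]


def needs_rebuild(epub_file):
    """An exported file is stale when its version differs from the current one."""
    return read_format_version(epub_file) != CURRENT_FORMAT_VERSION
```

Treating a missing metadata file as "stale" means files exported before versioning existed would simply be rebuilt, which seems like the safer default.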

Notes to the Dev(s)

Concentrate your focus on making the logical paths that initialize builds and the representation of the build state as either its status or artifact. To that end, you shouldn't need to make any changes to cnx-epub, which could cascade bugs across the entire cnx application suite. Try your best to document the existing logic.
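
One way to picture the "status or artifact" representation is the small sketch below. It follows the order given in the tasks: look for the file first, then ask about the task. The function name, the shape of the returned dictionary, and the get_task_state callable are assumptions for illustration only. In practice the callable would wrap Celery's result inspection (for example, AsyncResult(task_id).state).

```python
import os


def build_state(artifact_path, task_id=None, get_task_state=None):
    """Describe a build as either its artifact or its task status.

    ``get_task_state`` is a hypothetical callable that maps a task id to a
    state string such as "PENDING", "STARTED", "SUCCESS" or "FAILURE".
    """
    # A finished artifact wins: no need to ask the task system at all.
    if os.path.isfile(artifact_path):
        return {"state": "available", "artifact": artifact_path}
    # No file yet, so fall back to whatever the task system reports.
    if task_id is not None and get_task_state is not None:
        return {"state": get_task_state(task_id).lower(), "artifact": None}
    # Neither a file nor a known task.
    return {"state": "unknown", "artifact": None}
```

Passing the state lookup in as a callable keeps the view logic testable without a running broker, which may help with the coverage goal for the extras view.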

When in doubt, ask the cnx team. If something needs to be changed within cnx-epub, we'll plan for it and place the task in the backlog. This will prevent bugs from cascading across the entire cnx application suite.

Keep in mind that the cnx-transforms tasks only need access to the Binder tree (aka table-of-contents), because the legacy transforms will be used for now. We are only using the EPUB object for information at this point. In fact, building EPUBs with DocumentPointer objects might be sufficient for a first draft. This will give Derek Ford and company time to inspect and use the binder structures while giving us more time to ensure the Document's HTML format is correct/valid.

Benefits

  • Reuse existing code. More than two-thirds of existing but unused code can be reused.
  • Utilizes existing (and working) logic within legacy to create transforms that match the existing transforms. This allows us to progressively swap out the legacy implementation with rewritten logic.
  • Makes a minimal working implementation that will encompass the high-level design. This means that the design at the web API won't change. Additionally, the underlying task interface will also remain the same. The only things that will change are the guts of the tasks. Thus the operation for upgrading a task should be a straightforward code update.

Addressables

  • "What happens when a build fails?" Celery will catch the exception. We can do whatever we want with it at that point (e.g. log, email, try again, etc.).
  • "Why are we still using legacy transforms?" Using legacy transforms gives us the minimally marketable feature (MMF), while still giving us an easy and direct path forward.
  • "How will we monitor transforms (jobs) as they are queued?" Celery will monitor the task status for us. Further monitoring information (utilization, time to execution, etc.) will be shown in flower, a web app used to monitor the rabbitmq queues.
  • "How do I rebuild a transform?" cnx-transforms is designed to run as both a task system and a command-line utility. We can either resubmit the task using flower or run it on the command-line.
  • "Can the tasks be run on a separate system/box?" Yes. This isn't obvious at first glance, since the typical design of celery tasks is to include the entire codebase on every machine, but our tasks are designed to require minimal dependencies.

Phase Two (codename: Double-decker Coffin)

This phase can be done in parallel with Phase One, since it does not directly involve the CNX team. This work will be a collaborative coding effort between the OSC textbooks team (Alana's team) and the CNX team. The CNX team will support OSC in implementing a replacement for DocBook.

Tasks

  • Highlight features of DocBook that we wish to keep and apply value to them.
  • Slate a minimally marketable feature (MMF) set for prototyping a solution. For example, produce a PDF that contains all the content including a table of contents (TOC) and glossary of terms. This example would illustrate features for TOC, paging and glossary.
  • Prototype the DocBook replacement. Preferably this will be done in Python. The solution can have a web front-end for ease of use. It must be designed to be importable as a library within cnx-transforms, which will also give it a command-line interface (by virtue of the cnx-transforms design).
  • Document the process and design taken so far.

Benefits

  • The transformation of content is a separate area of concern from the logic that triggers the transformations. Doing this work outside of the transforms allows for a concentration on specific tasks rather than the long chain of publishing events that could potentially blend (or spaghetti) into unscoped logic.
  • This approach allows the OSC team to scratch their own itch.
  • The CNX team will not be required to understand the current system. Hence, no lead time will be needed for the CNX team. Later the CNX team can build upon the shoulders of the OSC team's prototype (see addressables).

Addressables

  • "Why is Alana's team doing most of the development?" I (Michael) assume that those responsible, capable and most knowledgeable on the subject matter should wrangle the requirements into a prototype. The CNX team can then choose to work from the prototype or rewrite it as they see fit. The important part of this is to give the OSC team the ability to scratch their own itch first.
  • "What's this minimally marketable feature stuff?" It's a project management term that defines a set of features that can feasibly be accomplished within a sprint period.

Implementation Notes

The transformation tasks use the cnx-epub EPUB format, which is a valid EPUB that contains all the content and metadata in HTML without styling. Each task is set up with two parameters, an epub and an optional callback. (See cnx-transforms for examples.) The epub should be treated as the canonical format. Blocking operations should be limited, but will not be seen as issues (see also the asynchronous task in Phase Three).
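
As a rough illustration of that two-parameter shape, here is a plain-Python sketch. It is not taken from cnx-transforms: the function name, the callback's signature, the stand-in FakeEpub class, and the placeholder file naming are all hypothetical. A real task would be registered with Celery and would receive a cnxepub.epub.EPUB instance.

```python
class FakeEpub:
    """Stand-in for cnxepub.epub.EPUB; it exists only to make the sketch runnable."""

    def __init__(self, title):
        self.title = title


def make_example_artifact(epub, callback=None):
    """Hypothetical task: derive an artifact name from the epub.

    The epub is treated as the canonical input and is never modified here.
    The naming below is a placeholder, not the agreed filename scheme.
    """
    artifact = "{}.zip".format(epub.title.lower().replace(" ", "-"))
    # The callback is optional; when given, it is told about the result.
    if callback is not None:
        callback(artifact)
    return artifact
```

A caller that wants to be notified would pass something like callback=results.append, while a command-line run could leave the callback out and use the return value directly.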

Phase Three (codename: Sit-down)

This is a rolling phase where each task could be considered as a phase unto itself (high level tasks that may need further breakdown) or tasks that can be included in future sprints verbatim.

Tasks

  • Replace legacy EPUB (the human readable one) generation.
  • Implement backlogged features for the PDF generation utility. These are tasks that didn't make it into the Phase Two MMF.
  • Replace (if necessary) the generation of completezip files. I (Michael) theorize this will not be necessary in the future, since the internal cnx-epub file format should replace it.
  • Break down the transforms into asynchronous (non-blocking) tasks to utilize as much of the machine/box as possible.

Benefits

  • Iterative approach, period.
  • Addressed value over time by priority (synonymous with MMF benefits).

Addressables

  • "Why does this phase look so incomplete?" I (Michael) am not too concerned about these tasks at this time. Also, after we replace DocBook, I'm not sure what other goals the OSC team has.

Codenames

  • cement shoes - In this instance, only the feet are encased in cement until it hardens, then the victim is buried at sea, sometimes while still alive.
  • double-decker coffin - A coffin with a false bottom that accommodates two bodies. The paying customer is in the top tier, and the victim of a mob hit whom the Mafia would like to secretly bury is hidden below. Joe Bonanno was one of the first mobsters to use this method of disposal.
  • sit-down - A meeting among high-level Mafiosi to settle disputes and grievances before violence ensues.