A 5-reel 4K feature will require about 7TB for the DPX alone. The CRI should be a lot smaller, but could still require 3TB. So that pretty much fills the Pegasus as is. The DPX could then be copied to the ISIS with copyit, though in the grand scheme of things it makes most sense to export directly from CRI to the ISIS. This means that multiple CRI scans can be on the go on the Pegasus, with DPX exports created in downtime. Alternatively, Da Vinci can run on another computer to perform the transform. That leaves the Win7 HP machine to read from the ISIS and rawcook to either the ISIS or an internal 10TB drive, where the reversibility testing could happen. Otherwise, should we look into using 2K for 35mm prints and 4K for negatives?
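As a sanity check on the 7TB figure, here is a rough back-of-the-envelope calculation. The frame size, reel length and packing are assumptions (4K 10-bit RGB DPX at 4096x3112, roughly 20 minutes per reel at 24fps), not measurements from our scans:

    # Rough DPX storage estimate - all figures below are assumptions.
    # 10-bit RGB packs into 32 bits per pixel in a DPX, so 4 bytes/pixel.
    BYTES_PER_FRAME = 4096 * 3112 * 4        # roughly 51 MB per frame
    FRAMES = 5 * 20 * 60 * 24                # 5 reels x ~20 min x 24fps
    total_tb = float(BYTES_PER_FRAME * FRAMES) / 1000 ** 4
    print('Approx DPX total: %.1f TB' % total_tb)   # ~7.3 TB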
DCDM contains:
- parent folder
- image sequence folder per reel
- audio per reel, which can be one object per channel or combined.
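For illustration, that might look something like the layout below; the folder names are made up, not an agreed convention:

    DCDM_Example_Title/
        REEL_01_PICTURE/    <- image sequence folder, one per reel
        REEL_01_AUDIO/      <- per-reel audio: one object per channel, or combined
        REEL_02_PICTURE/
        REEL_02_AUDIO/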
Retaining original order might be preferable here. Zipping is the easiest way to do this, as we may need to restore exactly what was given to us. A further complication: subtitles!
Options are:
- zip everything (see the sketch below)
- make the tech record before or after in a manual fashion; possibly add -tech to accession.py to insert a tech record, like with -filmo
- possibly add a new field to indicate that this is a zip.
This workflow is ready to go.
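A minimal sketch of the 'zip everything' option, preserving the directory tree as delivered so that original order can be restored by unzipping. zipfile and the paths here are illustrative choices, not what sipcreator currently does:

    import os
    import zipfile

    def zip_everything(source_dir, zip_path):
        # Store each file with its path relative to the parent of source_dir,
        # so unzipping restores the delivered folder structure exactly.
        # ZIP_STORED (uncompressed) may be preferable for speed.
        parent = os.path.dirname(source_dir)
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED, allowZip64=True) as zipped:
            for root, _dirs, files in os.walk(source_dir):
                for filename in files:
                    full_path = os.path.join(root, filename)
                    zipped.write(full_path, os.path.relpath(full_path, parent))

    zip_everything('/media/dcdm/Example_Title', '/media/sips/Example_Title.zip')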
rawcooked:
Trickier, as original order might be messed up. We could provide just the image folder, but parent folder names might contain essential metadata like frame rate. Perhaps seq2ffv1 could replicate the source folder structure, just replacing the image sequence folders with MKVs (see the sketch below).
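A rough sketch of that idea, assuming RAWcooked's default behaviour of writing folder.mkv alongside its input; is_image_sequence() is a hypothetical helper, and this is not how seq2ffv1 currently works:

    import os
    import shutil
    import subprocess

    def is_image_sequence(folder):
        # Hypothetical test: treat a folder as an image sequence
        # if it contains DPX or TIFF files.
        return any(f.lower().endswith(('.dpx', '.tif', '.tiff'))
                   for f in os.listdir(folder))

    def replicate_tree(source_root, dest_root):
        # Mirror the source folder structure; image sequence folders
        # become MKVs, everything else is copied as is.
        for root, dirs, files in os.walk(source_root):
            relative = os.path.relpath(root, source_root)
            dest_dir = os.path.normpath(os.path.join(dest_root, relative))
            if root != source_root and is_image_sequence(root):
                subprocess.check_call(['rawcooked', root])
                shutil.move(root + '.mkv', dest_dir + '.mkv')
                dirs[:] = []  # do not descend into the sequence folder
                continue
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            for filename in files:
                shutil.copy2(os.path.join(root, filename),
                             os.path.join(dest_dir, filename))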
Adding to rawcooked:
Still difficult to anticipate original order.
Other option:
Zip now, rawcook later.
sipcreator should generate sha512 manifests from the get-go. Editing checksum manifests is simple at the moment. The question is - how to handle the insertion of the filmographic and the technical record? If we force the assumption that a filmographic exists prior to sipcreation, then we introduce lots of bottlenecks. Currently, accession.py will:
- makes dfxml - this can be done by sipcreator. Updating the dfxml is hard right now, but we could leave it as is, since the objects will have the correct info, and they are the focus of our preservation activities.
- renames with the aaa number (can be done by a human) and adds an accession event to the log (more difficult, but does it have to be in the log?)
- adds the filmographic - this can totally be done at the sipcreator stage for reproductions or anything that already has a filmographic. So perhaps leave the option open? If a filmo exists, add it at the sipcreator phase; otherwise add it at accession.
- makes pbcore - it really needs the filmographic number to be functional, but this could be added manually. Or, if the filmo already exists, just make it at the sipcreator phase.
There will potentially be many instances where the sipcreator package is the final package. The only thing that will have to be done is the rename with the accession number, which can be done manually. Or perhaps accession.py can run, see what's missing and what exists, and only do what needs to be done - namely add the number (see the sketch below).
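A minimal sketch of that idempotent idea; the metadata layout, helper name and paths are made up for illustration, not current accession.py behaviour:

    import glob
    import os

    def accession_needed_steps(sip_path, accession_number):
        # Inspect the SIP and report only the steps that still need doing.
        steps = []
        metadata = os.path.join(sip_path, 'metadata')
        if not glob.glob(os.path.join(metadata, '*filmographic*')):
            steps.append('add filmographic')
        if not glob.glob(os.path.join(metadata, '*pbcore*')):
            steps.append('make pbcore')
        if os.path.basename(sip_path) != accession_number:
            steps.append('rename to %s' % accession_number)
        return steps

    print(accession_needed_steps('/media/sips/oe1234', 'aaa1234'))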
So the unittesting proved harder than I thought.
I've started to trim back some of the main functions, and I've added some potential unittest ideas to the docstrings of functions that I am refactoring.
The tutorial linked below looks like one of the simplest I've seen.
I've also looked into running packagecheck.py a bit more in order to catch regressions. I think eventually I could refactor this into the unittest module.
https://www.blog.pythonlibrary.org/2016/07/07/python-3-testing-an-intro-to-unittest/
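Following that tutorial's pattern, a first test could be as small as this; test_ififuncs.py and make_uuid are placeholders for whichever refactored function gets tested first, not existing names:

    import unittest

    from ififuncs import make_uuid  # hypothetical function name

    class TestIfiFuncs(unittest.TestCase):
        def test_make_uuid_is_36_chars(self):
            # A UUID string is 36 characters including hyphens.
            self.assertEqual(len(make_uuid()), 36)

    if __name__ == '__main__':
        unittest.main()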
So much needs to be done. In particular - unittesting, refactoring, Python 3. The killer here is figuring out the order in which to do these. Initially I thought that unittesting should come first, as it would make the Python 3 and refactoring work easier - it would be easier to notice what I was breaking. According to some Stack Overflow posts, adding tests to legacy code makes less sense than:
- Refactoring legacy code
- Adding unittests around the refactored code
- Drawing a line in the sand: all new code is unittested
So I'm thinking that this is the way to go, along with looking into test-driven development, which should simplify the thinking around writing new code. Currently I pass too many variables around; there's too much technical debt.
In terms of the refactoring - it's important to isolate the key scripts that are in live production use. At a glance, these appear to be: copyit, sipcreator, accession, batchaccession, normalise, makepbcore, the dfxml scripts, manifest, validate, accession_register, strongbox_fixity, concat, package_update, deletefiles, seq2ffv1, multicopy, ififuncs. My god. OK - so map how these scripts depend on each other. For example, copyit is at the root of most scripts, as are sipcreator and ififuncs. Start with those, make their functions easy to test, and remove technical debt.
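One quick, crude way to map those dependencies is to scan each script for imports of the others. A throwaway sketch, assuming all the scripts sit in the current directory:

    import glob
    import os
    import re

    # Crude import scan: for each script, list which of the other
    # scripts in this directory it imports.
    scripts = glob.glob('*.py')
    names = [os.path.splitext(s)[0] for s in scripts]
    for script in scripts:
        with open(script) as source_file:
            source = source_file.read()
        imports = [n for n in names
                   if re.search(r'^\s*(import|from)\s+%s\b' % n, source, re.M)]
        if imports:
            print('%s -> %s' % (script, ', '.join(imports)))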
Actually - adding better docstrings will help with this as well. Perhaps Leanne or others could help there?
PyPI release steps:
rm -rf dist/*
update setup.py with the new version number
python setup.py sdist bdist_wheel
twine upload dist/*