Philip Guo summarized the problem quite well in Burrito. Are there any modern solutions to this problem?
@pditommaso provided a nice collection of tools, a subset of which is worth trying out.
So far, Sumatra, noworkflow, recipy and WorldMake appear to care most about provenance tracking; Nextflow looks like a very promising upgrade to GNU Make for containerized data science pipelines.
- Nextflow
- Sumatra
- Luigi and SciLuigi
- Doit and a tutorial from sw
- nipype
- joblib: does this effectively give us provenance tracking for free? Could it be used together with noworkflow?
- drake "Make for Data"
- noworkflow, recommended by Philip Stark
- recipy: similar to Sumatra, but only works within Python
- Flex: a command-line tool for data science pipelines
- WorldMake
- Reprozip: from the noworkflow folks; useful for capturing environment information only.
- scikit-bio has a nice workflow module worth looking into.

Useful project templates:
- Cookiecutter and template
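To make the provenance-tracking idea concrete, here is a minimal stdlib-only sketch of the kind of record tools like Sumatra or recipy keep for each run: content hashes of inputs and outputs, the run parameters, the interpreter version and platform, and a timestamp. The helper names (`sha256_of`, `record_provenance`) and the JSON-lines log format are hypothetical illustrations, not the API of any tool listed above.

```python
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def record_provenance(inputs, outputs, params, log_path):
    """Hypothetical helper: append one JSON provenance record per run.

    Mimics (in spirit only) what provenance trackers capture: which exact
    input and output contents were involved, with which parameters, on
    which interpreter and platform, and when.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "params": params,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "outputs": {str(p): sha256_of(p) for p in outputs},
    }
    # One JSON object per line, so the log can be appended to across runs.
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```

Hashing file contents rather than relying on timestamps means two runs can be compared even after files are copied or touched; a real tool would also capture the code version (e.g. a VCS commit) and installed packages, which this sketch leaves out.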
Also, a nice example of a reproducible workflow, though any real project is likely to be far more complicated.
Update: new tools for running computational experiments:
- SciExp https://pypi.python.org/pypi/sciexp2/1.1.9
- Comp-Exp https://pypi.python.org/pypi/comp-exp/2.3.1
- Lazyrunner (old) http://www.stat.washington.edu/~hoytak/code/lazyrunner/
- Expyriment http://www.expyriment.org/
- Reprozip and Reprounzip https://pypi.python.org/pypi/reprounzip/
- pypet: a toolkit for parameter exploration across multiple experiments, with optional Sumatra integration https://pypi.python.org/pypi/pypet/0.4.0
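The core job of experiment runners such as pypet or SciExp2 is to sweep a parameter space and keep each result tied to the exact parameters that produced it. The stdlib-only sketch below illustrates that pattern with a Cartesian-product grid; `parameter_grid`, `run_experiments` and `toy_experiment` are hypothetical names, and this is not how any of the listed tools is implemented.

```python
import itertools
import json


def parameter_grid(space):
    """Yield one dict per combination of the values in `space`.

    `space` maps parameter names to lists of candidate values; names are
    sorted so the iteration order is deterministic.
    """
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def run_experiments(experiment, space):
    """Run `experiment(**params)` for every grid point, storing params with result."""
    results = []
    for params in parameter_grid(space):
        results.append({"params": params, "result": experiment(**params)})
    return results


def toy_experiment(alpha, n):
    # Hypothetical stand-in for a real simulation; purely illustrative.
    return alpha * n


runs = run_experiments(toy_experiment, {"alpha": [0.1, 0.5], "n": [10, 20]})
print(json.dumps(runs[0]))
```

Keeping parameters and results in the same record is what makes the sweep re-analysable later; the dedicated tools add storage formats, parallel execution and provenance on top of this basic loop.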