Implement an algorithm to compare the provenance from two (or more) trials (i.e., executions of an experiment) to check their reproducibility. The provenance stored in the relational (SQLite) database by noWorkflow 2 contains intermediate variable values from a trial. These values could be compared to check how much, and where, executions deviate from each other.
MIT: https://github.com/gems-uff/noworkflow/blob/master/LICENSE
Contributor Covenant: https://github.com/gems-uff/noworkflow/blob/master/CODE_OF_CONDUCT.md
noWorkflow currently has some methods to explicitly tag variables of different trials and to compare them. It would be nice to have a way to compare whole trials and estimate how much one trial deviates from another.
- Compare trials of the same script
- Estimate how much one trial deviates from another
- Consider different scripts and execution flows
- Indicate which parts of the scripts are not reproducible
Each task has a different outcome
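As a rough illustration of the first two tasks, the sketch below compares the last recorded value of each variable in two trials and reports the fraction that differ. The `evaluation` table and its columns are a hypothetical, simplified stand-in, not the actual noWorkflow 2 schema, and `load_values` and `compare_trials` are names invented for this example.

```python
import sqlite3


def load_values(conn, trial_id):
    """Return {variable name: last recorded value} for one trial.

    Assumes a hypothetical table evaluation(id, trial_id, name, value).
    """
    rows = conn.execute(
        "SELECT name, value FROM evaluation WHERE trial_id = ? ORDER BY id",
        (trial_id,),
    )
    # Later rows overwrite earlier ones, so the last value per name wins.
    return {name: value for name, value in rows}


def compare_trials(conn, trial_a, trial_b):
    """Return (deviation ratio, sorted names whose values differ)."""
    a = load_values(conn, trial_a)
    b = load_values(conn, trial_b)
    names = sorted(set(a) | set(b))
    # A variable missing from one trial counts as a difference.
    differing = [n for n in names if a.get(n) != b.get(n)]
    deviation = len(differing) / len(names) if names else 0.0
    return deviation, differing


# Toy data standing in for two trials of the same script.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE evaluation ("
    "id INTEGER PRIMARY KEY, trial_id INTEGER, name TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO evaluation (trial_id, name, value) VALUES (?, ?, ?)",
    [
        (1, "x", "1"), (1, "y", "2"), (1, "z", "3"),
        (2, "x", "1"), (2, "y", "5"), (2, "w", "0"),
    ],
)

deviation, differing = compare_trials(conn, 1, 2)
print(deviation, differing)
```

A real implementation would likely need to align evaluations by code position and execution order rather than by name alone, which is where the tasks on different scripts and execution flows come in.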
- Prerequisites:
  - Python
  - SQL or SQLAlchemy ORM
- Expected Time: 350h
- Potential Mentor(s): João Felipe Pimentel
Add support for different levels of provenance collection in noWorkflow 2.
MIT: https://github.com/gems-uff/noworkflow/blob/master/LICENSE
Contributor Covenant: https://github.com/gems-uff/noworkflow/blob/master/CODE_OF_CONDUCT.md
Currently, noWorkflow 2 collects Python construct evaluations and all the dependencies among the evaluations. However, this collection is inefficient, since some of the collected provenance may not be necessary for end-users.
- Disable the collection inside specific functions (through decorators?)
- Disable the collection inside specific regions of the code (through with statements?)
- Collect only function activations in a region, instead of all variable dependencies
- Disable the collection of specific modules
- Design a DSL to express general dependencies for parts of the code where the collection is disabled
In this project, it is desirable to provide ways to temporarily disable the provenance collection and to manually indicate the provenance in this situation.
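One possible shape for the decorator and `with` statement ideas is sketched below. The `Collector` class, its `disabled` context manager, and its `skip` decorator are hypothetical names made up for this example; they are not part of the noWorkflow API.

```python
from contextlib import contextmanager
from functools import wraps


class Collector:
    """Hypothetical stand-in for a provenance collector."""

    def __init__(self):
        self.enabled = True
        self.records = []

    def record(self, name, value):
        # Store the value only while collection is enabled.
        if self.enabled:
            self.records.append((name, value))

    @contextmanager
    def disabled(self):
        """Temporarily disable collection inside a with block."""
        previous = self.enabled
        self.enabled = False
        try:
            yield
        finally:
            # Restore the previous state, even if the block raised.
            self.enabled = previous

    def skip(self, func):
        """Decorator that disables collection inside one function."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            with self.disabled():
                return func(*args, **kwargs)
        return wrapper


collector = Collector()


@collector.skip
def noisy(n):
    total = 0
    for i in range(n):
        total += i
        collector.record("total", total)  # ignored: collection is off here
    return total


collector.record("a", 1)
with collector.disabled():
    collector.record("b", 2)  # ignored: inside the disabled region
result = noisy(4)
collector.record("result", result)
print(collector.records)
```

Saving and restoring the previous state, instead of always re-enabling, lets disabled regions nest safely. Manually declaring the dependencies of a skipped region (the DSL task) is not covered by this sketch.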
- Prerequisites:
  - Python
- Expected Time: 350h
- Potential Mentor(s): João Felipe Pimentel
Implement new AST transformations for provenance collection.
MIT: https://github.com/gems-uff/noworkflow/blob/master/LICENSE
Contributor Covenant: https://github.com/gems-uff/noworkflow/blob/master/CODE_OF_CONDUCT.md
While noWorkflow 2 works on newer Python versions, most of its implementation was targeted at Python 3.7. Newer Python versions introduce constructs for which provenance is not collected.
- Identify which AST constructs lack an implementation
- Design AST transformations to execute functions before and after the evaluation of the constructs
- Create the dependencies for the new constructs
A new version of noWorkflow that supports Python constructs from newer versions.
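To make the transformation task more concrete, here is a minimal sketch using the standard `ast` module. It rewrites assignment expressions (the `:=` operator, added in Python 3.8) so that one function runs before and another after the value is evaluated. The hook names `__before__` and `__after__`, the `WalrusTransformer` class, and `run_with_hooks` are assumptions made for this example and do not reflect how noWorkflow implements its transformations.

```python
import ast


class WalrusTransformer(ast.NodeTransformer):
    """Rewrite `(name := value)` as
    `(name := __after__(__before__('name'), value))`."""

    def visit_NamedExpr(self, node):
        self.generic_visit(node)  # transform nested constructs first
        before = ast.Call(
            func=ast.Name(id="__before__", ctx=ast.Load()),
            args=[ast.Constant(value=node.target.id)],
            keywords=[],
        )
        # Arguments are evaluated left to right, so __before__ runs
        # first, then the original value, and finally __after__.
        node.value = ast.Call(
            func=ast.Name(id="__after__", ctx=ast.Load()),
            args=[before, node.value],
            keywords=[],
        )
        return node


def run_with_hooks(source):
    """Transform and execute source; return (events, namespace)."""
    events = []

    def before(name):
        events.append(("before", name))
        return name

    def after(name, value):
        events.append(("after", name, value))
        return value  # keep the original semantics of the expression

    tree = WalrusTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # new nodes need line/column info
    namespace = {"__before__": before, "__after__": after}
    exec(compile(tree, "<trial>", "exec"), namespace)
    return events, namespace


source = "data = [1, 2, 3]\nif (n := len(data)) > 2:\n    big = True\n"
events, namespace = run_with_hooks(source)
print(events)
```

Here the hooks only log events. Creating dependencies for a new construct would additionally require linking the recorded value to the evaluations that produced it.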
- Prerequisites:
  - Python
- Expected Time: 350h
- Potential Mentor(s): João Felipe Pimentel