Daniel Wheeler and Yannick Congo
Thermodynamics and Kinetics Group
Materials Science and Engineering Division
National Institute of Standards and Technology
Gaithersburg, MD
Capturing each execution of a script or workflow, together with its associated metadata, is enormously appealing and should be at the heart of any attempt to make scientific simulations reproducible. With this in mind, we are interested in developing a service to both store and view simulation metadata records using a robust, data scheme agnostic approach.
The concept of event control is orthogonal to the concept of version control: it captures every execution of a workflow or script rather than the changes made to that script. Research projects often use version control as a poor man's event control, a symptom of the lack of a good event control tool. Given the general focus on reproducing computational experiments, it is surprising that the scientific computing community has not been more active in developing and promoting a good tool for event control.
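The distinction can be illustrated with a minimal sketch, assuming a hypothetical helper `capture_run` and a plain in-memory list as the record store (neither is part of Sumatra). Running the same unchanged function twice produces two records, whereas version control would register nothing because the code did not change.

```python
import hashlib
import json
import sys
import time
import uuid


def capture_run(func, parameters, store):
    """Run func(**parameters) and append one record describing this execution.

    Hypothetical helper for illustration only; the record fields are assumptions.
    """
    record = {
        "id": uuid.uuid4().hex,  # unique per execution, not per code version
        "function": func.__name__,
        "parameters": parameters,
        "python_version": sys.version.split()[0],
        "started": time.time(),
    }
    # A digest of the parameters makes it easy to spot repeated runs.
    encoded = json.dumps(parameters, sort_keys=True).encode("utf-8")
    record["parameters_sha1"] = hashlib.sha1(encoded).hexdigest()
    try:
        record["result"] = func(**parameters)
        record["outcome"] = "success"
    except Exception as exc:  # failed runs are events worth recording too
        record["outcome"] = "failure"
        record["error"] = repr(exc)
    record["duration"] = time.time() - record["started"]
    store.append(record)
    return record


def simulate(steps, dt):
    """Stand-in for a real simulation."""
    return steps * dt


records = []
capture_run(simulate, {"steps": 10, "dt": 0.5}, records)
capture_run(simulate, {"steps": 10, "dt": 0.5}, records)
```

Both entries in `records` share the same parameter digest but carry distinct identifiers and timestamps, which is the information an event control tool is meant to preserve.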
Sumatra is a lightweight Python-based event control tool that is suited to scientists who engage in both research and development. Since its inception, Sumatra has not seen wide use in the scientific computing community, as evidenced by the lack of activity on its mailing list. In the authors' experience, one of the main difficulties with using Sumatra is the requirement to maintain and communicate with the backend databases used to store the generated metadata. Furthermore, the tool offers no effective public view and no automated way to share the data store.
In light of these drawbacks, we are interested in developing a web service to both store and view simulation records using any client. Our intent is for the web service to be as robust and data scheme agnostic as possible, so that changes to the client, backend model, API and frontend break neither the interactions between these elements nor backwards compatibility with existing record sets. The Python web framework Flask is used to implement the backend endpoints. The frontend is entirely JavaScript based and generates the data object model based solely on the data available. The model-view-controller paradigm and the data scheme agnostic approach are vital for data-driven simulation management to effectively leverage the web and become established across the wider scientific computing community.
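As a rough illustration of what a data scheme agnostic approach could look like, the sketch below uses only the Python standard library rather than Flask. The `RecordStore` class and its methods are hypothetical names, not the authors' implementation. The store accepts any JSON object as a record and derives the displayed columns solely from the data it has received.

```python
import json


class RecordStore:
    """Hypothetical in-memory store that imposes no fixed schema on records."""

    def __init__(self):
        self._records = []

    def add(self, payload):
        """Accept any JSON object as a record and return its index."""
        record = json.loads(payload)
        if not isinstance(record, dict):
            raise ValueError("a record must be a JSON object")
        self._records.append(record)
        return len(self._records) - 1

    def fields(self):
        """Derive the field names from the stored data, in first-seen order."""
        seen = []
        for record in self._records:
            for key in record:
                if key not in seen:
                    seen.append(key)
        return seen

    def table(self):
        """Build rows for a view; fields a record lacks are shown as None."""
        columns = self.fields()
        return [[record.get(name) for name in columns] for record in self._records]


store = RecordStore()
store.add('{"label": "run-1", "duration": 12.5}')
# A newer client sends an extra field; older records remain valid.
store.add('{"label": "run-2", "duration": 11.0, "mesh_size": 64}')
```

Because the view is computed from `fields()` rather than from a declared model, a client that starts sending a new field does not invalidate existing record sets in this sketch.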