A Cloud Service to Record Simulation Metadata
Daniel Wheeler and Yannick Congo
Thermodynamics and Kinetics Group
Materials Science and Engineering Division
National Institute of Standards and Technology
Gaithersburg, MD
The notion of capturing each execution of a script or workflow and its associated metadata is enormously appealing and should be at the heart of any attempt to make scientific simulations reproducible. In view of this, we are developing a backend and frontend service to both store and view simulation metadata records using a robust, data-scheme-agnostic approach.
The concept of event control is orthogonal to that of version control: it captures every execution of a workflow or script rather than the changes made to that script. Research projects often use version control as a poor man's event control, a symptom of the lack of a good event control tool. Given the general focus on reproducing computational experiments, it is surprising that the scientific computing community has not been more active in developing and promoting a good tool for event control.
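To make the distinction concrete, the following is a minimal sketch of what capturing a single execution might look like. It is not Sumatra's API; the function name `capture_run` and the record fields are hypothetical, chosen only for illustration, and a real event control tool would record far more (code version, dependencies, parameters, output files).

```python
import hashlib
import subprocess
import sys
import time
from datetime import datetime, timezone


def capture_run(command):
    """Run a command once and return a metadata record for that execution.

    Hypothetical helper for illustration only; the field names are assumptions.
    """
    started = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    # Execute the script and keep its textual output for the record.
    result = subprocess.run(command, capture_output=True, text=True)
    duration = time.perf_counter() - t0
    return {
        "command": list(command),
        "started": started,
        "duration_s": duration,
        "returncode": result.returncode,
        "stdout": result.stdout,
        # A digest makes it cheap to check whether two runs produced the same output.
        "stdout_sha256": hashlib.sha256(result.stdout.encode()).hexdigest(),
    }


# Each call produces a new record, even when the script itself has not changed.
record = capture_run([sys.executable, "-c", "print('hello')"])
```

Running the same unchanged script twice yields two records, which is exactly the information that version control alone does not retain.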
Sumatra is a lightweight Python-based event control tool that is suited to scientists who engage in both research and development. Since its inception, Sumatra has not seen wide use in the scientific computing community, as evidenced by the lack of activity on its mailing list. In the authors' experience, one of the main difficulties with using Sumatra is the requirement to maintain and communicate with the backend databases used to store the generated metadata. Furthermore, the tool offers no effective public view and no automated way to share the data store. In light of these drawbacks, we are developing a backend and frontend service to both store and view Sumatra records. Our intent is for the web service to be as robust and data-scheme-agnostic as possible, so that changes to the Sumatra client, backend model, API, and frontend break neither the interactions between these elements nor backwards compatibility with existing record sets. The backend endpoints are built with Flask, a Python web framework. The frontend is written entirely in JavaScript and generates its data object model based solely on the data available. The MVC paradigm and the data-scheme-agnostic approach are vital for Sumatra to effectively leverage the web and become established across the wider scientific computing community.
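The data-scheme-agnostic idea can be illustrated with a small sketch. This is not the service described above: it uses only the Python standard library rather than Flask and JavaScript, and the class `RecordStore` and its methods are hypothetical names introduced for illustration. The point it shows is that the store accepts any JSON object and that the view is derived solely from the fields the records actually carry.

```python
import json
import uuid


class RecordStore:
    """In-memory store that accepts any JSON object without a fixed schema.

    Hypothetical illustration; not the authors' backend.
    """

    def __init__(self):
        self._records = {}

    def add(self, payload):
        """Parse a JSON string and store it as-is, returning a new record id."""
        record = json.loads(payload)
        if not isinstance(record, dict):
            raise ValueError("a record must be a JSON object")
        record_id = uuid.uuid4().hex
        self._records[record_id] = record
        return record_id

    def get(self, record_id):
        return self._records[record_id]

    def columns(self):
        """Derive display columns from whatever fields the stored records have."""
        seen = []
        for record in self._records.values():
            for key in record:
                if key not in seen:
                    seen.append(key)
        return seen

    def rows(self):
        """Build table rows, leaving None where a record lacks a field."""
        cols = self.columns()
        return [[rec.get(c) for c in cols] for rec in self._records.values()]
```

Because nothing in the store or the view names a specific field, a client that starts sending an extra field (or stops sending one) does not invalidate records that were stored earlier.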
Daniel Wheeler Profile
I am interested in the development and deployment of software for applied scientific applications. I have a comprehensive knowledge of numerical algorithms for solving partial differential equations as well as extensive experience in using and developing more general scientific computing tools. I am currently working on developing web services for scientific data discovery. I am one of the lead developers of the FiPy open-source PDE solver and the PyMKS materials informatics toolkit.
Yannick Congo Profile
I am currently a PhD student working on the development of both standards and tools for data reproducibility. I have a background in software engineering and distributed systems. My interests involve data manipulation in distributed architectures across a variety of platforms. I am experienced in software design and implementation with a variety of modern scripting languages. I have recently contributed to OOF3D, an object-oriented finite element tool for materials science.