# A [Cloud Service][sumatra-cloud] to Record Simulation Metadata

### [Daniel Wheeler][daniel-wheeler] and [Yannick Congo][yannick-congo]

Thermodynamics and Kinetics Group <br>
Materials Science and Engineering Division <br>
National Institute of Standards and Technology <br>
Gaithersburg, MD

### Brief Description

The notion of capturing each execution of a script or workflow and its
associated metadata is enormously appealing and should be at the heart
of any attempt to make scientific simulations reproducible. In view of
this, we are developing a [backend and frontend
service][sumatra-cloud] to both store and view simulation metadata
records using a robust, data-scheme-agnostic approach.

### Extended Description

The concept of event control is orthogonal to the concept of version
control: it captures every execution of a workflow or script rather
than the changes made to that script. Research projects
often use version control as a poor man's event control, a symptom of
the lack of a good event control tool. Given the general focus on
reproducing computational experiments, it is surprising that the
scientific computing community has not been more active in developing
and promoting a good tool for event control.
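
To make the distinction concrete, the following is a minimal sketch of event control in plain Python. It is not Sumatra's API: the `record_execution` helper, the `diffusion_length` toy simulation, and every field name in the record are hypothetical choices made for illustration. Each call appends one record, so two runs of unchanged code with different parameters yield two records, where version control would show no change at all.

```python
import hashlib
import json
import platform
import sys
import time
from datetime import datetime, timezone


def record_execution(func, parameters, log):
    """Run func(**parameters) and append one metadata record to log.

    Hypothetical helper for illustration; not part of Sumatra.
    """
    # The bytecode digest stands in for a version-control revision.
    code_digest = hashlib.sha256(func.__code__.co_code).hexdigest()
    started = time.perf_counter()
    outcome = func(**parameters)
    log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "function": func.__name__,
        "code_digest": code_digest,
        "parameters": dict(parameters),
        "outcome": outcome,
        "duration_s": time.perf_counter() - started,
        "python": platform.python_version(),
        "platform": sys.platform,
    })
    return outcome


def diffusion_length(diffusivity, time_s):
    """Toy 'simulation': characteristic diffusion length sqrt(D * t)."""
    return (diffusivity * time_s) ** 0.5


log = []
record_execution(diffusion_length, {"diffusivity": 1e-9, "time_s": 100.0}, log)
record_execution(diffusion_length, {"diffusivity": 1e-9, "time_s": 400.0}, log)
print(json.dumps(log, indent=2))
```

Both records carry the same `code_digest` but different `parameters` and `outcome` values, which is exactly the information that a version-control history alone cannot provide.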

[Sumatra][sumatra] is a lightweight Python-based event control tool
that is suited to scientists who engage in both research and
development. Since its inception, [Sumatra][sumatra] has not seen wide
use in the scientific computing community, as evidenced by the lack of
activity on its mailing list. From the authors' experience, one of the
main difficulties with using [Sumatra][sumatra] is the requirement to
maintain and communicate with the backend databases used to store the
generated metadata. Furthermore, the tool offers no effective public
view and no automated way to share the data store. In light of these
drawbacks, we are developing a [backend and frontend
service][sumatra-cloud] to both store and view Sumatra records. Our
intent is for the web service to be as robust and data-scheme-agnostic
as possible so that changes to the Sumatra client, backend model, API
and frontend break neither the interactions between these elements nor
backwards compatibility with existing record sets. The Python web
framework Flask is used to generate the backend endpoints. The
frontend is entirely JavaScript-based and generates the data object
model based solely on the data available. The model-view-controller
(MVC) paradigm and data-scheme-agnostic approach are vital for Sumatra
to effectively leverage
the web and become established across the wider scientific computing
community.
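
One way to picture the data-scheme-agnostic idea is the sketch below. It is an assumption-laden illustration, not the service's actual code: the `RecordStore` class and the record fields (`label`, `duration`, `tags`) are hypothetical, and the logic is written in framework-independent Python rather than as Flask endpoints or frontend JavaScript. The store accepts any JSON-serialisable record without a fixed schema, and the view derives its columns solely from the data it finds.

```python
import json


class RecordStore:
    """In-memory store accepting any JSON-serialisable dict (hypothetical)."""

    def __init__(self):
        self._records = []

    def add(self, payload):
        # Round-trip through JSON: rejects non-serialisable input and stores
        # a copy, but imposes no field names or types on the record.
        self._records.append(json.loads(json.dumps(payload)))
        return len(self._records) - 1  # list position doubles as record id

    def fields(self):
        # The "schema" is derived from the data: the union of all keys seen,
        # in order of first appearance.
        seen = []
        for record in self._records:
            for key in record:
                if key not in seen:
                    seen.append(key)
        return seen

    def table(self):
        # One row per record; fields absent from a record become None.
        columns = self.fields()
        return [[record.get(c) for c in columns] for record in self._records]


store = RecordStore()
store.add({"label": "run-001", "duration": 12.5})
# A newer client adds a field; older records remain valid and viewable.
store.add({"label": "run-002", "duration": 9.1, "tags": ["test"]})

print(store.fields())
for row in store.table():
    print(row)
```

Because nothing in the store or the view names a field in advance, a client that starts sending an extra field does not invalidate records written before the change.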

## [Daniel Wheeler][daniel-wheeler] Profile

I am interested in the development and deployment of software for
applied scientific applications. I have a comprehensive knowledge of
numerical algorithms for solving partial differential equations as
well as extensive experience in using and developing more general
scientific computing tools. I am currently working on developing web
services for scientific data discovery. I am one of the lead
developers of the [FiPy][fipy] open-source PDE solver and the
[PyMKS][pymks] materials informatics toolkit.

## [Yannick Congo][yannick-congo] Profile

I am currently a PhD student working on the development of both
standards and tools for data reproducibility. I have a background in
software engineering and distributed systems. My interests involve
data manipulation in distributed architectures across a variety of
platforms. I am experienced in software design and implementation with
a variety of modern scripting languages. I have recently contributed
to [OOF3D][oof], an object-oriented finite element tool for materials
science.

[sumatra]: http://neuralensemble.org/sumatra/
[sumatra-cloud]: https://github.com/materialsinnovation/sumatra-cloud
[daniel-wheeler]: http://wd15.github.io/about.html
[yannick-congo]: https://www.linkedin.com/pub/yannick-congo/60/16/411
[fipy]: http://www.ctcms.nist.gov/fipy
[pymks]: http://pymks.org
[oof]: http://www.ctcms.nist.gov/oof/