@faical-yannick-congo
Forked from wd15/abstract.md
Last active August 29, 2015 14:17

A Cloud Service to Record Simulation Metadata

Daniel Wheeler and Yannick Congo

The notion of capturing each execution of a script or workflow and its associated metadata is enormously appealing and should be at the heart of any attempt to make scientific simulations reproducible. Event control is orthogonal to version control: it captures every execution of a workflow or script rather than the changes to that script. Research projects often use version control as a poor man's event control, a symptom of the lack of a good event control tool. Given the general focus on reproducing computational experiments, it is surprising that the scientific computing community has not been more active in developing and promoting a good tool for event control.
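The distinction between event control and version control can be made concrete with a minimal sketch. It is not Sumatra's API: `record_run`, `simulate`, `SOURCE` and the record fields are hypothetical names chosen for illustration, and only the Python standard library is assumed. Two runs of identical code produce two records that share a code hash, so version control would see no change while event control sees two events.

```python
import hashlib
import json
import time
from datetime import datetime, timezone


def record_run(log, script_source, parameters, func):
    """Run func(**parameters) and append one metadata record to log.

    Hypothetical helper for illustration only; not part of Sumatra.
    """
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    outcome = func(**parameters)
    record = {
        "timestamp": started.isoformat(),
        # Hash of the code text: identical for runs of unchanged code.
        "code_sha256": hashlib.sha256(script_source.encode("utf-8")).hexdigest(),
        "parameters": dict(parameters),
        "duration_s": time.perf_counter() - t0,
        "outcome": outcome,
    }
    log.append(record)
    return record


def simulate(steps, dt):
    """Stand-in for a real simulation."""
    return {"final_time": steps * dt}


# Stand-in for the script text that a real tool would read from disk.
SOURCE = "def simulate(steps, dt): return {'final_time': steps * dt}"

log = []
record_run(log, SOURCE, {"steps": 10, "dt": 0.5}, simulate)
record_run(log, SOURCE, {"steps": 20, "dt": 0.5}, simulate)

# Every record is plain JSON, so it can be stored or shared as-is.
serialized = json.dumps(log)
```

Each call adds a record even though the code is unchanged, which is exactly the information a commit history cannot hold.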

Sumatra is a lightweight event control tool that is particularly suited to scientists who are in the intermediate stage between developing a code base and using that code base for active research, a mode of research many members of the scientific Python community employ. However, Sumatra has not seen wide use in the scientific Python community or the wider scientific computing community. One of the main difficulties when using Sumatra is the requirement to support, maintain and communicate with the backend databases that store the simulation metadata it generates. Furthermore, there is no effective public view of the data model and no automated way to share and communicate the metadata. In light of these drawbacks, we are developing a backend and frontend service to both store and view Sumatra records. Our intent is to be as data-agnostic as possible, so that changes to the Sumatra client, backend model, API or frontend do not break the interactions between these elements or backwards compatibility for existing record sets. Flask is used to generate API endpoints and serve up chunks of data to the frontend app. The frontend is entirely JavaScript-based and uses D3 to generate the data object model solely from the data available. The model-view-controller paradigm and the data-agnostic approach are vital for Sumatra to leverage the web effectively and become established across the wider scientific computing community.
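A rough sketch of what a data-agnostic backend that serves records in chunks could look like is given here. It is not the service's actual code: `RecordStore`, its methods and the `offset`/`limit` parameters are hypothetical, and the example uses only the standard library. In a Flask application, a view function could return the dictionary produced by `chunk` as a JSON response.

```python
import json


class RecordStore:
    """Stores records as opaque JSON documents, keyed only by project name.

    Hypothetical illustration: no field inside a record is inspected, so
    records written by different client versions can coexist.
    """

    def __init__(self):
        self._records = {}  # project name -> list of JSON strings

    def add(self, project, record):
        # Serializing on write guarantees the record can be served later.
        self._records.setdefault(project, []).append(json.dumps(record))

    def chunk(self, project, offset=0, limit=2):
        """Return one page of records plus the offset of the next page."""
        stored = self._records.get(project, [])
        page = [json.loads(item) for item in stored[offset:offset + limit]]
        end = offset + limit
        return {
            "total": len(stored),
            "offset": offset,
            "records": page,
            "next_offset": end if end < len(stored) else None,
        }


store = RecordStore()
# Records with different shapes, as might come from different client versions.
store.add("demo", {"label": "run-1", "parameters": {"dt": 0.5}})
store.add("demo", {"label": "run-2", "parameters": {"dt": 0.1}, "tags": ["test"]})
store.add("demo", {"label": "run-3", "platform": {"system": "Linux"}})

first = store.chunk("demo", offset=0, limit=2)
second = store.chunk("demo", offset=first["next_offset"], limit=2)
```

Because the store never looks inside a record, adding a field on the client side requires no backend change, and a frontend can build its view from whatever keys each record happens to contain.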

I am interested in the development and deployment of software for applied scientific applications. I have a comprehensive knowledge of numerical algorithms for solving partial differential equations as well as extensive experience in using and developing more general scientific computing tools. I have worked with Python, C, Fortran, Python/C interfaces and general numerical toolkits for over a decade. I am currently working on web development for scientific applications. I am one of the lead developers of the FiPy open-source PDE solver and the PyMKS materials informatics toolkit.

I am currently a PhD student working on reproducibility in computer science. I have a background in software engineering and distributed systems. My interests center on distributed architectures that involve data manipulation (generation, collection, processing, distribution, formatting) across various models (native, hybrid, web) on different platforms (mobile, computer, custom embedded devices). I have some experience in software analysis, design and implementation in Assembly, C/C++, Java, Python and Ruby. My latest contributions were to OOF3D (Object-Oriented Finite element analysis), a tool that allows materials scientists and physicists to perform computations on images of real or simulated microstructures. I am currently working on various projects designed to improve the reproducibility of research results.
