- Lost opportunity for interoperability with other software, which is easier with a standard format
- Run-time Python dependency is a source of never-ending difficulty for users
- Extra work is required to implement new input file features, since they need to be implemented in both CTI and XML. This leads to incompleteness in the CTI interface and requires users to use the XML interface in certain cases.
- The format itself is needlessly verbose. Consider the definition of a single 'falloff' reaction in XML:

  ```xml
  <reaction reversible="yes" type="falloff" id="0095">
    <equation>OH + CH3 (+ M) [=] CH3OH (+ M)</equation>
    <rateCoeff>
      <Arrhenius>
        <A>2.790000E+15</A>
        <b>-1.4299999999999999</b>
        <E units="cal/mol">1330.000000</E>
      </Arrhenius>
      <Arrhenius name="k0">
        <A>4.000000E+30</A>
        <b>-5.9199999999999999</b>
        <E units="cal/mol">3140.000000</E>
      </Arrhenius>
      <efficiencies default="1.0">C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6 </efficiencies>
      <falloff type="Troe">0.412 195 5900 6394 </falloff>
    </rateCoeff>
    <reactants>CH3:1 OH:1.0</reactants>
    <products>CH3OH:1.0</products>
  </reaction>
  ```

  compared to the equivalent CTI (Python):

  ```python
  falloff_reaction("OH + CH3 (+ M) <=> CH3OH (+ M)",
                   kf=[2.79000E+18, -1.43, 1330],
                   kf0=[4.00000E+36, -5.92, 3140],
                   falloff=Troe(A=0.412, T3=195, T1=5900, T2=6394),
                   efficiencies="C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6")
  ```

  (The pre-exponential factors differ only in units: the XML values are in kmol, m, and s, while the CTI values are in mol, cm, and s.) Even after removing the extra whitespace, the XML version is still twice as long.
- It requires extra processing to extract useful information for (a) the mappings of species names to quantities contained in the `<efficiencies>`, `<reactants>`, and `<products>` tags, and (b) array data such as that in the `<falloff>` tag. All of the alternatives (Python, JSON, YAML) have intrinsic support for mapping and array data types.
- It contains redundant information, which leads to confusion and errors. The reaction stoichiometry is encoded both in the `<equation>` tag and in the `<reactants>` and `<products>` tags.
- The method for encoding arrays is inconsistent. In some places, such as the `<falloff>` tag here, we have a space-delimited string. In others (e.g. the `floatArray` associated with species thermo data), we have comma-delimited lists. Which of these formats is allowed in any given context is a mystery.
- Cantera misses one of the key benefits of using a standard format such as XML: there are existing XML parsing libraries that work just fine, and there's no reason for Cantera to have its own XML parser.
- Extracting data from the XML tree requires writing a lot of code. For example, here's a snippet of XML from the definition of an `HMWSoln` object:

  ```xml
  <thetaAnion anion1="Cl-" anion2="OH-">
    <Theta> -0.05, 0.0, 0.0, 0.0, 0.0 </Theta>
  </thetaAnion>
  ```

  The function to read and validate the data from this node is 80 lines long (see https://github.com/Cantera/cantera/blob/master/src/thermo/HMWSoln_input.cpp#L235).
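The parsing chores described in the points above can be sketched with Python's standard-library `xml.etree.ElementTree`, which also shows that an off-the-shelf XML parser is enough to read these files. This is a minimal illustration, not Cantera code: the helpers `parse_composition`, `parse_array`, and `parse_equation_side` are hypothetical, and the equation parser handles only the simple `name` and `coeff name` forms used in this example.

```python
import json
import xml.etree.ElementTree as ET

# The falloff reaction from above, trimmed to the elements used here.
XML_REACTION = """
<reaction reversible="yes" type="falloff" id="0095">
  <equation>OH + CH3 (+ M) [=] CH3OH (+ M)</equation>
  <rateCoeff>
    <efficiencies default="1.0">C2H6:3 CH4:2 CO:1.5 CO2:2 H2:2 H2O:6 </efficiencies>
    <falloff type="Troe">0.412 195 5900 6394 </falloff>
  </rateCoeff>
  <reactants>CH3:1 OH:1.0</reactants>
  <products>CH3OH:1.0</products>
</reaction>
"""


def parse_composition(text):
    """Hypothetical helper: 'CH3:1 OH:1.0' -> {'CH3': 1.0, 'OH': 1.0}."""
    result = {}
    for item in text.split():
        name, value = item.rsplit(":", 1)
        result[name] = float(value)
    return result


def parse_array(text):
    """Hypothetical helper: accept either space- or comma-delimited numbers."""
    return [float(x) for x in text.replace(",", " ").split()]


def parse_equation_side(side):
    """Hypothetical helper: 'OH + CH3 (+ M)' -> {'OH': 1.0, 'CH3': 1.0}."""
    counts = {}
    for term in side.replace("(+ M)", "").split("+"):
        parts = term.split()
        if not parts:
            continue
        if len(parts) == 2:
            coeff, name = float(parts[0]), parts[1]
        else:
            coeff, name = 1.0, parts[0]
        counts[name] = counts.get(name, 0.0) + coeff
    return counts


root = ET.fromstring(XML_REACTION.strip())

# (a) Mappings and (b) arrays arrive as plain strings and must be split by hand.
efficiencies = parse_composition(root.find("rateCoeff/efficiencies").text)
troe = parse_array(root.find("rateCoeff/falloff").text)
theta = parse_array(" -0.05, 0.0, 0.0, 0.0, 0.0 ")  # comma-delimited, as in <Theta>

# The stoichiometry is stored twice, so the two copies have to be cross-checked.
lhs, rhs = root.find("equation").text.split("[=]")
consistent = (
    parse_equation_side(lhs) == parse_composition(root.find("reactants").text)
    and parse_equation_side(rhs) == parse_composition(root.find("products").text)
)

# With JSON, the same data is already a dict and a list after a single call.
data = json.loads('{"efficiencies": {"C2H6": 3, "CH4": 2, "CO": 1.5, "CO2": 2, '
                  '"H2": 2, "H2O": 6}, "Troe": [0.412, 195, 5900, 6394]}')
```

The `parse_array` helper has to accept both delimiters precisely because the XML format uses both; a format with a native array type makes that helper unnecessary.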
- Need to decide between JSON, YAML, and other alternatives
- Want to separate input file parsing from actual application logic (compare the tight coupling of `ThermoPhase::initThermoXML` to the `setupFooReaction` functions, which are called by `newReaction(XML_Node&)`).
- Should be able to create objects without any explicit input file
  - Already possible for ideal gases through `Reaction` and `Species` objects
- Should be able to serialize objects created in this way and generate new input files
- Old input files can be supported by writing translators
  - Translator from CTI is just a modified version of `ctml_writer.py`
- Successful implementation is made difficult by the large number of classes missing test coverage (Cantera/cantera#267)
- Also need to replace XML as the input/output file format for the 1D solver
- With YAML, the significance of whitespace may confuse some users
- With both YAML and JSON, the order of keys in mappings is not specified, so serialization can result in keys ending up in any order
YAML is certainly the most visually appealing format, and its significant whitespace would help enforce readable files.
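Although JSON and YAML assign no meaning to key order, a serializer can still emit keys in a fixed order, which addresses both the serialization goal and the key-ordering concern listed above. A minimal sketch, assuming a hypothetical `SpeciesData` class standing in for an object created without any input file: `dataclasses.asdict` returns a dict whose keys follow the field declaration order, and Python's `json.dumps` writes keys in dict insertion order (guaranteed since Python 3.7), so repeated serialization is deterministic. Passing `sort_keys=True` is an alternative when alphabetical order is acceptable.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpeciesData:
    """Hypothetical stand-in for a species object built directly in code."""
    name: str
    composition: dict
    thermo_model: str = "NASA7"
    note: str = ""


def to_json(obj, alphabetical=False):
    """Serialize a dataclass; keys follow field order unless alphabetical=True."""
    return json.dumps(asdict(obj), indent=2, sort_keys=alphabetical)


h2o = SpeciesData(name="H2O", composition={"H": 2, "O": 1})
text = to_json(h2o)

# Reading the text back reconstructs an equal object.
round_trip = SpeciesData(**json.loads(text))
```

Fixing the order in the serializer, rather than relying on the format, keeps generated input files stable under version control.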