- What is "Thermodynamic Extrapolation"?
- Examples of Efficient Exploration
- Optimal Combination
- Extrapolating Structural Observables
- Further Extensions
- Conclusions
The concept of extrapolating thermodynamic properties measured in classical systems using statistical mechanical principles has been around for some time. This is linked to the fact that derivative properties of a system's free energy are related to fluctuations of observables (such as the number of particles in a simulation) which makes them easy to measure. Consequently, Taylor series expansions are fairly straightforward to obtain. Of course, the range over which this expansion is reasonably accurate is limited by the order of the expansion and how accurately you can measure these fluctuations (how long you can afford to run a simulation). However, by combining this idea of extrapolation with biased sampling (forcing a simulation to systematically visit states along a predetermined order parameter path) we can build a "network" of Taylor expansions that work together to create a remarkably accurate predictor of properties (thermodynamic and even structural) of fluid systems over a broad range of conditions. These extrapolations significantly amplify the amount of information that can be extracted from simulations enabling a small set of them to:
- feed data-intensive regression algorithms such as neural networks,
- accelerate the search for, e.g., optimal experimental conditions or protocols that use simulations as a guide,
- reduce the cost of force-field development,
- screen candidate materials for desirable thermodynamic properties more quickly.
This post is an abbreviated summary of concepts discussed in the manuscripts below (text available upon request if you cannot access them); other authors have also expanded upon these ideas to explore other systems (such as water), so this list is not an exhaustive review of all work in this area. Please cite appropriately if you find this information helpful.
- "Predicting low-temperature free energy landscapes with flat-histogram Monte Carlo methods," N. A. Mahynski, M. A. Blanco, J. R. Errington, V. K. Shen, J. Chem. Phys. 146, 074101 (2017).
- "Temperature extrapolation of multicomponent grand canonical free energy landscapes," N. A. Mahynski, J. R. Errington, V. K. Shen, J. Chem. Phys. 147, 054105 (2017).
- "Predicting virial coefficients and alchemical transformations by extrapolating Mayer-sampling Monte Carlo simulations," H. W. Hatch, S. Jiao, N. A. Mahynski, M. A. Blanco, V. K. Shen, J. Chem. Phys. 147, 231102 (2017).
- "Multivariable extrapolation of grand canonical free energy landscapes," N. A. Mahynski, J. R. Errington, V. K. Shen, J. Chem. Phys., 147, 234111 (2017).
- "Predicting structural properties of fluids by thermodynamic extrapolation," N. A. Mahynski, S. Jiao, H. W. Hatch, M. A. Blanco, V. K. Shen, J. Chem. Phys. 148, 194105 (2018).
- "Flat-histogram Monte Carlo as an efficient tool to evaluate adsorption processes involving rigid and deformable molecules," M. Witman, N. A. Mahynski, B. Smit, J. Chem. Theory Comput. 14, 6149–6158 (2018).
- "Flat-histogram extrapolation as a useful tool in the age of big data," N. A. Mahynski, H. W. Hatch, M. Witman, D. A. Sheen, J. R. Errington, V. K. Shen, Molecular Simulation 1–13 (2020).
- "Extrapolation and interpolation strategies for efficiently estimating structural observables as a function of temperature and density," J. I. Monroe, H. W. Hatch, N. A. Mahynski, M. S. Shell, V. K. Shen, J. Chem. Phys. 153, 144101 (2020).
Ref. 7 is essentially a review of most of this work; the interested reader is encouraged to consult it.
The partition function describes the thermodynamic properties of a system as a function of state variables (temperature, volume, chemical potential, etc.). For example, in the grand canonical ensemble:

$$\Xi(\mu, V, T) = \sum_{N=0}^{\infty} \exp\left(\beta \mu N\right) Q(N, V, T)$$

where, $\beta = 1/k_{\rm B}T$ is the inverse temperature, $\mu$ is the chemical potential, and $Q(N, V, T)$ is the canonical partition function at fixed particle number, $N$.

The probability of observing a given state, defined by some order parameter, here taken to be $N$, is:

$$\ln \Pi(N; \mu, V, T) = \beta \mu N + \ln Q(N, V, T) - \ln \Xi(\mu, V, T)$$
This probability distribution is generally referred to as the "macrostate" distribution, where the order parameter defines the "macrostate"; this is because each order parameter value describes a set of microstates. For example, each unique arrangement of particles in a simulation may be considered a microstate, but all microstates with the same total number of particles might be considered a macrostate. Below, on the left, is an example distribution for a single component system at different temperatures (colors); the dashed lines will be explained later. At high temperatures the distribution has a single peak, while at lower temperatures a second peak develops, signaling phase coexistence.
Once the macrostate distribution is known you can compute (equilibrium) thermodynamic (average) properties from it. These are just weighted sums:

$$\langle X \rangle = \frac{\sum_N X(N)\, \Pi(N)}{\sum_N \Pi(N)}$$
In the case of phase separation, you simply break the above sum into regions defining each phase. Thus, you can compute properties like the average number of particles (composition) or density of each phase. In the figure above, on the right, is a grand canonical example showing isotherms (density as a function of chemical potential) which can be computed from such a distribution. The "gaps" in isotherms at lower temperatures correspond to the discontinuous density jump across the first-order phase transition.
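As a concrete sketch of the weighted sums above (function and variable names here are my own, not taken from the references), splitting a bimodal macrostate distribution at the minimum between its peaks and averaging within each phase might look like:

```python
import numpy as np

def phase_averages(ln_pi, n_values):
    """Split a bimodal macrostate distribution at the local minimum of
    ln(Pi) between its two peaks, then compute <N> within each phase."""
    pi = np.exp(ln_pi - ln_pi.max())   # subtract max to avoid overflow
    pi /= pi.sum()

    # Locate a peak in each half, then the minimum between them
    # (assumes one peak per half, which suits a vapor-liquid example).
    mid = len(ln_pi) // 2
    left_peak = np.argmax(ln_pi[:mid])
    right_peak = mid + np.argmax(ln_pi[mid:])
    boundary = left_peak + np.argmin(ln_pi[left_peak:right_peak])

    averages = []
    for region in (slice(None, boundary), slice(boundary, None)):
        w = pi[region] / pi[region].sum()            # renormalize per phase
        averages.append(np.dot(w, n_values[region])) # weighted sum -> <N>
    return averages
```

Properties other than $N$ follow the same pattern: replace `n_values` with any per-macrostate observable.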
However, each isotherm above requires the distribution to be known at that temperature; i.e., the 6 curves above require knowledge of 6 different distribution functions. In principle, you can perform 6 different simulations to measure these (described below), but this quickly becomes expensive when many isotherms are needed. Thermodynamic extrapolation allows you to estimate one distribution from another. The basic idea is to expand each point in the distribution as a Taylor series in the variable you wish to extrapolate; for many order parameters (extensive properties, such as $N$), a natural choice is the inverse temperature, $\beta = 1/k_{\rm B}T$:

$$\ln \Pi(N; \beta) \approx \ln \Pi(N; \beta_0) + \left. \frac{\partial \ln \Pi(N)}{\partial \beta} \right|_{\beta_0} \left( \beta - \beta_0 \right) + \frac{1}{2!} \left. \frac{\partial^2 \ln \Pi(N)}{\partial \beta^2} \right|_{\beta_0} \left( \beta - \beta_0 \right)^2 + \cdots$$

where, e.g., the first derivative in the grand canonical ensemble is:

$$\left. \frac{\partial \ln \Pi(N)}{\partial \beta} \right|_{\mu} = \mu N - \langle U \rangle_N - \left( \mu \langle N \rangle - \langle U \rangle \right)$$

with $\langle U \rangle_N$ the average potential energy at fixed $N$, and unsubscripted angle brackets denoting full grand canonical averages.
The references above contain full derivations and expressions for derivatives in a number of ensembles. However, it should be clear that derivatives can be expressed in terms of averages of (powers and products of) extensive properties. These averages can be obtained using the macrostate distribution as described above. Therefore, knowledge of the macrostate distribution and a few moments of the extensive properties at one condition is sufficient to estimate the distribution at other, nearby conditions.
Again, the ensemble, order parameter, and other factors influence the final expression, but this is representative of the procedure. It turns out that for many systems the expansions are remarkably accurate even at first or second order! This is essentially because the macrostate distribution is really made up of many Taylor series working together; in the above example, each value of $N$ carries its own expansion, and collectively this "network" of expansions constrains the predicted distribution far better than any single series could.
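To make the procedure concrete, here is a minimal first-order sketch for the grand canonical case (the names are illustrative, kinetic contributions to the $\beta$-derivative are neglected, and the references give the full expressions and higher orders):

```python
import numpy as np

def extrapolate_lnpi(ln_pi, n, u_n, mu, beta0, beta1):
    """First-order extrapolation of ln(Pi(N)) from beta0 to beta1 at fixed mu.

    ln_pi : ln Pi(N) measured at beta0 (an additive constant is irrelevant)
    n     : order parameter values (particle numbers)
    u_n   : canonical average potential energy at each N, i.e. <U>_N
    """
    pi = np.exp(ln_pi - ln_pi.max())
    pi /= pi.sum()
    n_avg = np.dot(pi, n)    # grand canonical <N>
    u_avg = np.dot(pi, u_n)  # grand canonical <U>

    # d ln Pi(N) / d beta = mu*N - <U>_N - (mu*<N> - <U>)
    dlnpi_dbeta = mu * n - u_n - (mu * n_avg - u_avg)
    return ln_pi + dlnpi_dbeta * (beta1 - beta0)
```

Higher-order terms involve fluctuations (covariances) of the same extensive quantities, measured macrostate by macrostate.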
So how do we obtain the macrostate distribution in the first place? There are a number of different approaches, and the extrapolation methodology described here is agnostic; it does not matter how you obtain it. The method we have used in the past is based on Wang-Landau Monte Carlo simulations. In this method, you store a histogram of the number of times a system visits a given state defined by the order parameter chosen (e.g., $N$), along with a bias that is incremented each time a state is visited; the bias progressively discourages revisiting well-sampled states until all values of the order parameter are visited uniformly, at which point it converges to $-\ln \Pi$ (up to a constant).
This is known as a "flat histogram" technique because the combination of the true distribution and the bias becomes "flat" once it has converged. During the "production" phase of a simulation, properties are measured as a function of the order parameter and used to construct the averages and fluctuations necessary to compute the derivatives.
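A toy Wang-Landau loop illustrates the idea; here `ln_pi_true` stands in for the unknown macrostate weights that a real system's Monte Carlo moves would generate (this sketch and its names are illustrative, not code from the references):

```python
import numpy as np

def wang_landau(ln_pi_true, n_states, lnf_final=1e-4, seed=7):
    """Toy Wang-Landau sampler over a 1D order parameter.

    The learned bias converges to -ln(Pi) (up to a constant), at which
    point the biased sampling visits all macrostates uniformly ("flat")."""
    rng = np.random.default_rng(seed)
    bias = np.zeros(n_states)   # running estimate of -ln Pi
    hist = np.zeros(n_states)   # visit counts since the last flatness check
    lnf = 1.0                   # bias increment; halved each time hist is flat
    s = 0
    while lnf > lnf_final:
        t = s + rng.choice((-1, 1))  # propose a neighboring macrostate
        if 0 <= t < n_states:
            # Metropolis acceptance on the *biased* weights
            if np.log(rng.random()) < (ln_pi_true[t] + bias[t]) - (ln_pi_true[s] + bias[s]):
                s = t
        bias[s] -= lnf               # penalize the current state
        hist[s] += 1
        if hist.min() > 0.8 * hist.mean():  # "flat enough"
            lnf /= 2.0
            hist[:] = 0.0
    return -bias                     # ln Pi estimate, up to a constant
```

In a real simulation the acceptance criterion uses the physical Boltzmann weights of proposed configurations; the bias bookkeeping is the same.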
We do not always have to extrapolate. We can also obtain exact expressions that allow us to "reweight" the distribution. For example, if we know a grand canonical macrostate distribution at one set of chemical potentials, rearranging the initial expressions given for the macrostate distribution we obtain:

$$\ln \Pi(N; \mu_{\rm new}) = \ln \Pi(N; \mu_{\rm old}) + \beta \left( \mu_{\rm new} - \mu_{\rm old} \right) N + C$$

where $C$ is a constant fixed by normalization. Unlike extrapolation, this relationship is exact.
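Reweighting in chemical potential is nearly free to apply; a minimal sketch with illustrative names:

```python
import numpy as np

def reweight_mu(ln_pi, n, beta, mu_old, mu_new):
    """Exactly reweight a grand canonical macrostate distribution to a new
    chemical potential: ln Pi(N) shifts by a term linear in N."""
    ln_pi_new = ln_pi + beta * (mu_new - mu_old) * n
    # Renormalize via log-sum-exp for numerical stability.
    m = ln_pi_new.max()
    return ln_pi_new - (m + np.log(np.sum(np.exp(ln_pi_new - m))))
```

Sweeping `mu_new` over a range and averaging $N$ against each reweighted distribution traces out an entire isotherm from a single simulation.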
If we measured the macrostate distribution as a function of all fluctuating extensive properties, we could obtain reweighting expressions that allow us to compute how the distribution changes as a function of the conjugate intensive variables. For example, in the grand canonical case particle number and energy fluctuate while chemical potential and temperature are fixed, so we would have to measure the joint distribution $\Pi(N, U)$ in order to reweight exactly in both chemical potential and temperature.
Typically, one can employ both reweighting and extrapolation. For the systems discussed here, and in the references above, usually one reweighting expression is known (usually in chemical potential) while extrapolation expressions can be obtained that allow exploration of other intensive variables; for example, the chemical potentials of a number of other species, or temperature.
Systems with attractive interactions tend to exhibit (ir)reversible aggregation at sufficiently low temperatures. They simply become "stuck" together for an increasing amount of time as temperature decreases. However, "time" is relative. Many systems exhibit rearrangements and are ergodic on the order of seconds or less, but typical molecular simulations can only reach (in a reasonable amount of wallclock time) up to the order of microseconds or less. Thus, being able to predict the thermodynamic behavior of systems at "low temperature" instead of directly simulating them is important to understanding their equilibrium thermodynamic properties. There are simulation tools, including flat-histogram techniques, that naturally allow you to overcome barriers and perform advanced sampling under these "hard" conditions. However, it is not always easy to know in advance which order parameters you need to bias, i.e., what the "slow" modes are.
In this example from Ref. 1, a simulation was performed on a simple, single component fluid (a square well) at a supercritical temperature (black curve). We can compare simulations explicitly performed at lower temperatures (solid lines) to those extrapolated from this single supercritical one (dashed lines). Clearly, there is excellent agreement! As temperature decreases, phase separation occurs, creating 2 peaks separated by a local minimum in $\ln \Pi$; the peaks correspond to the coexisting vapor and liquid phases.
In fact, properties that depend on the shape of $\ln \Pi$, such as the free energy barrier separating the coexisting phases, can be predicted in the same fashion.
As discussed in Ref. 2, the errors that come from approximate extrapolation tend to get "hidden" in the chemical potential for these grand canonical systems, because the chemical potentials control the shape of the curve (recall the "line" being added!). Simply stated, the observable properties computed from the curve tend to be accurate, but the conditions that generate that curve may not be. So the pressure-density curve, for example, can remain accurate even when the chemical potential assigned to a given density is slightly off.
These extrapolations can be similarly applied to anisotropic systems (see Ref. 1) and systems with internal degrees of freedom (see Ref. 6).
So far we have only discussed bulk systems. However, it is also possible to perform such expansions for confined ones. This usually involves treating the confining material as an additional, fixed component into which the fluid adsorbs. Internal degrees of freedom, adsorbent flexibility, and special sampling moves can influence the methodology, and Ref. 6 contains more information. In general, though, this extrapolation approach can greatly reduce the cost of screening materials for advantageous properties; for example, selective adsorption of one species from a multicomponent mixture. Below are isotherms of methane adsorbing in MOF-950 taken from Ref. 6; open circles are from individual simulations performed to verify these predictions. The solid lines are temperature extrapolations of the macrostate distribution (obtained using a slightly different approach than what has been discussed so far) performed at a single reference temperature.
For multicomponent mixtures it is often easier to work with the total number of particles, $N_{\rm tot} = \sum_i N_i$, as the order parameter, fixing the differences between the chemical potentials of the species, $\Delta \mu_i = \mu_i - \mu_1$.
The probability of a macrostate can be expressed as:

$$\ln \Pi(N_{\rm tot}) = \beta \mu_1 N_{\rm tot} + \ln \chi(N_{\rm tot}; \beta, \Delta \mu_2, \ldots) - \ln \Xi$$

where $\chi$ is the partition function of the system constrained to a fixed total number of particles at the specified relative chemical potentials. The expression means that at fixed $\Delta \mu_i$, the distribution can be reweighted exactly in $\mu_1$, while changes in the $\Delta \mu_i$ (or in temperature) must be handled by extrapolation.
More details and higher order terms can be found in Refs. 4 and 7. In practice, at a given set of target conditions, several simulations performed at different conditions can each be extrapolated to that target, and their individual estimates combined as a weighted average.
A simple weighting function can be defined to optimize thermodynamic consistency, that is, to numerically agree with the Gibbs-Duhem equation. Below is an example for an azeotropic system where 5 simulations were performed. On the left, only the "middle" one (chosen to fall exactly on the azeotrope) was extrapolated; points correspond to simulations performed at the given conditions. It is clearly reasonable across a range of temperatures, but only fairly close to the azeotrope in terms of mole fraction. On the right, the extrapolations from all 5 simulations are combined, extending the accuracy across the composition range.
An innovation due to Monroe (see Ref. 8) is to combine simulations using an interpolating polynomial instead, which is related to the optimal combination of cumulant expansions. In the context of estimating free energy differences, this is actually identical to thermodynamic integration under certain assumptions. Given two macrostate distributions, one can exactly fit a polynomial of order $2k+1$ if each observable is expanded up to order $k$ at both state points.
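For $k = 1$ (value and first derivative known at two state points), the unique matching cubic can be found by solving a small linear system; a sketch under the assumption of scalar data, applied pointwise (e.g., to each macrostate of $\ln \Pi$):

```python
import numpy as np

def hermite_combine(x0, f0, df0, x1, f1, df1):
    """Fit the unique cubic matching the value and first derivative of an
    observable at two state points x0 and x1 (degree 2k+1 with k = 1)."""
    # Rows enforce: p(x0) = f0, p'(x0) = df0, p(x1) = f1, p'(x1) = df1,
    # for p(x) = c0 + c1*x + c2*x^2 + c3*x^3.
    A = np.array([
        [1.0, x0, x0 ** 2, x0 ** 3],
        [0.0, 1.0, 2 * x0, 3 * x0 ** 2],
        [1.0, x1, x1 ** 2, x1 ** 3],
        [0.0, 1.0, 2 * x1, 3 * x1 ** 2],
    ])
    c = np.linalg.solve(A, np.array([f0, df0, f1, df1]))
    return np.polynomial.Polynomial(c)
```

Using both endpoints in this way tends to behave better between the two state points than either single-point Taylor series alone.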
So far we have described how the macrostate distribution can be extrapolated, enabling the computation of thermodynamic observables like pressure, density and mole fraction. However, the technique is really more general than that. At its core, like histogram reweighting, thermodynamic extrapolation is just a way to predict the probability that a macrostate of the system occurs at equilibrium. If we know some property of that macrostate, and each is well sampled, then the system's average property is just the weighted sum of those macrostates where the weight is the (normalized) macrostate probability.
In fact, structural properties of these systems are also average properties. Consider that the radial distribution function, $g(r)$, is built from ensemble averages: each histogram bin used to tabulate it is just the average of a microscopic observable, $A$. In the canonical ensemble, the temperature derivative of such an average follows from a standard fluctuation relation:

$$\frac{\partial \langle A \rangle}{\partial \beta} = -\left( \langle A U \rangle - \langle A \rangle \langle U \rangle \right)$$

where $U$ is the system's potential energy. Similarly, this can be expanded up to some order, $k$:

$$\langle A \rangle (\beta) \approx \sum_{j=0}^{k} \frac{1}{j!} \left. \frac{\partial^j \langle A \rangle}{\partial \beta^j} \right|_{\beta_0} \left( \beta - \beta_0 \right)^j$$
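A first-order sketch for a single observable (e.g., one bin of $g(r)$) computed from canonical samples; the names are illustrative:

```python
import numpy as np

def extrapolate_observable(a_samples, u_samples, beta0, beta1):
    """First-order temperature extrapolation of a canonical average:
    d<A>/d(beta) = -Cov(A, U), estimated from paired samples of A and U."""
    a_avg = a_samples.mean()
    cov_au = (a_samples * u_samples).mean() - a_avg * u_samples.mean()
    return a_avg - cov_au * (beta1 - beta0)
```

Each bin of $g(r)$ gets its own expansion; higher orders bring in higher joint moments of $A$ and $U$.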
More rigorous details are discussed in Ref. 5. It is also possible to extrapolate observables like a polymer's radius of gyration or the cluster size distribution of a self-assembling system; this allows you to estimate features like the critical micelle concentration (CMC) of amphiphilic systems at low temperatures quite accurately.
A further innovation is due to Hatch et al. (Ref. 3), who extended the concept of thermodynamic extrapolation to importance-sampling methods such as Mayer-sampling Monte Carlo simulations. These simulations are often used to compute virial coefficients. This also enables one to perform alchemical transformations by extrapolating model parameter values, such as the point charge in SPC/E water. Thus, extrapolation can help in optimizing model parameters to create force fields by fitting to experimentally observed properties. Furthermore, the plethora of data that results from these extrapolations enables the use of data-intensive regressors, like neural networks, which would otherwise require more data to fit than can be generated with a reasonable amount of computational expense and time.
Thermodynamic extrapolation is an approximate technique and has very old roots. However, its simplicity belies its power. Here, a "network" of Taylor series works in concert to provide a remarkably accurate property predictor. Once a macrostate distribution is known at one condition it is possible to estimate it over a broad range of conditions. The range over which this extrapolation is accurate has proven to be surprisingly large for many systems. This enables the calculation of thermodynamic and structural properties with reduced computational effort. The necessary average quantities, like particle number, can even be reconstructed using log files from older simulations, giving them new life and enabling consistency checks to be performed. Thermodynamic extrapolation has also proven useful in facilitating other types of advanced sampling simulations, in providing sufficient information to enable data-intensive analysis and regression, and can be used to perform high-throughput screening.