2025 AIChE Annual Meeting

(678c) The Group Contribution Gaussian Process Regression Method for Property Prediction of Materials

Authors

Edward Maginn, University of Notre Dame
Reliable property prediction is central to modern materials discovery. Existing property prediction methods often involve tradeoffs between prediction accuracy, data efficiency, model complexity, parsimony, and prediction speed, with little to no provision for reliable estimation of prediction uncertainty. This work presents the group contribution Gaussian process (GCGP) hybrid modeling approach for fast, accurate, physically interpretable property prediction with inherent uncertainty quantification, enabling more reliable decision-making in computational molecular discovery workflows.

The method uses a simple first-order group contribution (GC) method, such as the Joback and Reid (JR) GC method, to obtain an initial estimate of the property of interest for any molecule that the chosen GC method can treat. The GC-predicted property is used as the mean function in a Gaussian process (GP) regression model that has been trained to learn and correct any systematic bias in the GC predictions. The GP thus outputs a significantly more accurate prediction of the property of interest than the original GC prediction, while providing reliable and readily accessible estimates of prediction uncertainty. We applied this method to several substance-specific constant properties for which the JR GC model provides equations and parameters, including normal boiling temperature, critical temperature, critical pressure, and others. We then investigated the method's applicability to viscosity, a temperature-dependent property for which there is a JR GC equation, and to vapor pressure, another temperature-dependent property for which there is no JR GC equation.
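
The idea of using a GC prediction as the GP mean function can be illustrated with a minimal sketch. The code below is not the authors' implementation: the squared-exponential kernel, the fixed hyperparameters, the one-dimensional descriptor, and the synthetic "GC" and "true" property values are all assumptions chosen only to show the mechanics. The GP is fitted to the residual between the observed property and the GC estimate, so its posterior mean is the GC estimate plus a learned correction, and its posterior standard deviation serves as the uncertainty estimate.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between inputs a (n, d) and b (m, d)."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


def gcgp_predict(x_train, y_train, gc_train, x_test, gc_test,
                 noise_var=1e-2, length_scale=1.0, variance=1.0):
    """GP posterior mean and standard deviation with the GC estimate as prior mean."""
    k_tt = rbf_kernel(x_train, x_train, length_scale, variance)
    k_tt += noise_var * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train, length_scale, variance)
    k_ss_diag = np.full(len(x_test), variance)  # k(x, x) for the RBF kernel

    chol = np.linalg.cholesky(k_tt)
    residual = y_train - gc_train  # what the GC estimate gets wrong
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, residual))

    mean = gc_test + k_st @ alpha  # GC estimate plus learned correction
    v = np.linalg.solve(chol, k_st.T)
    var = k_ss_diag - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


# Synthetic illustration (hypothetical data, not from the study):
# the "GC" estimate is linear in the descriptor and carries a smooth bias.
x_train = np.linspace(0.0, 5.0, 20)[:, None]
gc_train = 2.0 * x_train[:, 0]
y_train = gc_train + np.sin(x_train[:, 0])

x_test = np.array([[1.25], [3.7], [100.0]])  # last point lies far from the data
gc_test = 2.0 * x_test[:, 0]
y_test = gc_test + np.sin(x_test[:, 0])

mean, std = gcgp_predict(x_train, y_train, gc_train, x_test, gc_test)
gc_error = np.abs(gc_test - y_test)
gcgp_error = np.abs(mean - y_test)
print("GC abs. error:  ", gc_error)
print("GCGP abs. error:", gcgp_error)
print("GCGP std. dev.: ", std)
```

In this sketch, a test point far from the training data receives no correction, so the prediction falls back to the GC estimate and the predicted standard deviation returns to its prior value. In practice the kernel hyperparameters would be fitted to data rather than fixed as they are here.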

Furthermore, we tested the applicability of the method to an environmental property, the global warming potential, which usually cannot be predicted using thermodynamic or correlation models. In the cases investigated, the GCGP method generally provided more accurate predictions than the GC method alone. The uncertainty estimates remained reliable even when applied to 'new' molecules that were not used in training or validating the models. Finally, we demonstrate the usefulness of the GCGP method in a molecular discovery workflow for finding alternative, environmentally friendly refrigerants.