2025 AIChE Annual Meeting

(588by) Predicting Solvation Properties in Solvent Mixtures Using Machine Learning

Many molecular syntheses rely on liquid-phase reactions and extractions. While single-component solvents are ubiquitous in these solvation processes, multicomponent solvent mixtures offer more tunable properties. This flexibility creates a need for predictive modeling tools that help optimize possible solvent combinations. Solvation free energy is a key thermodynamic quantity that governs solvation processes. Previous studies demonstrated the ability of machine learning (ML) models to predict solvation free energies and enthalpies in single-component solvents. Here we present an ML model that predicts infinite-dilution solvation free energy and enthalpy for neutral organic solutes in multicomponent solvent mixtures.

The model uses two directed message passing neural networks (D-MPNNs) to separately transform the solute and solvent molecular graphs into vector encodings, which are then concatenated and passed through a feed-forward network (FFN) to give the final property prediction. Each component in the solvent mixture is encoded using the same D-MPNN, and these component encodings are combined in a mole-fraction-weighted average to give the overall solvent encoding. Because the averaged encoding has the same dimension regardless of how many components contribute to it, the model can accept any number of components in the solvent mixture.
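The pooling step described above can be illustrated with a minimal sketch. It assumes the per-component encodings have already been produced by the shared solvent D-MPNN and are available as plain vectors; the function names `mixture_encoding` and `model_input` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def mixture_encoding(component_encodings, mole_fractions):
    """Mole-fraction-weighted average of per-component solvent encodings.

    component_encodings: array-like, shape (n_components, d). Assumed to be
    the outputs of one shared solvent encoder (stand-in vectors here).
    mole_fractions: array-like, shape (n_components,).
    """
    enc = np.asarray(component_encodings, dtype=float)
    x = np.asarray(mole_fractions, dtype=float)
    if enc.ndim != 2 or x.shape != (enc.shape[0],):
        raise ValueError("need one mole fraction per component encoding")
    if np.any(x < 0) or x.sum() <= 0:
        raise ValueError("mole fractions must be non-negative and not all zero")
    x = x / x.sum()  # renormalize so the weights sum to 1
    return x @ enc   # shape (d,), independent of n_components


def model_input(solute_encoding, component_encodings, mole_fractions):
    """Concatenate the solute encoding with the pooled solvent encoding.

    The result is what a feed-forward network would receive in this sketch.
    """
    solvent = mixture_encoding(component_encodings, mole_fractions)
    return np.concatenate([np.asarray(solute_encoding, dtype=float), solvent])


if __name__ == "__main__":
    # Stand-in 2-dimensional encodings for a binary and a ternary solvent.
    binary = mixture_encoding([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
    ternary = mixture_encoding(
        [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], [0.2, 0.3, 0.5]
    )
    print(binary.shape, ternary.shape)  # both have the same dimension
```

Because the weighted average collapses the component axis, the downstream network sees a fixed-size input for pure, binary, or ternary solvents alike, and the result does not depend on the order in which components are listed.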

To test the model, we computed a set of 34,000 experimental solvation free energies in solvent mixtures from three- and four-component vapor-liquid equilibrium (VLE) data.

We first pre-trained the model on solvation free energies and enthalpies in mono-solvent and binary-solvent systems calculated using COSMOtherm’s implementation of COSMO-RS. The pre-trained model had a mean absolute error (MAE) comparable to the accuracy of COSMOtherm on the same data. When we fine-tuned the model using experimental mono-solvent data, the MAE decreased slightly. The MAE of the model on ternary-solvent data was similar to that on the binary-solvent test set, which indicates that the model architecture can extrapolate to systems with more solvent components than it was trained on. The final model had a lower MAE for binary-solvent data points that did not include water as a co-solvent, which we suggest is primarily due to a lack of experimental training data in which water is present as a co-solvent.
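The abstract does not describe its evaluation code. As a minimal sketch, the MAE and the comparison between data points with and without water as a co-solvent could be computed as follows. The function names and the `has_water` flag are assumptions made for illustration, and the example values are toy numbers, not results from this work.

```python
import numpy as np


def mean_absolute_error(predicted, reference):
    """MAE = mean of |predicted - reference| over all data points."""
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(reference, dtype=float)
    if p.shape != r.shape or p.size == 0:
        raise ValueError("inputs must be non-empty and have the same shape")
    return float(np.mean(np.abs(p - r)))


def mae_by_water(predicted, reference, has_water):
    """Split the MAE by whether water is one of the co-solvents.

    has_water: boolean flag per data point (hypothetical metadata column).
    Both subsets must contain at least one data point.
    """
    p = np.asarray(predicted, dtype=float)
    r = np.asarray(reference, dtype=float)
    mask = np.asarray(has_water, dtype=bool)
    return {
        "with_water": mean_absolute_error(p[mask], r[mask]),
        "without_water": mean_absolute_error(p[~mask], r[~mask]),
    }


if __name__ == "__main__":
    # Toy values for illustration only.
    print(mae_by_water([1.0, 2.0, 3.0, 4.0],
                       [1.5, 2.0, 2.0, 4.0],
                       [True, False, True, False]))
```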