We introduce PINN-SAC, a neural network model for predicting activity coefficients in liquid mixtures. PINN-SAC models activity coefficients in multicomponent systems while rigorously maintaining thermodynamic consistency, and it needs only the SMILES representations of the constituent chemicals as input. Inspired by COSMO-SAC, PINN-SAC works through segment activity coefficients. This keeps predictions thermodynamically consistent beyond binary mixtures and makes the model applicable to complex chemical systems without computationally expensive calculations of molecular screening charges.
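For background on how segment activity coefficients yield component activity coefficients, here is the COSMO-SAC relation for the residual contribution: ln γ_i^res = (A_i / a_eff) Σ_m p_i(σ_m) [ln Γ_S(σ_m) − ln Γ_i(σ_m)]. Here p_i is the normalized sigma profile of component i, A_i is its cavity surface area, Γ_S and Γ_i are the segment activity coefficients in the mixture and in pure i, and a_eff is an effective segment area. A combinatorial (Staverman-Guggenheim) term is added to obtain the full ln γ_i. The sketch below only evaluates this COSMO-SAC relation. It is an assumption that PINN-SAC combines its predicted quantities in exactly this way. The default a_eff = 7.5 Å² and the toy inputs are illustrative.

```python
from typing import Sequence

# Effective contact-segment area in Å^2 (a typical COSMO-SAC value; treated as an assumption here)
A_EFF = 7.5


def ln_gamma_residual(
    area: float,
    p_i: Sequence[float],
    ln_Gamma_mix: Sequence[float],
    ln_Gamma_pure: Sequence[float],
    a_eff: float = A_EFF,
) -> float:
    """COSMO-SAC residual term for component i:

        ln γ_i^res = (A_i / a_eff) * Σ_m p_i(σ_m) * [ln Γ_S(σ_m) - ln Γ_i(σ_m)]

    area          -- cavity surface area A_i of component i (Å^2)
    p_i           -- normalized sigma profile of component i (sums to 1 over the sigma bins)
    ln_Gamma_mix  -- ln Γ_S(σ_m), segment activity coefficients in the mixture
    ln_Gamma_pure -- ln Γ_i(σ_m), segment activity coefficients in pure liquid i
    """
    if not (len(p_i) == len(ln_Gamma_mix) == len(ln_Gamma_pure)):
        raise ValueError("all sigma-bin arrays must have the same length")
    n_segments = area / a_eff  # number of effective segments on the molecule
    return n_segments * sum(
        p * (g_mix - g_pure)
        for p, g_mix, g_pure in zip(p_i, ln_Gamma_mix, ln_Gamma_pure)
    )


# Toy example with two sigma bins and made-up segment activity coefficients
print(ln_gamma_residual(75.0, [0.5, 0.5], [0.2, 0.4], [0.0, 0.0]))
```

If the mixture is pure i, then Γ_S equals Γ_i in every bin and the residual term vanishes, as required.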
PINN-SAC consists of two core components: a sigma profile predictor and a segment activity coefficient predictor. In the first stage, the sigma profile predictor is built on a pre-trained SMILES encoder and trained on a dataset of approximately 25,000 molecular sigma profiles. It predicts sigma profiles with high accuracy (R² ≈ 0.97). In the second stage, over a million theoretically computed segment activity coefficients are used to train physics-informed neural networks (PINNs). These networks enforce thermodynamic consistency by satisfying the Gibbs-Duhem equation and symmetry constraints. Finally, the model is fine-tuned on targeted activity coefficient data to improve its predictive accuracy for specific systems.
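For a binary mixture at constant temperature and pressure, the Gibbs-Duhem constraint reads x_1 d ln γ_1/dx_1 + x_2 d ln γ_2/dx_1 = 0. A physics-informed network typically enforces a relation like this by adding the squared residual, evaluated at collocation points, to its training loss. The sketch below computes that residual and a mean-squared penalty for any pair of ln γ functions. It applies the constraint to component activity coefficients and uses central finite differences in plain Python rather than automatic differentiation. The penalty form, the collocation grid, and the test models are assumptions for illustration. They are not PINN-SAC's actual training objective.

```python
from typing import Callable

# ln γ_k as a function of the mole fraction x1 (binary mixture, fixed T and P)
LnGamma = Callable[[float], float]


def gibbs_duhem_residual(ln_g1: LnGamma, ln_g2: LnGamma, x1: float, h: float = 1e-5) -> float:
    """Return x1 * d(ln γ1)/dx1 + x2 * d(ln γ2)/dx1, which is zero for a consistent model."""
    d1 = (ln_g1(x1 + h) - ln_g1(x1 - h)) / (2.0 * h)  # central finite difference
    d2 = (ln_g2(x1 + h) - ln_g2(x1 - h)) / (2.0 * h)
    return x1 * d1 + (1.0 - x1) * d2


def gibbs_duhem_penalty(ln_g1: LnGamma, ln_g2: LnGamma, n_points: int = 19) -> float:
    """Mean squared Gibbs-Duhem residual over evenly spaced interior points x1 in (0, 1)."""
    xs = [(k + 1) / (n_points + 1) for k in range(n_points)]
    return sum(gibbs_duhem_residual(ln_g1, ln_g2, x) ** 2 for x in xs) / n_points


A = 1.2  # illustrative two-suffix Margules parameter


def margules_1(x1: float) -> float:
    return A * (1.0 - x1) ** 2  # ln γ1 = A x2^2


def margules_2(x1: float) -> float:
    return A * x1 ** 2  # ln γ2 = A x1^2


def inconsistent_2(x1: float) -> float:
    return A * x1  # linear ln γ2: violates Gibbs-Duhem when paired with margules_1


print(gibbs_duhem_penalty(margules_1, margules_2))      # ~0 (round-off only)
print(gibbs_duhem_penalty(margules_1, inconsistent_2))  # clearly positive
```

The Margules pair passes because its two derivative terms cancel exactly. Pairing it with the linear ramp leaves a residual of A x_2(1 − 2x_1) at each point, which the penalty picks up.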
On binary activity coefficients from the VT-2005 dataset, the base PINN-SAC model achieved an R² of 0.92, comparable to COSMO-SAC. PINN-SAC also offers substantial advantages in efficiency and usability because it eliminates the need for quantum mechanical and iterative statistical calculations. Furthermore, given sufficient experimental data, PINN-SAC can be fine-tuned to surpass COSMO-SAC on specific datasets. Its fast inference, thermodynamic rigor, and adaptability to specific datasets make it well suited to process simulation, industrial design, and high-throughput screening, where it can outperform conventional thermodynamic models.
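For context, COSMO-SAC's "iterative statistical calculation" is a self-consistent equation for the segment activity coefficients: ln Γ_S(σ_m) = −ln Σ_n p_S(σ_n) Γ_S(σ_n) exp[−ΔW(σ_m, σ_n)/RT]. It must be solved for the mixture and for each pure component at every composition and temperature. The sketch below solves it by damped successive substitution, which averages the old and new estimates and is commonly used for this equation. It is a simplified illustration under stated assumptions. ΔW/RT keeps only a misfit-like term α(σ_m + σ_n)² in dimensionless toy units, hydrogen bonding is ignored, and the sigma grid, profile, and α are made up.

```python
import math
from typing import List, Sequence


def segment_activity_coefficients(
    sigma: Sequence[float],
    p_s: Sequence[float],
    alpha: float,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Solve Γ(σ_m) = 1 / Σ_n p_S(σ_n) Γ(σ_n) exp(-ΔW_mn / RT) by damped successive substitution.

    Simplifying assumption: ΔW_mn / RT = alpha * (σ_m + σ_n)^2 (toy misfit term only,
    dimensionless units, no hydrogen-bonding contribution).
    """
    m = len(sigma)
    # Boltzmann factors exp(-ΔW_mn / RT) for every pair of sigma bins
    boltz = [[math.exp(-alpha * (sigma[i] + sigma[j]) ** 2) for j in range(m)] for i in range(m)]
    gamma = [1.0] * m  # initial guess Γ = 1 in every bin
    for _ in range(max_iter):
        updated = [
            1.0 / sum(p_s[j] * gamma[j] * boltz[i][j] for j in range(m))
            for i in range(m)
        ]
        # Average old and new estimates to damp the oscillation of plain substitution
        updated = [0.5 * (g_old + g_new) for g_old, g_new in zip(gamma, updated)]
        if max(abs(new - old) / old for new, old in zip(updated, gamma)) < tol:
            return updated
        gamma = updated
    raise RuntimeError("segment activity coefficients did not converge")


sigma_grid = [-1.0, 0.0, 1.0]  # toy sigma bins (dimensionless)
profile = [0.25, 0.5, 0.25]    # toy normalized mixture sigma profile p_S
gamma = segment_activity_coefficients(sigma_grid, profile, alpha=0.5)
print([math.log(g) for g in gamma])  # ln Γ_S(σ_m) for each bin
```

Each evaluation requires a loop like this, repeated for the mixture and every pure component. A trained network replaces the loop with a single forward pass.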