2025 AIChE Annual Meeting
(559a) Machine Learning for Organic Solubility Prediction in Mixtures
Authors
Melodie Christensen, Merck & Co, Inc.
Joseph Smith, Merck & Co, Inc.
Determining solubility in binary solvent mixtures is a critical task across a variety of pharmaceutical process development efforts, including antisolvent crystallizations, liquid-liquid extractions, azeotropic distillations, and other separation techniques. Identifying the optimal solvent mixture, however, is frequently a time-consuming and laborious process that requires extensive empirical screening of a large chemical space. One possible solution is to use machine learning to predict the solubility of a solute across diverse solvent mixtures, thereby enabling prioritization of targeted solvent screening experiments or potentially eliminating the need for such experiments. To date, a variety of approaches have been explored for solubility prediction, including semi-empirical methodologies, density functional theory-based tools, and machine learning approaches. These existing approaches, however, frequently fail to extrapolate with high accuracy when given data from only a small number of solvents. In this work, we propose a new methodology that combines machine learning with domain-specific constraints to predict binary mixture solubilities for new solutes from limited experimental data. We leverage historical solubility data from small-molecule pharmaceutical process development efforts spanning over 40 solutes and over 3000 individual solubility measurements. We illustrate that training our machine learning model on historical data improves predictions even when only limited data are available for a new solute. Furthermore, we demonstrate that our proposed method can accurately predict trends for binary solvent mixtures where existing domain-specific models fail. Our approach ultimately allows early-stage solubility screening data to be extrapolated across vast solubility landscapes, thereby greatly reducing the experimental resources and time required.