2025 AIChE Annual Meeting

(202i) Organic Solubility Prediction at the Aleatoric Limit with Neural Networks

Authors

Jackson Burns - Presenter, University of Delaware
Patrick Doyle, Massachusetts Institute of Technology
William Green, Massachusetts Institute of Technology

The solubility of small molecules in organic (non-aqueous) solvents at arbitrary temperatures impacts every stage of the drug development pipeline, including screening, benchtop and reactor-scale synthesis, and formulation. Experimental determination of solubility is time- and resource-intensive and tends to yield noisy results. A priori knowledge of a molecule’s solubility would therefore greatly accelerate this entire workflow, though arriving at a practically useful estimate is challenging [1].

Physics-based in silico estimates of solubility are often inaccurate and inflexible with respect to both solute and solvent. Artificial intelligence methods trained on experimental solubility data scraped from the scientific literature have made significant progress, but they have limitations of their own. For example, the current best literature model for temperature-dependent solubility prediction, by Vermeire et al. [2], reports its performance without accounting for the inherent noise in the experimental data.
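The point about experimental noise can be made concrete. One common way to estimate the noise floor (the aleatoric limit) is to pool the spread among repeated measurements of the same solute, solvent, and temperature reported by different sources. When a model is evaluated against such noisy labels, its expected error cannot fall much below this value. The Python sketch below is illustrative only and is not necessarily how the authors estimate the limit: the record format, the temperature binning rule, and the function name `pooled_noise_std` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pooled_noise_std(records, temp_tol=1.0):
    """Estimate experimental noise (in log10 solubility units) from replicates.

    Hypothetical sketch. `records` is an iterable of
    (solute_smiles, solvent_smiles, temperature_K, log_s) tuples.
    Measurements of the same solute/solvent pair whose temperatures fall in
    the same `temp_tol`-wide bin are treated as replicates (an assumed rule).
    Returns the pooled within-group standard deviation, or None if the data
    contain no replicates.
    """
    groups = defaultdict(list)
    for solute, solvent, temp, log_s in records:
        key = (solute, solvent, round(temp / temp_tol))
        groups[key].append(log_s)

    sq_dev_sum = 0.0
    dof = 0
    for values in groups.values():
        if len(values) < 2:
            continue  # a single measurement carries no information about noise
        mean = sum(values) / len(values)
        sq_dev_sum += sum((v - mean) ** 2 for v in values)
        dof += len(values) - 1  # one degree of freedom is spent on each group mean
    if dof == 0:
        return None
    return sqrt(sq_dev_sum / dof)
```

For example, two reports of benzoic acid in ethanol near 298 K at log S = -0.5 and -0.7, plus two reports of another system near 310 K at -1.2 and -1.0, give a pooled standard deviation of about 0.14 log units; a model whose test RMSE approaches that figure is already near the limit set by the data themselves.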

To address these shortcomings, we present the fastsolv solubility predictor (Figure 1a). This machine learning model predicts small-molecule solubility in arbitrary solvents and at arbitrary temperatures with twice the accuracy of the current state of the art under rigorous extrapolation conditions. Owing to Sobolev training, the model also provides physically reasonable and numerically accurate gradients of solubility with respect to temperature (Figure 1b), which are of special interest in process chemistry. Even when its predictions are not quantitatively accurate, fastsolv often still rank-orders solvents correctly, making it useful for solvent screening.
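Sobolev training adds to the usual loss on predicted values a penalty on the mismatch between the model's derivatives and target derivatives. The Python sketch below illustrates the idea for the temperature derivative of log solubility under stated assumptions: it uses a hypothetical two-parameter toy model, log S = a + b/T, whose derivative is written out analytically (a neural network would instead obtain it by automatic differentiation), and it approximates target gradients by finite differences between measurements at neighboring temperatures. The finite-difference targets, the `grad_weight` parameter, and all function names are assumptions, not fastsolv's actual implementation.

```python
def model(params, temp):
    """Hypothetical toy model: log10 solubility = a + b / T."""
    a, b = params
    return a + b / temp


def model_dtemp(params, temp):
    """Analytic derivative of the toy model with respect to temperature."""
    _, b = params
    return -b / temp ** 2


def finite_diff_targets(temps, log_s):
    """Assumed target gradients: central differences of measured log S vs. T,
    one-sided at the ends. `temps` must be sorted and distinct."""
    n = len(temps)
    grads = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        grads.append((log_s[hi] - log_s[lo]) / (temps[hi] - temps[lo]))
    return grads


def sobolev_loss(params, temps, log_s, grad_weight=1.0):
    """Mean squared error on values plus weighted mean squared error on dlogS/dT."""
    if len(temps) < 2:
        raise ValueError("need at least two temperatures to form target gradients")
    target_grads = finite_diff_targets(temps, log_s)
    n = len(temps)
    value_term = sum((model(params, t) - y) ** 2 for t, y in zip(temps, log_s)) / n
    grad_term = sum(
        (model_dtemp(params, t) - g) ** 2 for t, g in zip(temps, target_grads)
    ) / n
    return value_term + grad_weight * grad_term
```

At the parameters used to generate noise-free data, the value term vanishes and only a small finite-difference residual remains in the gradient term, whereas parameters with the wrong temperature dependence are penalized by both terms. The gradient term is what pushes a flexible model toward physically sensible slopes between the sparse temperatures at which data exist.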

fastsolv is highly extensible, open source, and freely available under the MIT license at GitHub.com/JacksonBurns/fastsolv, and a demonstration of the model is hosted at fastsolv.mit.edu. Python users can install the fastsolv Python package to integrate the model into their own workflows. The research community has already begun doing so: fastsolv has been integrated into ASKCOS, Rowan Scientific, and Wolfram. A preprint describing fastsolv in greater detail is available on ChemRxiv.

[1] Murdande, S.B., Pikal, M.J., Shanker, R.M., Bogner, R.H.: Aqueous solubility of crystalline and amorphous drugs: challenges in measurement. Pharmaceutical Development and Technology 16(3), 187–200 (2011)

[2] Vermeire, F.H., Chung, Y., Green, W.H.: Predicting solubility limits of organic solutes for a wide range of solvents and temperatures. Journal of the American Chemical Society 144(24), 10785–10797 (2022) https://doi.org/10.1021/jacs.2c01768