2025 AIChE Annual Meeting

(702e) Bayesian Optimization Framework for Coformer Selection and Property-Guided Co-Crystal Discovery

Pharmaceutical cocrystals continue to gain interest due to their potential to improve the physicochemical properties of active pharmaceutical ingredients (APIs). A persistent challenge is finding a coformer that not only forms a cocrystal with the API of interest, but also yields a cocrystal with desirable properties such as enhanced bioavailability, stability, and tabletability.

We present a Bayesian optimization framework for the prediction and discovery of cocrystals, combining machine learning with experimental validation. Our approach begins with a Gaussian Process Classifier trained on a feature set comprising molecular fragment fingerprints, MQN (Molecular Quantum Numbers)[1], logP, and minimum and maximum partial charges. A key challenge in modeling cocrystallization is the imbalance of the available data: the number of cocrystal-forming compound pairs recorded in the Cambridge Structural Database (CSD) is more than four times the number of negative pairs reported in the literature. To address this, we constructed balanced subsets from a curated dataset of 6,339 cocrystals. Training on just 30% of this dataset, and using Bayesian optimization to sequentially select the most informative points, we achieved over 95% classification accuracy.
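As an illustration of the balancing step, the sketch below draws several class-balanced subsets by random undersampling. The abstract does not state how the balanced subsets were built, so the undersampling strategy, the function name `balanced_subsets`, and the toy pair identifiers are all assumptions made for this example.

```python
import random

def balanced_subsets(positives, negatives, n_subsets=5, seed=0):
    """Return n_subsets lists of (pair, label) with equal class counts.

    Assumption: each subset keeps k = min(class sizes) pairs per class,
    drawn at random without replacement (plain undersampling).
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    subsets = []
    for _ in range(n_subsets):
        pos = rng.sample(positives, k)   # label 1: pair forms a cocrystal
        neg = rng.sample(negatives, k)   # label 0: pair reported not to form one
        subset = [(pair, 1) for pair in pos] + [(pair, 0) for pair in neg]
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets

# Toy, made-up identifiers with a 4:1 imbalance (not real data).
positives = [("api_%d" % i, "coformer_%d" % i) for i in range(40)]
negatives = [("api_%d" % i, "inert_%d" % i) for i in range(10)]
subsets = balanced_subsets(positives, negatives, n_subsets=3)
```

Drawing several subsets rather than one lets every positive pair have a chance of being used while each individual training set stays balanced.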

We further extended this framework to a Gaussian Process Regression model that enables the simultaneous prediction of cocrystal formation and optimization for a specific target property, in our case aqueous solubility. This allowed us to cast coformer selection as a dual-objective optimization problem: to identify compounds that both form cocrystals and increase solubility. Our model learns iteratively, suggesting the next most promising experiment based on prior results. In our tests with nicotinamide, isoniazid, salicylic acid, malonic acid, and caffeine as pivot compounds, each paired with up to 100 potential coformers, the optimal coformers were identified in 3 to 7 iterations. In practice, this means only 3 to 7 experiments are required to find the most soluble cocrystal with 95% confidence. This substantially reduced the experimental burden of the validation campaign to synthesize cocrystals and characterize them using Powder X-Ray Diffraction (PXRD) and High-Performance Liquid Chromatography (HPLC). These results underscore the utility of Bayesian optimization as a powerful tool for guiding cocrystal discovery and property tuning in pharmaceutical and materials design.
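The iterative loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a zero-mean Gaussian process with a fixed squared-exponential kernel, and it combines the two objectives by weighting expected improvement in solubility by a formation probability, as is commonly done in constrained Bayesian optimization. The descriptors, the `p_form` values, and the synthetic solubility function are hypothetical stand-ins.

```python
import math
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(X_obs, y_obs, X_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_new)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over the best value seen so far (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(t / math.sqrt(2.0)) for t in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def suggest_next(X, measured, y_measured, p_form):
    """Index of the untested coformer maximizing P(form) * EI(solubility)."""
    idx = np.array(measured)
    y = np.array(y_measured)
    y_mean = y.mean()                      # center targets for the zero-mean GP
    mean, std = gp_predict(X[idx], y - y_mean, X)
    score = p_form * expected_improvement(mean + y_mean, std, y.max())
    score[idx] = -np.inf                   # never repeat an experiment
    return int(np.argmax(score))

# Synthetic setup: 100 candidate coformers, 3 made-up descriptors each.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(100, 3))
p_form = rng.uniform(0.2, 1.0, size=100)           # stand-in for classifier output
true_solubility = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # synthetic, not real data

measured = [0]
y_measured = [float(true_solubility[0])]
for _ in range(7):
    nxt = suggest_next(X, measured, y_measured, p_form)
    measured.append(nxt)
    y_measured.append(float(true_solubility[nxt]))  # in practice: run the experiment
```

Multiplying the two terms steers the search toward coformers that are both likely to form a cocrystal and likely to improve on the best solubility measured so far. Other ways of combining the objectives are possible, and the abstract does not say which one was used.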

[1] Nguyen et al., “Classification of Organic Molecules by Molecular Quantum Numbers.”