2024 AIChE Annual Meeting

(218g) Bayesian Optimization Applied to Chemical Reaction Optimization: Human-in-Loop and CDMO Perspective

Authors

Rudi Oliveira, Hovione FarmaCiência SA
Saúl Silva, Hovione FarmaCiência SA
Pedro Valente, Hovione FarmaCiência SA
Developing a chemical synthesis involves optimizing multiple parameters, such as substrates, catalysts, reagents, solvents, concentrations, and temperatures. Because of timeline and cost restrictions, it is not possible to study the full range of process parameters, and only a limited number of experiments can be run to find optimal conditions. Scientists typically draw on their accumulated knowledge and the available literature to identify the key parameters that most strongly influence the success of a reaction at scale. To complement this human effort, there have been significant advances in automated optimization techniques, also referred to as Algorithm Process Optimization (APO).[1] Among these, Bayesian Optimization (BO) has proven particularly effective, often outperforming human experts and other advanced optimization algorithms.[2,3] It alternates between exploring uncertain regions of the operational space and exploiting known data. This method is applicable to a wide range of parameter spaces, including those with complex, parameterized domains, and can design multiple experiments to be run in parallel.
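
The explore/exploit alternation described above can be illustrated with a minimal sketch. It is not the tool or model used in this work: it assumes a one-dimensional candidate grid, a Gaussian-process surrogate with a squared-exponential kernel, and an upper-confidence-bound (UCB) acquisition rule, none of which are specified in the abstract. The function `run_experiment` is a hypothetical stand-in for a "virtual lab" lookup, and all names and parameter values are illustrative.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    w = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(1.0 - sum(k * v for k, v in zip(k_star, w)), 0.0)
    return mean, math.sqrt(var)

def run_experiment(x):
    """Hypothetical virtual lab: a toy yield curve peaking near x = 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.02)

def bayes_opt(candidates, n_iter=8, kappa=2.0):
    """Sequentially pick the untested candidate with the highest UCB score."""
    tested = [candidates[0], candidates[-1]]        # assumed initial design
    results = [run_experiment(x) for x in tested]
    for _ in range(n_iter):
        remaining = [x for x in candidates if x not in tested]
        if not remaining:
            break
        offset = sum(results) / len(results)        # center data for the zero-mean GP
        centered = [y - offset for y in results]

        def ucb(x):
            mean, std = gp_posterior(tested, centered, x)
            return mean + kappa * std               # exploit (mean) + explore (std)

        nxt = max(remaining, key=ucb)
        tested.append(nxt)
        results.append(run_experiment(nxt))
    best = max(range(len(tested)), key=lambda i: results[i])
    return tested[best], results[best], tested

candidates = [i / 20 for i in range(21)]
best_x, best_y, tested = bayes_opt(candidates)
print(f"best condition {best_x:.2f} with yield {best_y:.3f} after {len(tested)} runs")
```

A larger `kappa` weights the predictive uncertainty more heavily and pushes the search toward unexplored conditions, while a smaller value concentrates experiments near the best result found so far.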

To illustrate the performance and robustness of different process optimization strategies, we conducted a “virtual lab” case study based on the work of Shields et al.[3] The study was designed around a high-throughput experimentation (HTE) database, complemented with a calculated dataset for impurity levels and a green score, to add real-world relevance to the complex task of optimizing a process for both performance and sustainability. The case study benchmarks traditional human subject-matter experts (SMEs) against BO in terms of effectiveness and efficiency in optimizing a given chemical reaction, and also sheds light on how experts with different backgrounds and levels of expertise interact with a BO tool.

An initial comparative analysis of the optimization data (Figure 1) shows that larger green score values occur at lower reaction yields, even though the green score increases linearly with yield when all other factors are kept constant. This reflects the inherent trade-off in optimizing for both reaction efficiency and sustainability. All but one scientist reached at least one optimum condition on the Pareto front, i.e., a condition whose average of yield and green score is among the top four possible values, subject to impurities being within the specified limit.
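
The selection criterion above can be sketched in a few lines. This is only an illustration under stated assumptions: the condition names, yields, green scores, impurity values, and the impurity limit are invented for the example, and yield and green score are assumed to share a common 0-100 scale so that their plain average is meaningful. The helper names `pareto_front` and `top_conditions` are hypothetical.

```python
def pareto_front(points):
    """Return the conditions not dominated in (yield, green score), both maximized."""
    front = []
    for p in points:
        dominated = any(
            q["yield"] >= p["yield"] and q["green"] >= p["green"]
            and (q["yield"] > p["yield"] or q["green"] > p["green"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

def top_conditions(points, impurity_limit, k=4):
    """Keep conditions within the impurity limit, then rank by mean of yield and green score."""
    feasible = [p for p in points if p["impurity"] <= impurity_limit]
    ranked = sorted(feasible, key=lambda p: (p["yield"] + p["green"]) / 2, reverse=True)
    return ranked[:k]

# Invented example data; not taken from the study.
conditions = [
    {"name": "A", "yield": 95, "green": 40, "impurity": 0.5},
    {"name": "B", "yield": 80, "green": 60, "impurity": 0.3},
    {"name": "C", "yield": 60, "green": 85, "impurity": 0.2},
    {"name": "D", "yield": 40, "green": 90, "impurity": 0.4},
    {"name": "E", "yield": 90, "green": 70, "impurity": 2.5},  # exceeds the limit
    {"name": "F", "yield": 70, "green": 50, "impurity": 0.1},
    {"name": "G", "yield": 30, "green": 60, "impurity": 0.6},
]

LIMIT = 1.0  # assumed impurity specification
feasible = [c for c in conditions if c["impurity"] <= LIMIT]
front_names = sorted(c["name"] for c in pareto_front(feasible))
top_names = [c["name"] for c in top_conditions(conditions, LIMIT)]
print("Pareto front:", front_names)
print("Top four by average:", top_names)
```

In this toy dataset, condition E has the best average but is excluded by the impurity constraint, which shows why the constraint must be applied before ranking.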

Clustering the design space and results in a t-distributed Stochastic Neighbor Embedding (t-SNE) visualization[4] allowed us to map the individual optimization trajectories of the different scientists towards the Pareto optima. Even though the optimization outcome was similar in some cases, the strategies taken were starkly different, even among scientists using Bayesian optimization, which can drive the optimization on its own without human intervention. This demonstrates that human input can either improve or degrade the performance of the algorithm. As such, there is a need for better human-in-the-loop strategies that combine the algorithms with human expertise, streamlining the path through the inherent complexities of organic synthesis and unlocking new possibilities in sustainable process development.