2025 AIChE Annual Meeting

(448g) Breaking the Boundaries of Bayesian Optimization Utilizing Continuous Chemistry Digital Twins

Authors

Bao Chau - Presenter, Virginia Commonwealth University
Yuma Miyai, Virginia Commonwealth University

Introduction

Chemical reaction optimization is a rapidly evolving field that uses statistical modeling to understand the relationship between reaction parameters and desired outputs. Reaction design spaces tend to be multidimensional with multiple outputs, creating a complex optimization problem that traditionally relies on experienced chemists' intuition and many iterative experiments. The problem grows as laboratories take on more complicated chemical reactions. To counteract this increasing complexity, optimization methods have been developed that leverage statistical modeling and algorithmic experiment selection. Recent work has shown that Bayesian optimization (BO) is a powerful emerging tool for reaction optimization, particularly on automated platforms. BO is an iterative algorithm based on response surface modeling and is widely used for hyperparameter tuning of machine learning models. It has become the default method for chemistry self-optimization platforms, which allow fully autonomous optimization of a designated chemical reaction without human intervention. Combining BO with automated platforms increases experimental throughput while freeing chemists to spend their time elsewhere. Although the optimization campaign runs autonomously, chemist intuition is still required to define the design space within which the BO algorithm works, and a poor choice of design space is detrimental to the process: a large design space is more expensive to explore and optimize, which reduces the benefits of BO, whereas a small design space may not contain the global or desired optimum. Our hypothesis is that BO algorithms can be supplemented with autonomous data analysis to lower the barrier to entry for optimization while improving overall efficiency.

Methodology

All computational work was conducted on an HP Omen laptop equipped with an Intel i7-8750H CPU (2.20 GHz) and an NVIDIA GeForce GTX 1060 GPU. The first digital twin is the benchmark implementation in Summit of a nucleophilic aromatic substitution (SNAr) reaction of difluoronitrobenzene with pyrrolidine. The second is an in-house kinetic model of a multistep continuous manufacturing synthesis platform for a ciprofloxacin intermediate. Both kinetic models report errors below 5% within their given operating conditions. All BO runs, baseline and BtB, were performed with the Summit library (0.8.9) in Python (3.9). Summit's Single-objective Bayesian Optimization (SOBO) strategy was designated as the baseline because it is a wrapper for GPyOpt. The algorithm was run with a Gaussian process model, the default Matérn 5/2 kernel, and the Expected Improvement acquisition function.
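
As a rough illustration of the two baseline ingredients named above, the sketch below writes out a Matérn 5/2 covariance and the Expected Improvement score using only the Python standard library. It is not the Summit or GPyOpt implementation: the function names, the maximization convention, and the `xi` exploration offset are assumptions made for this example, and a real campaign would obtain `mu` and `sigma` from a fitted Gaussian process.

```python
import math


def matern52(x1, x2, lengthscale=1.0, variance=1.0):
    """Matern 5/2 covariance between two points (equal-length sequences).

    Hypothetical helper for illustration; hyperparameters are placeholders.
    """
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))
    s = math.sqrt(5.0) * r / lengthscale
    # k(r) = variance * (1 + s + s^2 / 3) * exp(-s), with s = sqrt(5) * r / lengthscale
    return variance * (1.0 + s + s * s / 3.0) * math.exp(-s)


def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected Improvement for maximization (an assumed convention).

    mu, sigma: posterior mean and standard deviation at a candidate point.
    best: best objective value observed so far.
    """
    improvement = mu - best - xi
    if sigma <= 0.0:
        # No predictive uncertainty: EI reduces to the plain improvement.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return improvement * cdf + sigma * pdf
```

Under these assumptions, a candidate with a high predicted mean or a large predictive uncertainty receives a high score, which is how the acquisition function balances exploitation and exploration inside whatever bounds it is given.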

Results

BtB-BO achieved a 100% success rate with both digital twins and across all stress tests, and performed best when given free rein over the entire design space. It exceeded the 95 mg/mL concentration and 0.8 objective function goals across the in-house and SNAr case studies. By expanding the design space boundaries, BtB-BO successfully identified high-performance regions in under 40 total experiments and with minimal prior knowledge.

Conclusion

We have proposed a new method, BtB-BO, for autonomous optimization that expands the boundaries of Bayesian optimization design spaces. The method limits the influence of poorly defined design spaces on optimization campaigns and reaches optimal values that traditional optimization methods, confined to a fixed design space, cannot. We describe the workflow and how the data analysis step between BO iterations determines when boundaries should expand, in which direction, and by what magnitude. The most impactful improvement over current BO approaches is the reduced reliance on expert knowledge to construct the design space carefully. This improvement allows the acquisition function (Expected Improvement) to find the true global optimum instead of remaining confined to the initially established design space.
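
The abstract does not give the rule that decides when and how far boundaries move, so the following is only a hypothetical sketch of what a between-iteration boundary check could look like, not the BtB-BO analysis step itself. It assumes a simple heuristic: if the best point found so far lies within a fraction `edge_frac` of a bound, that bound is pushed outward by a fraction `grow_frac` of the current range, optionally capped by hard limits. The function name, both fractions, `hard_limits`, and the variable names `temperature` and `residence_time` are all illustrative assumptions.

```python
def expand_bounds(bounds, best_point, edge_frac=0.1, grow_frac=0.25, hard_limits=None):
    """Widen each variable's bounds on the side the best point is close to.

    bounds: dict mapping variable name -> (low, high)
    best_point: dict mapping variable name -> value of the current best experiment
    hard_limits: optional dict of absolute (low, high) limits, e.g. safety limits
    All thresholds are assumed values for illustration only.
    """
    new_bounds = {}
    for name, (low, high) in bounds.items():
        span = high - low
        x = best_point[name]
        new_low, new_high = low, high
        if x - low <= edge_frac * span:
            new_low = low - grow_frac * span    # best point hugs the lower edge
        if high - x <= edge_frac * span:
            new_high = high + grow_frac * span  # best point hugs the upper edge
        if hard_limits and name in hard_limits:
            lim_low, lim_high = hard_limits[name]
            new_low = max(new_low, lim_low)
            new_high = min(new_high, lim_high)
        new_bounds[name] = (new_low, new_high)
    return new_bounds


# Hypothetical usage: the best experiment sits near the upper temperature bound.
bounds = {"temperature": (30.0, 100.0), "residence_time": (0.5, 2.0)}
best = {"temperature": 98.0, "residence_time": 1.2}
widened = expand_bounds(bounds, best)
capped = expand_bounds(bounds, best, hard_limits={"temperature": (30.0, 110.0)})
```

In this toy case only the upper temperature bound moves, because the best point is interior in residence time; the direction comes from which edge the optimum approaches and the magnitude from `grow_frac`.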