2023 AIChE Annual Meeting
(197av) Predicting Reaction Performance in Amide Bond Formation Using Machine Learning: The Role of High-Quality Data
In this work, we systematically investigate the impact of including unsuccessful reaction outcomes and reaction conditions in the development of ML-based reaction prediction models for amide bond formation. Generic data are extracted from the Reaxys database [6] to benchmark the performance of the predictive models. To obtain accurate and unbiased results, a custom-made (CM) dataset for amide bond formation is generated by sampling the generic reaction data so that its yield distribution mimics that of the published high-throughput experimentation (HTE) dataset [7]. The curated data are then used to train ML-based models for predicting reaction yield [1] and for optimizing reaction conditions [3]. Finally, the comparative performance of these models is assessed to evaluate the importance of data quality and to investigate whether CM data are suitable for building predictive reaction models.
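The abstract does not specify how the generic Reaxys records are sampled to match the HTE yield distribution. One plausible scheme is histogram-stratified subsampling: bin the yields, take the target fraction per bin from the reference HTE set, then draw that many generic records per bin without replacement. The sketch below assumes percentage yields in [0, 100], ten equal-width bins, and records stored as (reaction ID, yield) tuples; the function names `yield_bin` and `sample_to_match` are hypothetical, and this is not presented as the authors' actual procedure.

```python
import random
from collections import defaultdict


def yield_bin(y: float, n_bins: int = 10) -> int:
    """Map a percentage yield in [0, 100] to a bin index 0..n_bins-1."""
    if not 0.0 <= y <= 100.0:
        raise ValueError(f"yield out of range: {y}")
    # A yield of exactly 100 falls into the top bin rather than a new one.
    return min(int(y / (100.0 / n_bins)), n_bins - 1)


def sample_to_match(generic, reference_yields, n_samples, n_bins=10, seed=0):
    """Draw records from `generic` so their yield histogram approximates
    that of `reference_yields` (hypothetical helper, assumed data format).

    generic: list of (reaction_id, yield_percent) tuples.
    reference_yields: percentage yields from the target (e.g. HTE) dataset.
    """
    rng = random.Random(seed)

    # Target count per yield bin, taken from the reference distribution.
    ref_counts = [0] * n_bins
    for y in reference_yields:
        ref_counts[yield_bin(y, n_bins)] += 1
    total_ref = len(reference_yields)

    # Pool the generic records by yield bin.
    pools = defaultdict(list)
    for rec in generic:
        pools[yield_bin(rec[1], n_bins)].append(rec)

    sampled = []
    for b in range(n_bins):
        want = round(n_samples * ref_counts[b] / total_ref)
        pool = pools[b]
        # Cannot draw more than a bin holds; sparse bins stay under-filled.
        k = min(want, len(pool))
        sampled.extend(rng.sample(pool, k))
    return sampled


# Toy usage: a generic pool split evenly between high and low yields,
# matched to a reference set in which most reactions failed.
generic = [(f"hi{i}", 90.0) for i in range(50)] + [(f"lo{i}", 5.0) for i in range(50)]
reference = [0.0, 2.0, 3.0, 91.0]
subset = sample_to_match(generic, reference, n_samples=20)
print(len(subset), sum(1 for _, y in subset if y < 10))  # 20 15
```

If the generic pool has too few records in some bin (for example, too few low-yield reactions), that bin is under-filled and the subset comes out smaller than `n_samples`; a fuller pipeline might lower `n_samples` or report the shortfall rather than silently distort the distribution.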
References
[1] Schwaller, P., Vaucher, A.C., Laino, T. and Reymond, J.L., 2021. Prediction of chemical reaction yields using deep learning. Machine Learning: Science and Technology, 2(1), p.015016.
[2] Ahneman, D.T., Estrada, J.G., Lin, S., Dreher, S.D. and Doyle, A.G., 2018. Predicting reaction performance in C–N cross-coupling using machine learning. Science, 360(6385), pp.186-190.
[3] Shields, B.J., Stevens, J., Li, J., Parasram, M., Damani, F., Alvarado, J.I.M., Janey, J.M., Adams, R.P. and Doyle, A.G., 2021. Bayesian reaction optimization as a tool for chemical synthesis. Nature, 590(7844), pp.89-96.
[4] Hickman, R.J., Aldeghi, M., Häse, F. and Aspuru-Guzik, A., 2022. Bayesian optimization with known experimental and design constraints for chemistry applications. Digital Discovery, 1(5), pp.732-744.
[5] Struble, T.J., Alvarez, J.C., Brown, S.P., Chytil, M., Cisar, J., DesJarlais, R.L., Engkvist, O., Frank, S.A., Greve, D.R., Griffin, D.J. and Hou, X., 2020. Current and future roles of artificial intelligence in medicinal chemistry synthesis. Journal of Medicinal Chemistry, 63(16), pp.8667-8682.
[6] Reaxys, Elsevier B.V.
[7] Avila, C., Cassani, C., Kogej, T., Mazuela, J., Sarda, S., Clayton, A.D., Kossenjans, M., Green, C.P. and Bourne, R.A., 2022. Automated stopped-flow library synthesis for rapid optimisation and machine learning directed experimentation. Chemical Science, 13(41), pp.12087-12099.