2025 AIChE Annual Meeting

(584aw) A Scalable Dataset of Organic Reaction Mechanisms Constructed Using Template-Based Breadth-First Mechanism Intermediate Search

Authors

Yong Liu, The Hong Kong University of Science and Technology
Fei Sun, The Hong Kong University of Science and Technology
The ability to predict organic reaction mechanisms is crucial for many tasks, including reaction condition optimization, methodology optimization, and synthetic route planning. Despite current efforts to leverage machine learning models for reaction mechanism prediction, reaction mechanism datasets remain scarce. Furthermore, existing reaction mechanism datasets emphasize well-documented reactions, leaving less-documented reactions underrepresented.

In this work, we developed a new template-based framework for constructing reaction mechanism datasets. Our approach enumerates all possible mechanisms within a curated set of elementary reaction steps written in SMILES Arbitrary Target Specification (SMARTS) to encourage a more exploratory reaction mechanism search. We curated 1,769 templates from a graduate textbook[1] and used them to reproduce reaction mechanism pathways for reactions in Pistachio[2], a database of patented reactions. In total, we reproduced reaction mechanisms for 21,544 reactions comprising 45,518 mechanistic steps. We then evaluated this dataset by training three different reaction outcome prediction models on the mechanism prediction task[3][4][5]. The best-performing model achieved 85.50% top-1 accuracy in single-step prediction and 76.88% top-1 accuracy in multistep reaction mechanism prediction. The model was also able to explore less-studied reaction mechanisms, which suggests that our proposed framework is an effective way to explore the large search space of reaction mechanisms.
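The breadth-first search over intermediates can be illustrated with a minimal sketch. It is not the authors' implementation: the names `find_mechanism` and `rewrite`, the `max_steps` limit, and the toy string-rewrite templates are all hypothetical stand-ins for SMARTS templates applied to molecules with a cheminformatics toolkit. The sketch also returns only one shortest pathway, whereas the framework described above enumerates all possible mechanisms.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: a "template" is any callable that maps one intermediate state
# to zero or more successor states. In practice this would wrap a SMARTS
# reaction template; here states are plain strings for illustration only.
Template = Callable[[str], Iterable[str]]


def find_mechanism(reactant: str, product: str,
                   templates: List[Template],
                   max_steps: int = 4) -> Optional[List[str]]:
    """Breadth-first search for a shortest chain of intermediates
    leading from reactant to product, or None if none is found."""
    parent: Dict[str, Optional[str]] = {reactant: None}  # also the visited set
    queue = deque([(reactant, 0)])
    while queue:
        state, depth = queue.popleft()
        if state == product:
            # Walk the parent links back to the reactant.
            path = []
            node: Optional[str] = state
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth == max_steps:
            continue  # do not expand beyond the step limit
        for template in templates:
            for successor in template(state):
                if successor not in parent:
                    parent[successor] = state
                    queue.append((successor, depth + 1))
    return None


def rewrite(pattern: str, replacement: str) -> Template:
    """Hypothetical toy template: replace one occurrence of pattern."""
    def apply(state: str) -> List[str]:
        results = []
        start = state.find(pattern)
        while start != -1:
            results.append(state[:start] + replacement
                           + state[start + len(pattern):])
            start = state.find(pattern, start + 1)
        return results
    return apply


toy_templates = [rewrite("A", "B"), rewrite("B", "C"), rewrite("A", "D")]
path = find_mechanism("A", "C", toy_templates)
print(path)  # ['A', 'B', 'C']
```

Because the search expands intermediates level by level, the first pathway that reaches the product uses the fewest elementary steps; the dead-end branch through `D` is explored but discarded.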

References:

[1]. Kürti, L.; Czakó, B. Strategic Applications of Named Reactions in Organic Synthesis; Elsevier, 2005.

[2]. Pistachio, https://www.nextmovesoftware.com/pistachio.html. Accessed: 20 Mar 2023.

[3]. Tu, Z.; Coley, C. W. Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of Chemical Information and Modeling, 2022, 62, 3503–3513.

[4]. Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 2019, 5, 1572–1583.

[5]. Retrosynthesis-Prediction. https://github.com/kheyer/Retrosynthesis-Prediction. Accessed: 10 Feb 2024.