2025 AIChE Annual Meeting
(584aw) A Scalable Dataset of Organic Reaction Mechanisms Constructed Using Template-Based Breadth-First Mechanism Intermediate Search
In this work, we developed a new template-based framework for constructing reaction mechanism datasets. Our approach enumerates all possible mechanisms that can be assembled from a curated set of elementary reaction steps, encoded as SMILES Arbitrary Target Specification (SMARTS) templates, to encourage a more exploratory search of reaction mechanisms. We curated 1,769 templates from a graduate textbook[1] and used them to reproduce mechanistic pathways for reactions in Pistachio[2], a database of reactions extracted from patents. In total, we reproduced reaction mechanisms for 21,544 reactions, comprising 45,518 mechanistic steps. We then evaluated the dataset by training three reaction outcome prediction models[3][4][5] to perform mechanism prediction. The best-performing model achieved 85.50% top-1 accuracy in single-step prediction and 76.88% top-1 accuracy in multistep reaction mechanism prediction. This model was also able to explore less-studied reaction mechanisms, which suggests that the proposed framework is an effective way to explore the large search space of reaction mechanisms.
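The abstract does not describe how the breadth-first mechanism search is implemented, so the following is only a minimal sketch of the general idea under simplifying assumptions. Each elementary-step template is reduced to a hypothetical rule that consumes and produces abstract species labels; the framework described above instead applies SMARTS templates to molecules, which requires a cheminformatics toolkit (for example RDKit reaction SMARTS) and is omitted here. A "mechanism" is taken to be any cycle-free sequence of steps, up to an assumed depth limit `max_steps`, whose final state contains the target product. `Template`, `apply_template`, `bfs_mechanisms`, and the toy template set are illustrative names, not the authors' code.

```python
from __future__ import annotations

from collections import Counter, deque
from typing import NamedTuple


class Template(NamedTuple):
    """Hypothetical stand-in for one SMARTS elementary-step template."""
    name: str
    consumes: tuple[str, ...]  # species labels required by the step
    produces: tuple[str, ...]  # species labels formed by the step


State = tuple[str, ...]  # sorted multiset of species labels (canonical form)


def canonical(pool: Counter) -> State:
    """Order-independent key so identical sets of species compare equal."""
    return tuple(sorted(pool.elements()))


def apply_template(template: Template, state: State) -> State | None:
    """Apply one elementary step if every species it consumes is present."""
    pool = Counter(state)
    need = Counter(template.consumes)
    if any(pool[s] < n for s, n in need.items()):
        return None
    pool -= need                     # remove consumed species
    pool.update(template.produces)   # add formed species
    return canonical(pool)


def bfs_mechanisms(start, target, templates, max_steps=4):
    """Breadth-first enumeration of cycle-free step sequences (at most
    max_steps long) whose final state contains `target`."""
    root = canonical(Counter(start))
    queue = deque([(root, [], {root})])  # (state, step names, states on path)
    mechanisms = []
    while queue:
        state, path, on_path = queue.popleft()
        if path and target in state:
            mechanisms.append(path)
            continue  # product formed: record the mechanism, do not extend it
        if len(path) == max_steps:
            continue
        for template in templates:
            nxt = apply_template(template, state)
            if nxt is not None and nxt not in on_path:
                queue.append((nxt, path + [template.name], on_path | {nxt}))
    return mechanisms


# Toy template set with abstract species A, B, intermediate I, product P.
TEMPLATES = [
    Template("ionize", ("A",), ("I",)),
    Template("capture", ("I", "B"), ("P",)),
    Template("concerted", ("A", "B"), ("P",)),
    Template("recombine", ("I",), ("A",)),
]

print(bfs_mechanisms(["A", "B"], "P", TEMPLATES))
# [['concerted'], ['ionize', 'capture']]
```

Because the queue is processed level by level, the one-step `concerted` route is reported before the two-step `ionize`/`capture` route. Tracking visited states per path rather than globally keeps alternative routes that pass through the same intermediate, which matches the goal of enumerating all mechanisms, at the cost of a search that grows quickly as `max_steps` increases.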
References:
[1]. Kürti, L.; Czakó, B. Strategic Applications of Named Reactions in Organic Synthesis; Elsevier, 2005.
[2]. Pistachio. https://www.nextmovesoftware.com/pistachio.html. Accessed: 20 Mar 2023.
[3]. Tu, Z.; Coley, C. W. Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of Chemical Information and Modeling, 2022, 62, 3503–3513.
[4]. Schwaller, P.; Laino, T.; Gaudin, T.; Bolgar, P.; Hunter, C. A.; Bekas, C.; Lee, A. A. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS Central Science, 2019, 5, 1572–1583.
[5]. Retrosynthesis-Prediction. https://github.com/kheyer/Retrosynthesis-Prediction. Accessed: 10 Feb 2024.