2025 AIChE Annual Meeting

(66l) Towards a Comprehensive Reaction Database for Organometallics

Authors

Zhao Li - Presenter, Northwestern University
Brett Savoie, Purdue University

Several specialized databases currently target transition metal chemistry. These include the transition metal quantum mechanics (tmQM) dataset,¹ which primarily catalogs molecular structures and their quantum-chemical properties, and the tmCAT, tmPHOTO, tmBIO, and tmSCO datasets,² which were curated with natural language processing and focus largely on application-driven property prediction. Despite these valuable resources, no comprehensive database is dedicated to capturing transition metal reaction pathways, reaction products, or transition states. This gap limits the systematic exploration and rational design of transition metal-catalyzed reactions, which are crucial for advancing both fundamental understanding and practical catalytic processes.

To address this gap, our group is developing a comprehensive database of transition metal reactions using two complementary approaches. First, we use the computational platform Yet Another Reaction Program (YARP)³ to systematically explore reaction networks through bond-forming and bond-breaking events. YARP automates the enumeration of reaction products, the prediction of reaction barriers, and the identification of reaction intermediates, thereby generating extensive reaction graphs. This computationally driven approach enables high-throughput screening of reaction spaces far larger than can be surveyed experimentally.
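To make the bond-forming/bond-breaking idea concrete, here is a minimal Python sketch of graph-based product enumeration: starting from a reactant bond graph, it breaks up to two bonds and forms up to two new ones (a "break-2, form-2" scheme) and keeps only graphs that pass simple valence limits. This is not YARP's implementation. The function names and the `MAX_BONDS` table are hypothetical, and the fixed neighbour limits (including the one for Pd) are a crude stand-in for real transition metal bonding rules.

```python
from itertools import combinations

# Hypothetical per-element limits on bonded neighbours. Real transition metal
# coordination is far more flexible; these numbers are illustrative only.
MAX_BONDS = {"H": 1, "C": 4, "N": 3, "O": 2, "Pd": 4}


def valence_ok(atoms, graph):
    """Check that no atom exceeds its (hypothetical) neighbour limit."""
    degree = [0] * len(atoms)
    for bond in graph:
        for atom in bond:
            degree[atom] += 1
    return all(d <= MAX_BONDS[el] for d, el in zip(degree, atoms))


def enumerate_products(atoms, bonds, max_break=2, max_form=2):
    """Enumerate product bond graphs reachable from a reactant graph.

    atoms: element symbols indexed by atom number.
    bonds: (i, j) index pairs describing the reactant bond graph.
    Returns a set of frozensets of bonds (each bond is a frozenset {i, j}).
    """
    start = frozenset(frozenset(b) for b in bonds)
    all_pairs = [frozenset(p) for p in combinations(range(len(atoms)), 2)]
    products = set()
    for n_break in range(max_break + 1):
        for broken in combinations(start, n_break):
            remaining = start - frozenset(broken)
            # Bonds that may be formed: absent pairs, excluding the ones just
            # broken (re-forming them would only regenerate other graphs).
            candidates = [p for p in all_pairs
                          if p not in remaining and p not in broken]
            for n_form in range(max_form + 1):
                for formed in combinations(candidates, n_form):
                    graph = remaining | frozenset(formed)
                    if graph != start and valence_ok(atoms, graph):
                        products.add(graph)
    return products


# Usage: H2 next to a bare Pd atom (atom 0 = Pd, atoms 1 and 2 = H).
products = enumerate_products(["Pd", "H", "H"], [(1, 2)])
oxidative_addition = frozenset({frozenset({0, 1}), frozenset({0, 2})})
print(len(products), oxidative_addition in products)  # prints: 4 True
```

The enumerated graphs are atom-mapped and are not deduplicated under molecular symmetry; a production workflow would presumably canonicalize them and pass each candidate to a transition state search to obtain a barrier.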

In parallel, we are using large language model (LLM) agents and natural language processing (NLP) to extract and curate reaction data directly from the existing literature. The extracted reactions are then refined computationally and subjected to rigorous transition state calculations to ensure their accuracy and reliability. By integrating computational enumeration with literature mining, we are building a transition metal reaction database that is structurally analogous to the widely used Reaction Graph Depth 1 (RGD1) dataset⁴ for organic molecules. This combined approach is intended to provide depth and breadth in exploring transition metal reactivity that existing resources do not offer, improving predictive capabilities and accelerating catalyst design in both homogeneous and heterogeneous catalysis.
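One way to picture how the two data streams could feed a single RGD1-style table is sketched below in Python. The record fields, the `(reactant, product)` deduplication key, and the rule that literature reactions are admitted only after a transition state has been located are all assumptions made for illustration; they are not the group's actual schema or workflow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReactionRecord:
    """Hypothetical database entry for one elementary reaction."""
    reactant: str                  # assumed canonical identifier (e.g. SMILES)
    product: str                   # assumed canonical identifier (e.g. SMILES)
    barrier_kcal: Optional[float]  # forward barrier from a TS search; None if no TS found
    sources: set[str] = field(default_factory=set)  # e.g. {"yarp", "literature"}

    @property
    def key(self) -> tuple[str, str]:
        return (self.reactant, self.product)


def merge_records(enumerated, literature):
    """Merge computationally enumerated and literature-extracted reactions.

    Literature reactions are admitted only if TS refinement located a barrier.
    Duplicates are merged by (reactant, product) key: their sources are
    combined and the first barrier seen (enumerated records come first) is kept.
    """
    refined = [r for r in literature if r.barrier_kcal is not None]
    db: dict[tuple[str, str], ReactionRecord] = {}
    for rec in list(enumerated) + refined:
        if rec.key in db:
            db[rec.key].sources |= rec.sources
        else:
            db[rec.key] = ReactionRecord(rec.reactant, rec.product,
                                         rec.barrier_kcal, set(rec.sources))
    return db


# Usage with placeholder identifiers "A".."D" standing in for real species.
yarp = [ReactionRecord("A", "B", 25.0, {"yarp"})]
lit = [ReactionRecord("A", "B", 24.1, {"literature"}),
       ReactionRecord("C", "D", None, {"literature"})]
db = merge_records(yarp, lit)
print(sorted(db), sorted(db[("A", "B")].sources))
```

Keying on identifier strings assumes that species have already been canonicalized upstream; a real database would more likely key on atom-mapped reaction graphs so that the same elementary step found by both routes is recognized as one entry.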

References:

(1) Balcells, D.; Skjelstad, B. B. tmQM Dataset—Quantum Geometries and Properties of 86k Transition Metal Complexes. J. Chem. Inf. Model. 2020, 60 (12), 6135–6146. https://doi.org/10.1021/acs.jcim.0c01041.

(2) Kevlishvili, I.; Michel, R. G. S.; Garrison, A. G.; Toney, J. W.; Adamji, H.; Jia, H.; Román-Leshkov, Y.; Kulik, H. J. Leveraging Natural Language Processing to Curate the tmCAT, tmPHOTO, tmBIO, and tmSCO Datasets of Functional Transition Metal Complexes. Faraday Discuss. 2025, 256, 275–303. https://doi.org/10.1039/D4FD00087K.

(3) Zhao, Q.; Savoie, B. M. Simultaneously Improving Reaction Coverage and Computational Cost in Automated Reaction Prediction Tasks. Nat. Comput. Sci. 2021, 1 (7), 479–490. https://doi.org/10.1038/s43588-021-00101-3.

(4) Zhao, Q.; Vaddadi, S. M.; Woulfe, M.; Ogunfowora, L. A.; Garimella, S. S.; Isayev, O.; Savoie, B. M. Comprehensive Exploration of Graphically Defined Reaction Spaces. Sci. Data 2023, 10 (1), 145. https://doi.org/10.1038/s41597-023-02043-z.