2025 AIChE Annual Meeting

(675e) Text Mining Experimental Heterogeneous Catalysis Literature with Large Language Models

Authors

Suljo Linic, University of Michigan-Ann Arbor
Extracting experimentally measured heterogeneous catalysis data from research articles into structured databases would facilitate the rapid screening of catalysts with target properties and the development of machine learning models that directly predict experimental outcomes. This text mining task has been transformed by the release of large language models (LLMs) capable of following general natural language instructions, which make it possible to mine text without training task-specific models or defining comprehensive expression-matching rules. Here, we present CatMiner, a text mining tool we developed that uses LLMs to extract arbitrary user-specified catalytic structure–environment–property data. CatMiner is agnostic to the choice of LLM: both closed-source GPT models and open-source Llama and DeepSeek models are supported without modification. We benchmark CatMiner on data extraction for the oxidative coupling of methane (OCM) reaction and measure the effect of different LLMs and prompting strategies on performance. Using Llama 3.1 405B, we achieve an F1-score of 80.3% on a catalyst–property extraction task and 68.7% on a more difficult catalyst–temperature–property extraction task. We find that domain knowledge, chat-like memory, follow-up prompting, and inter-paragraph search are all necessary to achieve the best performance. Using CatMiner, we generate a machine-readable database of 3628 OCM measurements extracted from 1029 papers and abstracts.
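To illustrate the kind of LLM-agnostic extraction step the abstract describes, the sketch below treats the model as a plain callable (prompt in, text out), so a closed-source GPT backend or an open-source Llama/DeepSeek backend can be swapped in without changing the extraction logic. This is a minimal, hypothetical sketch, not the actual CatMiner implementation; the prompt wording, the `extract_records` function, and the JSON record schema (`catalyst`/`value` keys) are all assumptions for illustration.

```python
import json

# Hypothetical prompt for one catalyst-property extraction pass.
# The leading expert framing stands in for the "domain knowledge"
# component mentioned in the abstract.
PROMPT_TEMPLATE = (
    "You are an expert in heterogeneous catalysis.\n"
    "From the paragraph below, extract every catalyst and its reported "
    "{prop} as a JSON list of objects with keys 'catalyst' and 'value'. "
    "Return [] if none are reported.\n\nParagraph:\n{paragraph}"
)

def extract_records(paragraph, prop, llm):
    """Ask an arbitrary LLM callable for structured records and parse
    its JSON reply, discarding malformed output."""
    reply = llm(PROMPT_TEMPLATE.format(prop=prop, paragraph=paragraph))
    try:
        records = json.loads(reply)
    except json.JSONDecodeError:
        return []  # treat unparseable replies as "nothing extracted"
    if not isinstance(records, list):
        return []
    return [r for r in records if isinstance(r, dict) and "catalyst" in r]
```

Because the model is injected as a function, the same extraction code can be benchmarked against different LLMs, as done in the abstract's comparison of prompting strategies and model choices.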