2025 AIChE Annual Meeting

(202f) Automatic Chemical Information Extraction Using Deep Learning and Large Language Models: Applications in Chemistry and Materials Predictions

Authors

Hanyu Gao - Presenter

Yufan Chen, Hong Kong University of Science and Technology

Haifan Zhou

Yuxuan Zhang, China Agricultural University

Machine learning (ML) is transforming molecular sciences by accelerating the discovery and optimization of molecules, materials, and reactions. However, the performance of ML models heavily relies on the availability of high-quality, large-scale datasets. Historically, constructing such datasets in chemistry and materials science has required labor-intensive manual curation due to the heterogeneous distribution of data across publications and the complexity of domain-specific formats. As ML applications in these fields expand rapidly, reliance on manual data extraction has emerged as a critical bottleneck.

Automated data extraction from scientific literature offers a promising solution, particularly with recent advances in large language models (LLMs). Yet chemistry and materials science present unique challenges, including complex named entities, long-range dependencies, and multimodal data (e.g., text, tables, and figures). In this work, we present deep learning and LLM-based approaches for automated information extraction from organic chemistry and polymer materials literature. By leveraging data augmentation, prompt engineering, and fine-tuning, we develop efficient and accurate models for structured knowledge extraction. Notably, the automatically extracted data yields downstream ML performance comparable to that of expert-curated datasets. Our results demonstrate the potential of fully automated, data-driven pipelines to accelerate discovery in chemistry and materials science.
