2025 AIChE Annual Meeting

(391h) Integrating Ontologies with Large Language Models for Enhanced Control Systems in Chemical Engineering

Authors

Crystal Su - Presenter, Columbia University
Mingyuan Shao - Presenter, Columbia University
Kuai Yu, Columbia University
Jingrui Zhang, University of Wisconsin, Madison
Daniel Bauer, Columbia University
Chemical engineering students and practitioners often pose questions in informal, non-standard language, which makes it difficult to retrieve precise, ontology-based technical information. Traditional keyword-based retrieval systems frequently fail to map these informal queries to structured domain knowledge, producing inaccurate or inefficient results. This research introduces an approach that integrates large language models (LLMs), such as GPT-3.5, with structured chemical engineering ontologies stored in Neo4j graph databases to interpret and answer complex domain-specific queries expressed in informal language. The approach combines the linguistic adaptability of LLMs with the rigor and precision of curated chemical engineering knowledge graphs. The framework first uses LLM-driven entity recognition to extract chemical engineering entities from natural-language queries. Extracted entities are then linked to ontology terms using enhanced lexical similarity and synonym-based mapping. Contextual knowledge retrieved from the ontology informs a retrieval-augmented generation (RAG) process, in which the LLM generates context-grounded, evidence-backed answers with explicit ontology citations, supporting transparency, accountability, and factual accuracy. By validating responses against the ontology, the framework reduces common LLM errors such as hallucinations and misinformation. The resulting responses are traceable and aligned with established chemical engineering standards, which benefits both educational outcomes and operational safety practices.
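The linking and context-assembly steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the similarity measure, so the standard-library `difflib.SequenceMatcher` stands in for it. The term IDs (`CHE:0001` and so on), the synonym table, the 0.8 threshold, and the in-memory dictionary used in place of the Neo4j graph are all hypothetical. The LLM calls for entity extraction and answer generation are omitted; entity mentions are passed in directly.

```python
from difflib import SequenceMatcher

# Hypothetical toy ontology: term ID -> (label, definition).
# The system in the abstract stores its ontology in Neo4j; a dict stands in here.
ONTOLOGY = {
    "CHE:0001": ("proportional-integral-derivative controller",
                 "A feedback controller combining proportional, integral, "
                 "and derivative action."),
    "CHE:0002": ("continuous stirred-tank reactor",
                 "A well-mixed reactor with continuous inflow and outflow."),
    "CHE:0003": ("distillation column",
                 "A unit that separates mixtures by differences in volatility."),
}

# Hypothetical synonym table mapping informal names to term IDs.
SYNONYMS = {
    "pid": "CHE:0001",
    "pid controller": "CHE:0001",
    "cstr": "CHE:0002",
    "stirred tank": "CHE:0002",
}


def normalize(text):
    """Lowercase, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(text.lower().replace("-", " ").split())


def link_entity(mention, threshold=0.8):
    """Return (term_id, score); term_id is None when nothing clears the threshold."""
    key = normalize(mention)
    # Synonym-based mapping is tried first and counts as an exact match.
    if key in SYNONYMS:
        return SYNONYMS[key], 1.0
    # Otherwise fall back to lexical similarity against every ontology label.
    best_id, best_score = None, 0.0
    for term_id, (label, _definition) in ONTOLOGY.items():
        score = SequenceMatcher(None, key, normalize(label)).ratio()
        if score > best_score:
            best_id, best_score = term_id, score
    if best_score >= threshold:
        return best_id, best_score
    return None, best_score


def build_prompt(question, mentions):
    """Assemble a RAG prompt whose context lines carry citable ontology IDs."""
    context_lines, seen = [], set()
    for mention in mentions:
        term_id, _score = link_entity(mention)
        if term_id is not None and term_id not in seen:
            seen.add(term_id)
            label, definition = ONTOLOGY[term_id]
            context_lines.append(f"[{term_id}] {label}: {definition}")
    context = "\n".join(context_lines) or "(no ontology terms matched)"
    return (
        "Answer using only the ontology context below and cite term IDs "
        "in square brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # The mentions would normally come from the LLM entity-recognition step.
    print(build_prompt("how do i tune the pid on my cstr?", ["PID", "CSTR"]))
```

Checking the synonym table before the fuzzy match keeps short abbreviations such as "CSTR" from being rejected by a character-level similarity score, while the threshold keeps unrelated mentions from being linked to the nearest label.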
Current limitations include incomplete synonym recognition and the absence of embedding-based retrieval. Ongoing development addresses these by integrating comprehensive synonym lexicons, ontology-informed embedding techniques, and advanced NLP tools (e.g., SciSpaCy, domain-specific BERT entity linkers). Future expansions aim to incorporate multimodal data, iterative validation loops, and hybrid knowledge graph/text retrieval to further strengthen the accuracy, comprehensiveness, and robustness of the question-answering system. This work aligns closely with themes in Computing and Systems Engineering, as it applies advanced computational methods to process understanding and semantic mapping. Its relevance to Process Design & Development lies in its potential to help engineers synthesize insights from complex technical data. It also contributes to Instrumentation by improving interpretability and diagnostics in control environments through ontology-grounded AI assistance.