2025 AIChE Annual Meeting

(448a) Enhancing Data Interoperability in Quantum Chemistry with Ontologies and Natural Language Interfaces

Authors

Markus Kraft - Presenter, University of Cambridge
Lan Lan, Stanford
Jackson Burns, University of Delaware
William Green, Massachusetts Institute of Technology
Feroz Farazi, University of Cambridge
Sebastian Mosbach, University of Cambridge
Jethro Akroyd, Computational Modelling Cambridge Ltd.

Quantum chemistry calculations generate extensive data, yet these datasets often remain distributed, heterogeneous, and difficult to access systematically. One notable example is QuantumPioneer (QP), developed at MIT, which has already been used to build various machine learning models. However, such datasets typically require substantial technical and domain expertise to use effectively. Interoperability also remains a major challenge: each database follows its own structure and conventions, which makes integration and consolidation difficult. Although significant efforts have been made to consolidate computational chemistry data, existing ontologies such as OntoCompChem, developed within The World Avatar (TWA) project, remain incomplete.

In this work, we enhance OntoCompChem to integrate data from QuantumPioneer, significantly expanding its scope and usability. By extending the OntoCompChem ontology and its data ingestion workflows, we enable comprehensive, structured access to quantum chemistry data via a SPARQL endpoint. Furthermore, we integrate these datasets into the Marie question-answering (QA) system, enabling intuitive, natural-language querying of quantum chemistry data. Our work broadens access to quantum chemical data, supports machine learning applications, and lays the groundwork for future computational discovery through improved interoperability and semantic data representation.
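As a rough illustration of what structured access through a SPARQL endpoint can look like from a client's side, the sketch below builds a SELECT query, wraps it in a standard SPARQL Protocol POST request, and flattens a JSON result set. It is not the authors' implementation: the endpoint URL, the `occ:` namespace, and the property names `hasSpecies` and `hasElectronicEnergy` are hypothetical placeholders chosen for this example, and the real OntoCompChem vocabulary may differ.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: neither the endpoint nor this vocabulary
# comes from the abstract; substitute the real ones before use.
ENDPOINT = "http://example.org/sparql"
OCC = "http://example.org/ontocompchem#"


def build_query(species_label, limit=10):
    """Return a SPARQL SELECT for calculations on a species with this label."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    safe = species_label.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX occ: <{OCC}>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?calc ?energy WHERE {{
  ?calc occ:hasSpecies ?species ;
        occ:hasElectronicEnergy ?energy .
  ?species rdfs:label "{safe}" .
}} LIMIT {int(limit)}"""


def build_request(query, endpoint=ENDPOINT):
    """Wrap the query in a form-encoded POST, as the SPARQL Protocol allows."""
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(endpoint, data=data, headers=headers)


def parse_bindings(payload):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]
```

Against a live endpoint, the request would be sent with `urllib.request.urlopen(build_request(build_query("water")))` and the response body passed to `parse_bindings`. A natural-language front end would sit one layer above this, translating a user's question into a query of this kind.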

Keywords: Quantum Chemistry Data, Ontologies, Natural Language Processing (NLP)