2024 AIChE Annual Meeting

(169cn) Accelerating Drug Discovery through the Automatic Population of a Pharmaceutical Ontology Using Knowledge Graphs

Authors

Chakraborty, A., Columbia University In the City of New York
Venkatasubramanian, V., Columbia University
In drug discovery and manufacturing, experts must scan hundreds to thousands of documents to obtain relevant information. One way to overcome this challenge is to use ontologies, which model domain knowledge hierarchically. Currently, the field lacks domain-specific, automatically populated ontologies that store large amounts of data. To address this, we developed SUSIE, an ontology-based pharmaceutical information extraction tool built to extract semantic triples (1). These triples are presented to the user in the form of knowledge graphs (KGs). The Columbia Ontology of Pharmaceutical Engineering (COPE) is used to extract the semantic triples, but the KGs do not match its hierarchical structure (2). We present a novel framework for automatically populating the sub-classes of COPE. Because the ontology is hierarchical, this also allows us to infer the other classes that contain the extracted terms. Beyond expanding SUSIE’s training base, fine-tuning the entity recognition process, and expanding the ontology classes, automatically mapping KG nodes and edges to COPE allowed us to create a database of logically organized pharmaceutical terms, intended to streamline querying and accelerate the drug discovery and manufacturing pipeline.
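The idea of populating a sub-class and inferring the classes above it can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (Material, Excipient, Binder), the example triple, the node-to-class labels, and the functions ancestors and populate are all hypothetical and are not taken from COPE or SUSIE. The sketch assumes each KG node already carries a sub-class label and that every class has at most one parent.

```python
from collections import defaultdict

# Hypothetical child -> parent links; these are NOT actual COPE classes.
PARENT = {
    "Material": None,
    "Excipient": "Material",
    "Binder": "Excipient",
}


def ancestors(cls, parent=PARENT):
    """Return the superclasses above cls, nearest first."""
    chain = []
    current = parent.get(cls)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def populate(triples, node_class, parent=PARENT):
    """Add each labeled KG node to its sub-class and to every inferred superclass."""
    members = defaultdict(set)
    for subj, _relation, obj in triples:
        for term in (subj, obj):
            cls = node_class.get(term)
            if cls is None:
                continue  # node has no class label, so it stays unmapped
            members[cls].add(term)
            for sup in ancestors(cls, parent):
                members[sup].add(term)
    return dict(members)


# Illustrative triple and label (assumed, not extracted by SUSIE).
triples = [("povidone", "is_used_in", "wet granulation")]
node_class = {"povidone": "Binder"}
populated = populate(triples, node_class)
```

Under these assumptions, a term placed in the Binder sub-class also becomes retrievable through Excipient and Material, which is the kind of inference a hierarchical ontology permits.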

Bibliography:

(1) Mann V., Viswanath S., Vaidyaraman S., Balakrishnan J., Venkatasubramanian V. (2023). SUSIE: Pharmaceutical CMC ontology-based information extraction for drug development using machine learning. Computers & Chemical Engineering, Volume 179, 108446. https://doi.org/10.1016/j.compchemeng.2023.108446

(2) Remolona M. F. M., Conway M. F., Balasubramanian S., Fan L., Feng Z., Gu T., Kim H., Nirantar P. M., Panda S., Ranabothu N. R., Rastogi N., Venkatasubramanian V. (2017). Hybrid ontology-learning materials engineering system for pharmaceutical products: Multi-label entity recognition and concept detection. Computers & Chemical Engineering, Volume 107, Pages 49-60. https://doi.org/10.1016/j.compchemeng.2017.03.012