2025 AIChE Annual Meeting

(448e) Generalized Molecular Property Imputation Using a Flexible Transformer Architecture

Authors

Andrew Schofield - Presenter

Tianfan Jin

Ericka Miller, University of Notre Dame

Brett Savoie, Purdue University

Chemical data is fundamentally sparse, as molecular structures can serve as database keys for countless properties. Ideally, it would be possible to convert between databases that record different properties for each molecule, to fill in missing properties based on those that are available, or to fuse databases with partially overlapping properties. However, classical data imputation strategies based on primitive interpolation or structural regressors fail at these tasks. Even when predicting a single property, traditional methods typically neglect additional known property information, or they become unwise or inapplicable because the structure is incomplete. Here, we present a more general paradigm of chemical property imputation that uses all available information in imputation, fusion, and conversion tasks. A robust transformer model architecture was developed for these generalized imputation tasks. We examine these capabilities in multiple trials using a dataset of approximately 16M organic molecules and 23 properties. Finally, we propose an imputation protocol that uses the same architecture to impute a sparse dataset using only the data contained therein. The suitability of this protocol for general imputation is demonstrated by two case studies in which sparse data is imputed with average R² values of 0.79 and 0.85. These advances should herald more general models and strengthen our collective understanding of the relationships between molecules and their properties.
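As a minimal sketch of how an "average R²" over imputed entries could be computed, the Python below scores predictions on held-out cells of a sparse molecule-by-property table. It is illustrative only: the abstract does not state how the reported averages were formed, so averaging per-property R² over held-out entries is an assumption, and the names `r_squared`, `average_r2`, `truth`, `predicted`, and `held_out` are hypothetical. The toy numbers are invented for the example and are not results from this work.

```python
from typing import Dict, List, Optional, Set, Tuple

# Rows are molecules, columns are properties; None marks a missing entry.
Table = List[List[Optional[float]]]


def r_squared(y_true: List[float], y_pred: List[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def average_r2(truth: Table, predicted: Table,
               held_out: Set[Tuple[int, int]]) -> float:
    """Mean of per-property R² over held-out (row, column) entries.

    Assumption: each property is scored separately and the scores are
    then averaged with equal weight.
    """
    by_column: Dict[int, Tuple[List[float], List[float]]] = {}
    for row, col in sorted(held_out):
        true_vals, pred_vals = by_column.setdefault(col, ([], []))
        true_vals.append(truth[row][col])
        pred_vals.append(predicted[row][col])
    scores = [r_squared(t, p) for t, p in by_column.values()]
    return sum(scores) / len(scores)


# Toy table: 3 molecules, 2 properties. All entries are held out here.
truth = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
predicted = [[1.0, 2.0], [2.0, 4.0], [3.0, 8.0]]  # one imperfect entry
held_out = {(i, j) for i in range(3) for j in range(2)}

score = average_r2(truth, predicted, held_out)
```

In this toy case the first property is predicted exactly and the second has one error, so the equal-weight average falls between the two per-property scores.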
