2016 AIChE Annual Meeting

(681c) Discriminative Neural Embeddings of Latent Variable Models for Molecular Property Prediction

Authors

Le Song - Presenter, Georgia Institute of Technology

Hanjun Dai, Georgia Institute of Technology

Bo Dai, Georgia Institute of Technology

Kernel classifiers and regressors designed for structured data, such as sequences, trees and graphs, have driven significant advances in a number of interdisciplinary areas such as computational biology and drug design. Typically, the kernel function for a data type is designed beforehand, either by exploiting statistics of the structures or by making use of probabilistic generative models, and a discriminative classifier is then learned on top of the kernel via convex optimization. However, this elegant two-stage approach also limits kernel methods: they scale poorly to millions of data points, and they cannot exploit discriminative information when learning feature representations.
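For concreteness, the two-stage pipeline described above can be sketched as follows. This is a generic illustration, not the baseline used by the authors: the node-label histogram kernel (which ignores edges), kernel ridge regression as the convex learner, the regularization weight `lam`, and the toy graphs are all assumptions chosen for brevity. Stage 1 fixes the kernel before any labels are seen; stage 2 solves the convex problem (K + lam*I) alpha = y.

```python
from collections import Counter
from typing import List, Tuple

# A toy graph: node labels plus an edge list (edges are ignored by this kernel).
Graph = Tuple[List[str], List[Tuple[int, int]]]


def kernel(g1: Graph, g2: Graph) -> float:
    """Stage 1: a kernel fixed in advance (dot product of node-label counts)."""
    h1, h2 = Counter(g1[0]), Counter(g2[0])
    return float(sum(h1[k] * h2[k] for k in h1))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_kernel_ridge(graphs: List[Graph], y: List[float], lam: float = 1.0) -> List[float]:
    """Stage 2: convex fit with closed form alpha = (K + lam*I)^-1 y."""
    n = len(graphs)
    gram = [[kernel(graphs[i], graphs[j]) + (lam if i == j else 0.0)
             for j in range(n)] for i in range(n)]  # n^2 kernel evaluations
    return solve(gram, y)


def predict(graphs: List[Graph], alpha: List[float], g: Graph) -> float:
    """Kernel expansion over the training graphs."""
    return sum(a * kernel(gi, g) for a, gi in zip(alpha, graphs))


# Hypothetical training set: three labeled toy "molecules".
train = [(["C", "C", "O"], [(0, 1), (1, 2)]),
         (["C", "N"], [(0, 1)]),
         (["O", "O", "N"], [(0, 1), (1, 2)])]
y = [1.0, 0.0, 0.5]
alpha = fit_kernel_ridge(train, y, lam=1.0)
print(predict(train, alpha, (["C", "O"], [(0, 1)])))
```

The Gram matrix needs one kernel evaluation per pair of training graphs and the dense solve is cubic in the number of graphs, which illustrates why this design struggles at the scale of millions of molecules. The features are also frozen in stage 1, so the labels never shape the representation.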

We propose an effective and scalable approach to structured data representation based on the idea of embedding latent variable models into feature spaces, and learning these feature spaces jointly with the final regressor or classifier using discriminative information. The algorithm runs a sequence of function mappings in a way similar to graphical model inference procedures such as mean field and belief propagation. This can be implemented as a recurrent neural network, in which the parameters of the feature spaces and the classifier are trained end to end. We deployed our algorithm on several computational chemistry problems, including compound and protein classification tasks, and achieved state-of-the-art results on several commonly used benchmark datasets, including NCI1, NCI109, ENZYMES and D&D. We also applied our algorithm to the Harvard Clean Energy Project dataset, which contains millions of molecules, to predict power conversion efficiency and energy. We achieved a mean absolute error below 0.1 while using significantly fewer parameters than alternative methods.
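A minimal sketch of a mean-field-style embedding update of the kind described above, written in plain Python so it runs without dependencies. The specific update (ReLU of a linear map of the node's own features plus the sum of its neighbors' current embeddings), sum pooling over nodes, the linear output layer, and all names (`embed_graph`, `w1`, `w2`, `u`, `steps`) are assumptions for illustration rather than the authors' exact formulation; only the forward pass is shown, with random rather than trained weights.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def embed_graph(features: List[Vector], edges: List[Tuple[int, int]],
                w1: Matrix, w2: Matrix, steps: int) -> Vector:
    """Run `steps` synchronous mean-field-style rounds, then sum-pool node embeddings."""
    n, dim = len(features), len(w1)
    neighbors: List[List[int]] = [[] for _ in range(n)]
    for a, b in edges:  # undirected bonds
        neighbors[a].append(b)
        neighbors[b].append(a)
    mu = [[0.0] * dim for _ in range(n)]  # all node embeddings start at zero
    for _ in range(steps):
        new_mu = []
        for v in range(n):
            # Aggregate the neighbors' embeddings from the previous round.
            msg = [sum(mu[nb][k] for nb in neighbors[v]) for k in range(dim)]
            pre = [a + b for a, b in zip(matvec(w1, features[v]), matvec(w2, msg))]
            new_mu.append([max(0.0, z) for z in pre])  # ReLU nonlinearity
        mu = new_mu
    return [sum(mu[v][k] for v in range(n)) for k in range(dim)]


def predict(features, edges, params, steps: int = 3) -> float:
    """Hypothetical linear regressor on the pooled graph embedding."""
    w1, w2, u = params
    g = embed_graph(features, edges, w1, w2, steps)
    return sum(ui * gi for ui, gi in zip(u, g))


rng = random.Random(0)
DIM, FEAT = 4, 3  # embedding size; one-hot atom types [C, N, O] (assumed)
params = (random_matrix(DIM, FEAT, rng), random_matrix(DIM, DIM, rng),
          [rng.uniform(-1.0, 1.0) for _ in range(DIM)])

# Toy C-C-O chain (hypothetical input).
feats = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
edges = [(0, 1), (1, 2)]
print(predict(feats, edges, params))
```

Because every round reuses the same `w1` and `w2`, the loop is an unrolled recurrent network. Training end to end would backpropagate the regression loss through all `steps` rounds into `w1`, `w2` and `u`, typically with an automatic differentiation library. Sum pooling makes the prediction independent of the order in which atoms are listed.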