2022 Annual Meeting
(152f) Capturing Molecular Interactions in Graph Neural Networks: A Case Study in Multi-Component Phase Equilibrium
Authors
In a typical GNN-based approach to molecular property prediction, atom and bond features are propagated along the molecular structure of a single input molecule. The resulting embedding is then passed to fully connected layers to build a predictive model [14]. For systems with multiple components, the usual method is to average or concatenate the embeddings of the individual molecules and use the result as system-level features for property inference with fully connected or attentive layers [6,7,8]. Previous studies have also used weighted sums or concatenation to account for composition when needed [6]. However, these approaches do not explicitly capture intra- and intermolecular interactions.
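The pooling schemes described above can be illustrated with a minimal sketch. It assumes each molecule has already been reduced to a fixed-length embedding by some GNN encoder; the function names, the two-dimensional embeddings, and the mole fractions are hypothetical and are not taken from the cited works.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(embeddings: Sequence[Vector]) -> Vector:
    """Average the per-molecule embeddings; composition is ignored."""
    n = len(embeddings)
    return [sum(column) / n for column in zip(*embeddings)]


def weighted_sum_pool(embeddings: Sequence[Vector], fractions: Sequence[float]) -> Vector:
    """Sum the per-molecule embeddings, each scaled by its mole fraction."""
    if len(embeddings) != len(fractions):
        raise ValueError("one mole fraction is required per molecule")
    dim = len(embeddings[0])
    return [sum(x * emb[k] for emb, x in zip(embeddings, fractions)) for k in range(dim)]


def concat_pool(embeddings: Sequence[Vector], fractions: Sequence[float]) -> Vector:
    """Concatenate the embeddings, then append the mole fractions."""
    flat = [value for emb in embeddings for value in emb]
    return flat + list(fractions)


# Hypothetical embeddings for a binary mixture (illustrative values only).
molecule_a = [1.0, 0.0]
molecule_b = [0.0, 2.0]
x = [0.25, 0.75]

system_mean = mean_pool([molecule_a, molecule_b])
system_weighted = weighted_sum_pool([molecule_a, molecule_b], x)
system_concat = concat_pool([molecule_a, molecule_b], x)
```

In all three variants the molecules only meet inside the downstream layers; no term in the system-level features depends on a specific pair of molecules, which is the limitation noted above.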
In this work, we present a GNN architecture that incorporates both intra- and intermolecular interactions by combining atomic-level (local) graph convolution with molecular-level (global) message passing for property prediction of multi-component chemical systems. To connect local features with global features, we construct a molecular interaction network as an intermediate step. The molecular interaction network is a complete graph in which each composition-weighted node represents a molecule and each edge represents a hypothetical intermolecular interaction, carrying information such as hydrogen bonding. It serves as a physics-informed topological prior that aids feature extraction from multi-component systems. We tested the proposed architecture in a case study on activity coefficient prediction for multi-component systems. We also provide a framework that takes a given binary or ternary mixture and generates the corresponding phase diagram (P-x-y) using the trained GNN together with thermodynamic calculations. Finally, we performed counterfactual analysis [15] of the trained model to identify the impact of functional groups on activity coefficients and obtain physical insight.
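As a rough illustration of the molecular interaction network, the sketch below builds a complete graph over the molecules of a mixture, scales each node by its mole fraction, and runs one round of message passing along the edges. This is not the authors' implementation: the fixed scalar edge weight (standing in for hydrogen-bonding information), the additive update rule, and all function names are assumptions made for the example, and a trained model would use learned transformations in their place.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Edges = Dict[Tuple[int, int], float]


def build_interaction_graph(
    embeddings: Sequence[Vector],
    fractions: Sequence[float],
    edge_feature: Callable[[int, int], float],
) -> Tuple[List[Vector], Edges]:
    """Complete graph: node i holds x_i * embedding_i; every pair (i, j) gets an edge."""
    if len(embeddings) != len(fractions):
        raise ValueError("one mole fraction is required per molecule")
    nodes = [[x * value for value in emb] for emb, x in zip(embeddings, fractions)]
    edges = {(i, j): edge_feature(i, j) for i, j in combinations(range(len(nodes)), 2)}
    return nodes, edges


def message_passing_step(nodes: Sequence[Vector], edges: Edges) -> List[Vector]:
    """Each node adds its neighbours' features, scaled by the edge weight (assumed rule)."""
    updated = [list(h) for h in nodes]
    for (i, j), weight in edges.items():
        for k in range(len(nodes[i])):
            updated[i][k] += weight * nodes[j][k]  # message j -> i
            updated[j][k] += weight * nodes[i][k]  # message i -> j
    return updated


def readout(nodes: Sequence[Vector]) -> Vector:
    """Sum the node features into one system-level vector."""
    return [sum(column) for column in zip(*nodes)]


# Hypothetical binary mixture: two molecule embeddings at equal mole fractions,
# with an assumed edge weight of 2.0 standing in for hydrogen-bonding information.
nodes, edges = build_interaction_graph(
    embeddings=[[1.0, 0.0], [0.0, 2.0]],
    fractions=[0.5, 0.5],
    edge_feature=lambda i, j: 2.0,
)
system_features = readout(message_passing_step(nodes, edges))
```

Because the graph is complete, a ternary mixture yields three edges and every molecule exchanges a message with every other one, so pair-specific information enters the system-level features before any fully connected layer sees them.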
References
[1] Sanchez-Lengeling, B., & Aspuru-Guzik, A. (2018). Inverse molecular design using machine learning: Generative models for matter engineering. Science, 361(6400), 360-365.
[2] Rogers, D., & Hahn, M. (2010). Extended-connectivity fingerprints. Journal of chemical information and modeling, 50(5), 742-754.
[3] Karelson, M., Lobanov, V. S., & Katritzky, A. R. (1996). Quantum-chemical descriptors in QSAR/QSPR studies. Chemical reviews, 96(3), 1027-1044.
[4] Natarajan, A. R., & Van der Ven, A. (2018). Machine-learning the configurational energy of multicomponent crystalline solids. npj Computational Materials, 4(1), 1-7.
[5] Wilbraham, L., Sprick, R. S., Jelfs, K. E., & Zwijnenburg, M. A. (2019). Mapping binary copolymer property space with neural networks. Chemical science, 10(19), 4973-4984.
[6] Hanaoka, K. (2020). Deep neural networks for multicomponent molecular systems. ACS omega, 5(33), 21042-21053.
[7] Wei, J. N., Duvenaud, D., & Aspuru-Guzik, A. (2016). Neural networks for the prediction of organic chemistry reactions. ACS central science, 2(10), 725-732.
[8] Coley, C. W., Jin, W., Rogers, L., Jamison, T. F., Jaakkola, T. S., Green, W. H., ... & Jensen, K. F. (2019). A graph-convolutional neural network model for the prediction of chemical reactivity. Chemical science, 10(2), 370-377.
[9] Pan, Y., Ji, X., Ding, L., & Jiang, J. (2019). Prediction of lower flammability limits for binary hydrocarbon gases by quantitative structure–property relationship approach. Molecules, 24(4), 748.
[10] Wang, T., Tang, L., Luan, F., & Cordeiro, M. N. D. (2018). Prediction of the toxicity of binary mixtures by QSAR approach using the hypothetical descriptors. International journal of molecular sciences, 19(11), 3423.
[11] Chinta, S., & Rengaswamy, R. (2019). Machine learning derived quantitative structure property relationship (QSPR) to predict drug solubility in binary solvent systems. Industrial & Engineering Chemistry Research, 58(8), 3082-3092.
[12] Jirasek, F., Alves, R. A., Damay, J., Vandermeulen, R. A., Bamler, R., Bortz, M., ... & Hasse, H. (2020). Machine learning in thermodynamics: Prediction of activity coefficients by matrix completion. The Journal of Physical Chemistry Letters, 11(3), 981-985.
[13] Wu, Z., Ramsundar, B., Feinberg, E. N., Gomes, J., Geniesse, C., Pappu, A. S., ... & Pande, V. (2018). MoleculeNet: a benchmark for molecular machine learning. Chemical science, 9(2), 513-530.
[14] Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., ... & Sun, M. (2020). Graph neural networks: A review of methods and applications. AI Open, 1, 57-81.
[15] Wellawatte, G. P., Seshadri, A., & White, A. D. (2022). Model agnostic generation of counterfactual explanations for molecules. Chemical Science.
