In the absence of experimental data, the ability to predict the thermophysical properties of chemicals of interest becomes crucial in many engineering applications, among them phase-equilibria calculations, energy balances, and the evaluation of process alternatives [1]. Quantitative Structure-Property Relationship (QSPR) models enable such predictions by relating the molecular structure, encoded in a machine-readable format (molecular descriptors), to the property of interest through a mathematical model [2].
Recent developments in Deep Learning (DL), and especially Graph Neural Networks (GNNs), have eliminated the tedious task of choosing a suitable molecular descriptor for the task at hand, as these models can learn an optimal representation from a molecular graph and map it to the target property through backpropagation [3], [4]. Traditionally, models are built to predict one specific property or target. DL models, however, can predict several properties simultaneously, a setting known as multi-task learning [5]. Here, the models may improve their performance through inductive transfer: while learning to predict property "A", the model may need less effort to learn how to predict property "B" and may be able to transfer the newly gained knowledge to the new domain (task) [5]. This is especially relevant in cases where good-quality experimental data are scarce [6]. However, an improvement in the model's predictive performance is not guaranteed, and "negative" transfer may occur [7].
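The shared-representation idea behind multi-task learning can be illustrated with a toy model: a shared trunk maps the molecular input to one learned representation, separate heads predict each property, and a combined loss lets gradients from both tasks update the shared weights. The sketch below is a minimal NumPy illustration under assumed names (a linear trunk instead of a GNN, a fixed loss weighting `alpha`), not the models developed in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-task setup: a shared "trunk" maps a molecular feature vector
# to a learned representation; two linear "heads" predict two properties
# (e.g. property "A" and property "B"). All weights here are random
# placeholders; a real model would learn them by backpropagation.
n_features, n_hidden = 8, 4
W_shared = rng.normal(size=(n_features, n_hidden))  # shared trunk weights
w_head_a = rng.normal(size=n_hidden)                # head for property A
w_head_b = rng.normal(size=n_hidden)                # head for property B

def predict(x):
    # One shared representation feeds both task heads.
    h = np.tanh(x @ W_shared)
    return h @ w_head_a, h @ w_head_b

def multitask_loss(x, y_a, y_b, alpha=0.5):
    # Weighted sum of per-task squared errors; minimizing this updates
    # W_shared from both tasks, which is where transfer can occur.
    p_a, p_b = predict(x)
    return alpha * (p_a - y_a) ** 2 + (1 - alpha) * (p_b - y_b) ** 2

x = rng.normal(size=n_features)
loss = multitask_loss(x, y_a=1.0, y_b=0.2)
```

Because both heads share `W_shared`, a task with abundant data can shape the representation used by a data-scarce task; when the tasks are unrelated, the same coupling is what allows negative transfer.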
In this work, we demonstrate that domain knowledge plays an important role in ensuring "positive" transfer in the multi-task prediction of molecular properties through two case studies: 1) seemingly unrelated properties, namely the Gibbs free energy of formation and the acentric factor; 2) theoretically related properties, namely the critical temperature and the acentric factor. For each property, a single-task GNN-based model is developed to serve as a benchmark and to illustrate the potential improvement from multi-task transfer learning. The comparative assessment between the two case studies demonstrates that, although AI-based techniques and tools may offer improved results compared to conventional modeling techniques, the "chemist-in-the-loop" [8] remains an indispensable element in building and improving AI-based property prediction models.
References
[1] J. Frutiger, I. Bell, J. P. O'Connell, K. Kroenlein, J. Abildskov, and G. Sin, "Uncertainty assessment of equations of state with application to an organic Rankine cycle," Mol. Phys., vol. 115, no. 9–12, pp. 1225–1244, 2017.
[2] N. D. Austin, N. V. Sahinidis, and D. W. Trahan, "Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques," Chem. Eng. Res. Des., vol. 116, pp. 2–26, 2016.
[3] C. W. Coley, R. Barzilay, W. H. Green, T. S. Jaakkola, and K. F. Jensen, "Convolutional Embedding of Attributed Molecular Graphs for Physical Property Prediction," J. Chem. Inf. Model., vol. 57, no. 8, pp. 1757–1772, Aug. 2017.
[4] K. Yang et al., "Analyzing Learned Molecular Representations for Property Prediction," J. Chem. Inf. Model., vol. 59, no. 8, pp. 3370–3388, 2019.
[5] S. Ruder, "An Overview of Multi-Task Learning in Deep Neural Networks," arXiv, Jun. 2017.
[6] A. M. Schweidtmann, J. G. Rittig, A. König, M. Grohe, A. Mitsos, and M. Dahmen, "Graph Neural Networks for Prediction of Fuel Ignition Quality," Energy & Fuels, vol. 34, no. 9, pp. 11395–11407, Sep. 2020.
[7] W. Zhang, L. Deng, L. Zhang, and D. Wu, "Overcoming Negative Transfer: A Survey," arXiv, pp. 1–15, Sep. 2020.
[8] T. J. Wills, D. A. Polshakov, M. C. Robinson, and A. A. Lee, "Impact of Chemist-In-The-Loop Molecular Representations on Machine Learning Outcomes," J. Chem. Inf. Model., vol. 60, no. 10, pp. 4449–4456, Oct. 2020.