2021 Annual Meeting

(364c) Transfer Learning for Prediction of Absorption and Emission Spectra from Multi-Fidelity Data

Authors

Greenman, K. P. - Presenter, Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology
Gomez-Bombarelli, R., Massachusetts Institute of Technology
Accurate predictions of optical properties from chemical structure are necessary for designing optimal dye molecules for important applications such as solar cells, bioimaging, and display technologies. The recent compilation of several large datasets of experimental absorption and emission spectral data from the literature has enabled machine learning methods to equal or exceed the prediction accuracy of first-principles approaches like time-dependent density functional theory (TD-DFT) at a fraction of the computational cost. Depending on the choice of exchange-correlation functional and the level of approximation, TD-DFT may show systematic and random errors with respect to the experimental ground truth. Nevertheless, so-called multi-fidelity statistical models can still benefit from TD-DFT as a physics-based regularization to improve generalization to unseen chemistries. TD-DFT calculations remain orders of magnitude cheaper and more easily automated than experiments, which have a much higher opportunity cost for false positive predictions. As a result, the amount of TD-DFT data available, and the ability to acquire more, far exceeds that from experiments. In this work, we benchmark several approaches for leveraging multi-fidelity datasets that combine computational and experimental data to predict the optical properties of dye molecules with high transferability to unseen chemical domains. We compare pre-training and transfer learning strategies across several datasets for multiple prediction tasks. Our findings improve the accuracy and generalizability of the machine learning models, enabling researchers to use them more effectively in generative modeling and molecular design.
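The pre-train-then-fine-tune idea described in the abstract can be illustrated with a deliberately small toy sketch. It is not the authors' method or model: the linear model, the synthetic "simulation" and "experiment" functions, the constant offset standing in for a systematic TD-DFT error, and all names and hyperparameters below are assumptions chosen only to make the mechanics visible. A model is first fit on plentiful low-fidelity data, then briefly updated on a few high-fidelity points, and compared against training on those few points from scratch.

```python
# Toy illustration of transfer learning across data fidelities.
# All functions and numbers here are hypothetical stand-ins, not real spectra.

def fit(xs, ys, w, b, lr=0.1, steps=20):
    """Full-batch gradient descent on mean squared error for y = w*x + b."""
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(r * x for r, x in zip(residuals, xs)) / n
        grad_b = 2.0 * sum(residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def mse(xs, ys, w, b):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def experiment(x):
    """Assumed high-fidelity ground truth."""
    return 2.0 * x + 1.0

def simulation(x):
    """Assumed low-fidelity proxy: correct trend, constant systematic offset."""
    return 2.0 * x + 1.5

# Plentiful low-fidelity data, scarce high-fidelity data.
x_low = [i / 10 for i in range(11)]
y_low = [simulation(x) for x in x_low]
x_high = [0.2, 0.5, 0.8]
y_high = [experiment(x) for x in x_high]

# Held-out points outside the range covered by the high-fidelity data.
x_test = [0.0, 1.0]
y_test = [experiment(x) for x in x_test]

# 1) Pre-train on low-fidelity data until converged.
w_pre, b_pre = fit(x_low, y_low, 0.0, 0.0, steps=2000)
# 2) Fine-tune briefly on the few high-fidelity points.
w_ft, b_ft = fit(x_high, y_high, w_pre, b_pre, steps=20)
# Baseline: same short training budget on high-fidelity data, from scratch.
w_sc, b_sc = fit(x_high, y_high, 0.0, 0.0, steps=20)

mse_pretrained_only = mse(x_test, y_test, w_pre, b_pre)
mse_transfer = mse(x_test, y_test, w_ft, b_ft)
mse_scratch = mse(x_test, y_test, w_sc, b_sc)

print(f"pre-trained only: {mse_pretrained_only:.4f}")
print(f"fine-tuned:       {mse_transfer:.4f}")
print(f"from scratch:     {mse_scratch:.4f}")
```

In this toy setting the pre-trained model already captures the trend from the low-fidelity data, so a short fine-tuning run mainly has to correct the offset. Real multi-fidelity models for spectra would involve learned molecular representations and noisier, less cleanly related data sources.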