2022 Annual Meeting
Prediction of the HOMO-LUMO Gap Energy of Organic Molecules with Graph Neural Networks
In this work, we examined the effectiveness of machine learning (ML) for predicting the HOMO-LUMO gap energy, a quantity of particular importance because of its relevance to properties such as reactivity, photoexcitation, and charge transport. We studied two ways of representing molecules as inputs to our learning model. In the first, molecules are represented as two-dimensional graphs in which atoms are graph nodes and bonds are graph edges. In the second, we generate a representative three-dimensional conformation of the molecule and augment the graph with atom positions and relative distances. We trained and tested both models on the PCQM4Mv2 dataset, a collection of data for over 3.7 million compounds published as part of the Open Graph Benchmark (OGB) Large Scale Challenge (LSC), an open competition for the fair benchmarking of graph-based machine learning algorithms. We found that molecule representations learned by graph neural networks outperform those based on domain expert knowledge, and that including three-dimensional structure information further improves prediction accuracy. Based on these results, we plan to submit our best model to the OGB-LSC competition, which will announce results this November.
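The two input representations described above can be illustrated with a minimal sketch. The `MolGraph` class, its method names, and the water coordinates are hypothetical and chosen only for illustration; they are not the featurization used in this work or the PCQM4Mv2 data format.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MolGraph:
    """Hypothetical container: atoms are nodes, bonds are edges."""
    atoms: List[str]                       # element symbol per node
    bonds: List[Tuple[int, int, int]]      # (atom i, atom j, bond order)
    # Optional 3D conformation: one (x, y, z) per atom, in angstroms.
    coords: Optional[List[Tuple[float, float, float]]] = None

    def edge_index(self) -> List[Tuple[int, int]]:
        """Return each bond in both directions, as message-passing GNNs commonly expect."""
        edges = []
        for i, j, _order in self.bonds:
            edges.append((i, j))
            edges.append((j, i))
        return edges

    def edge_distances(self) -> List[float]:
        """Return the interatomic distance for every directed edge (3D variant only)."""
        if self.coords is None:
            raise ValueError("no 3D conformation attached to this graph")
        return [math.dist(self.coords[i], self.coords[j])
                for i, j in self.edge_index()]


# 2D representation: connectivity only.
water_2d = MolGraph(atoms=["O", "H", "H"], bonds=[(0, 1, 1), (0, 2, 1)])

# 3D representation: same graph plus atom positions.
# Coordinates are rough illustrative values, not an optimized geometry.
water = MolGraph(
    atoms=["O", "H", "H"],
    bonds=[(0, 1, 1), (0, 2, 1)],
    coords=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)

if __name__ == "__main__":
    print(water.edge_index())
    print([round(d, 3) for d in water.edge_distances()])
```

In this sketch the 2D and 3D variants share the same node and edge lists; the 3D variant only adds per-atom positions, from which per-edge distances are derived as extra edge features.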