2025 AIChE Annual Meeting
(392al) Molecular Design Using Graph Bayesian Optimization with Shortest-Path Kernels
Authors
Alternatively, molecular generation can be viewed as a constrained graph optimization problem, i.e., finding desired graph structures and features subject to given constraints. Considering the black-box objective and limited data, recent advances in graph BO are promising alternatives. Existing graph BO approaches are successful in fitting graph functions using Gaussian processes (GPs) with various graph kernels [Ramachandram et al., 2017, Borovitskiy et al., 2021, Ru et al., 2021, Zhi et al., 2023], but are generally incompatible with molecule settings where individual data point is an undirected labeled graph with features. Additionally, the optimization search space for acquisition functions in graph BO is large and difficult to be efficiently explored by evolutionary or sampling algorithms.
This work introduces Bayesian optimization over graphs with shortest-path encoded, or BoGrape [Xie et al., 2025], as a paradigm for data-driven optimization over molecules. GPs are chosen as the surrogate models, using global acquisition function optimization methods introduced in our recent work [Xie et al., 2024]. We present four variants of the classic shortest-path graph kernel [Borgwardt and Kriegel, 2005] for measuring similarities between graphs in BoGrape. By precisely representing the shortest paths as decision variables, the acquisition function optimization is formulated as a mixed-integer program (MIP) with a mixed-feature search space, graph kernel, and relevant problem-specific constraints. Numerical results on QM7 and QM9 datasets highlight the performance and sample efficiency of BoGrape in generating good candidate molecules.
References:
K. M. Borgwardt and H.-P. Kriegel. Shortest-path kernels on graphs. In International Conference on Data Mining, 2005.
V. Borovitskiy, I. Azangulov, A. Terenin, P. Mostowsky, M. P. Deisenroth, and N. Durrande. Matern Gaussian processes on graphs. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2021.
P. I. Frazier. A tutorial on Bayesian optimization. arXiv preprint arXiv:1807.02811, 2018.
T. Gaudelet, B. Day, A. R. Jamasb, J. Soman, C. Regep, G. Liu, J. B. Hayter, R. Vickers, C. Roberts, J. Tang, et al. Utilizing graph machine learning within drug discovery and development. Briefings in Bioinformatics, 22(6):bbab159, 2021.
W. Jin, R. Barzilay, and T. Jaakkola. Junction tree variational autoencoder for molecular graph generation. In International Conference on Machine Learning (ICML), 2018.
N. Maus, H. Jones, J. Moore, M. J. Kusner, J. Bradshaw, and J. Gardner. Local latent space Bayesian optimization over structured inputs. In International Conference on Neural Information Processing Systems (NeurIPS), 2022.
D. Ramachandram, M. Lisicki, T. J. Shields, M. R. Amer, and G. W. Taylor. Structure optimization for deep multimodal fusion networks using graph-induced kernels. arXiv preprint arXiv:1707.00750, 2017.
J. G. Rittig, M. Ritzert, A. M. Schweidtmann, S. Winkler, J. M. Weber, P. Morsch, K. A. Heufer, M. Grohe, A. Mitsos, and M. Dahmen. Graph machine learning for design of high-octane fuels. AIChE Journal, page e17971, 2022.
B. Ru, X. Wan, X. Dong, and M. Osborne. Interpretable neural architecture search via Bayesian optimisation with Weisfeiler-Lehman kernels. In International Conference on Learning Representations (ICLR), 2021.
X. Xia, J. Hu, Y. Wang, L. Zhang, and Z. Liu. Graph-based generative models for de Novo drug design. Drug Discovery Today: Technologies, 32:45–53, 2019.
Y. Xie, S. Zhang, J. Paulson, and C. Tsay. Global optimization of Gaussian process acquisition functions using a piecewise-linear kernel approximation. arXiv preprint arXiv:2410.16893, 2024.
Y. Xie, S. Zhang, J. Qing, R. Misener, and C. Tsay. BoGrape: Bayesian optimization over graphs with shortest-path encoded. arXiv preprint arXiv:2503.05642, 2025.
J. Xiong, Z. Xiong, K. Chen, H. Jiang, and M. Zheng. Graph neural networks for automated de novo drug design. Drug Discovery Today, 26(6):1382–1393, 2021.
Y.-C. Zhi, Y. C. Ng, and X. Dong. Gaussian processes on graphs via spectral kernel learning. IEEE Transactions on Signal and Information Processing over Networks, 2023.