2025 AIChE Annual Meeting
(260e) GPU-Parallel Lower Bounding with Interval Arithmetic for Global Optimization
Authors
Instead of processing multiple B&B nodes in parallel, we propose to parallelize the lower bounding solver to avoid potential warp divergence and a priori increase in the size of the B&B tree. In our proposal, the domain of a B&B node is only temporarily divided into a large number of smaller on which interval bounds of the objective functions and constraints are calculated in parallel on the GPU using interval arithmetics [8-10]. Among these interval bounds, the widest ones are selected as the overall bounds of the whole B&B node. Because the interval bounds get tighter if they are constructed over smaller domains, the bounds from our method are much tighter than those obtained by applying interval arithmetics directly over the whole B&B node – but they remain fast to compute thanks to massive parallelism offered by the GPU. We implemented this GPU-parallel lower bounding method within our open-source solver MAiNGO [11], where the GPU parts were implemented using CUDA [12] in two different manners: a. wrapping all the GPU code into one kernel function, and b. distributing the GPU code onto a CUDA graph object of which each node can be either a kernel function or a memory communication operation.
We present multiple case studies to test the performance of the proposed method. The results illustrate that using more subintervals can indeed produce tighter bounds on a given B&B node and thus result in less B&B iterations. In terms of wall-clock time, our GPU-parallelization reaches speed-ups up to 4000× over the of using standard interval arithmetics in serial on the CPU. Regarding the different implementations, the one based on the CUDA graph attains higher efficiency when the optimization problems become more complicated. For some case studies involving artificial neural network surrogate models, the presented method outperforms the default bounding method in MAiNGO which is based on McCormick relaxations [13-15] evaluated in serial on the CPU, in terms of tightness of bounds (thanks to the evaluation over multiple subintervals), B&B iterations, and wall clock time. Overall, the empirical results demonstrate the promising potential of GPU-parallel lower bounding solver to accelerate B&B methods.
Acknowledgments
Hongzhen Zhang and Dominik Bongartz gratefully acknowledge funding through Internal Funds KU Leuven (STG/22/060).
References
- Locatelli, M., & Schoen, F. (Eds.). (2013). Global optimization: theory, algorithms, and applications. Society for Industrial and Applied Mathematics.
- Pardalos, P. M. (1989). Parallel search algorithms in global optimization. Applied Mathematics and Computation, 29(3), 219-229.
- Gendron, B., & Crainic, T. G. (1994). Parallel branch-and-branch algorithms: Survey and synthesis. Operations research, 42(6), 1042-1066.
- Bader, D. A., Hart, W. E., & Phillips, C. A. (2005). Parallel algorithm design for branch and bound. Tutorials on Emerging Methodologies and Applications in Operations Research: Presented at INFORMS 2004, Denver, CO, 5-1.
- Archibald, B., Maier, P., McCreesh, C., Stewart, R., & Trinder, P. (2018). Replicable parallel branch and bound search. Journal of Parallel and Distributed Computing, 113, 92-114.
- Vu, T. T., & Derbel, B. (2016). Parallel Branch-and-Bound in multi-core multi-CPU multi-GPU heterogeneous environments. Future Generation Computer Systems, 56, 95-109.
- Gottlieb, R. X., Xu, P., & Stuber, M. D. (2024). Automatic source code generation for deterministic global optimization with parallel architectures. Optimization Methods and Software, 1–39. https://doi.org/10.1080/10556788.2024.2396297
- Hansen, E. (1980). Global optimization using interval analysis—the multi-dimensional case. Numerische Mathematik, 34(3), 247-270.
- Baumann, E. (1988). Optimal centered forms. BIT Numerical Mathematics, 28(1), 80-87.
- Kulisch, U. W. (2009, April). Complete interval arithmetic and its implementation on the computer. In Numerical Validation in Current Hardware Architectures: International Dagstuhl Seminar, Dagstuhl Castle, Germany, January 6-11, 2008. Revised Papers (pp. 7-26). Berlin, Heidelberg: Springer Berlin Heidelberg.
- Bongartz, D., Najman, J., Sass, S., & Mitsos, A. (2018). MAiNGO - McCormick-based Algorithm for mixed-integer Nonlinear Global Optimization. Technical report, Process Systems Engineering (AVT.SVT), RWTH Aachen University. http://permalink.avt.rwth-aachen.de/?id=729717
- NVIDIA. (2024). CUDA Toolkit 12.4. Retrieved from https://developer.nvidia.com/cuda-toolkit
- McCormick, G. P. (1976). Computability of global solutions to factorable nonconvex programs: Part I—Convex underestimating problems. Mathematical programming, 10(1), 147-175.
- Mitsos, A., Chachuat, B., & Barton, P. I. (2009). McCormick-based relaxations of algorithms. SIAM Journal on Optimization, 20(2), 573-601.
- Beckers, M., Mosenkis, V., & Naumann, U. (2012). Adjoint mode computation of subgradients for McCormick relaxations. In Recent advances in algorithmic differentiation (pp. 103-113). Springer Berlin Heidelberg.