2025 AIChE Annual Meeting

(260d) Automated Discovery of Optimization Algorithms

Authors

Ilias Mitrai - Presenter, University of Minnesota
Nikolaos Sahinidis, Georgia Institute of Technology
Mathematical optimization is a cornerstone of process systems engineering and has been used extensively for solving decision-making problems related to process design, control, operation, and machine learning. For a given optimization problem and initial guess, an iterative optimization algorithm relies on an update function g to refine the estimate of the solution, i.e., x_{i+1} = g(x_i). Although significant advances have been made in developing iterative algorithms for a wide class of optimization problems [1-3], progress has been slow, since it relies on intuition and domain knowledge about the optimization problem, i.e., the functional form of the objective and constraints, as well as the mathematical properties of different classes of algorithms. This intuition- and domain-knowledge-driven design and discovery process limits our ability to discover new algorithms efficiently.
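
As a concrete illustration (a minimal sketch, not taken from the abstract), gradient descent with step size a is one familiar choice of the update function g in this fixed-point view:

```python
import numpy as np

# One familiar instance of an update function g:
# gradient descent, x_{i+1} = g(x_i) = x_i - a * grad f(x_i).
def g(x, grad_f, a=0.1):
    return x - a * grad_f(x)

grad_f = lambda x: 2.0 * x          # gradient of f(x) = ||x||^2
x = np.array([1.0, -2.0])           # initial guess x_0
for i in range(100):                # iterate x_{i+1} = g(x_i)
    x = g(x, grad_f)
print(x)                            # approaches the minimizer [0, 0]
```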

The automated design of iterative algorithms has been a long-standing challenge in science and engineering. Traditionally, genetic programming has been used to search for new algorithms [4,5]. However, these methods are stochastic in nature and cannot easily accommodate constraints on the performance of the algorithm. Machine learning techniques have also been used to either model or search for new iterative algorithms. Typical examples include deep neural networks that learn gradient descent algorithms for neural network training [6,7], reinforcement learning for matrix multiplication [8], and large language models for bin-packing problems [9]. Although ML-based approaches can discover new algorithms, they lack certificates of optimality and are not inherently interpretable. Finally, the algorithm design/discovery problem can be posed as an optimization problem and solved using deterministic global optimization solvers. Although this approach has been used to design gradient-based methods [10-13], modeling the update function is challenging, and several assumptions are usually made regarding its form. Overall, existing approaches have limitations related to interpretability (for the ML-based approaches), to providing bounds on the best possible algorithm (the algorithms found are usually only locally optimal), and to a restricted search space (the functional form of the update function is determined a priori).

In this work, we propose a novel mathematical programming-based formulation of the algorithm discovery problem. Specifically, we model the update function g as an expression tree of fixed depth. This modeling approach can represent many functional forms for the update function, i.e., a superstructure of candidate algorithms built from simple mathematical operators (addition, subtraction, multiplication, and division). The formulation enables the search for new update functions that satisfy a given convergence criterion while optimizing a desired objective, such as the complexity of the update function. The resulting problem is a mixed-integer nonlinear programming (MINLP) problem, which can be solved with state-of-the-art global optimization solvers such as BARON. We use the proposed approach to discover new first-order optimization algorithms for unconstrained optimization problems. First, for a single objective function and multiple initial guesses, steepest gradient descent is returned as the optimal algorithm when the depth of the expression tree is one; when a more complex update is permitted with a tree depth of two, the optimal algorithm is a variation of steepest gradient descent. We also consider the case where the same iterative algorithm is used to solve multiple optimization problems from multiple initial guesses. In this case, if the depth of the update function is one, steepest gradient descent is again the optimal algorithm, whereas a new algorithm is returned when the depth of the tree is three. Overall, the results highlight the ability of the proposed formulation to discover new algorithms for different convergence criteria and algorithm performance measures.
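
To make the superstructure idea concrete, the following is a minimal, purely illustrative sketch and not the formulation or implementation described in the abstract: candidate update functions are encoded as small expression trees over a hypothetical alphabet of operators and leaves, and a brute-force enumeration with a simple scoring function stands in for the MINLP with convergence constraints that the abstract formulates and solves with BARON. The operator/leaf alphabet, the tree shape, and all names are assumptions for illustration, and the tree depth here need not match the paper's depth convention.

```python
import itertools
import numpy as np

# Hypothetical building blocks (illustrative, not the paper's exact alphabet):
# binary operators at internal nodes; leaves drawn from the current iterate x,
# the gradient g = grad f(x), and a step-size constant a.
OPS = {
    "add": lambda u, v: u + v,
    "sub": lambda u, v: u - v,
    "mul": lambda u, v: u * v,
}
LEAVES = ("x", "g", "a")

def evaluate(tree, x, grad, alpha):
    """Evaluate a tree that is either a leaf symbol or a tuple (op, left, right)."""
    if tree == "x":
        return x
    if tree == "g":
        return grad(x)
    if tree == "a":
        return alpha
    op, left, right = tree
    return OPS[op](evaluate(left, x, grad, alpha),
                   evaluate(right, x, grad, alpha))

def candidate_trees():
    """Enumerate small trees of the form (op1, leaf, (op2, leaf, leaf)), e.g.
    ('sub', 'x', ('mul', 'a', 'g')) encodes g(x) = x - a * grad f(x)."""
    for op1, op2 in itertools.product(OPS, repeat=2):
        for l1, l2, l3 in itertools.product(LEAVES, repeat=3):
            yield (op1, l1, (op2, l2, l3))

# Toy benchmark: quadratic problems f(x) = (x - m)^2 with known minimizer m,
# solved from several initial guesses, mirroring the multi-problem setting.
problems = [(lambda x, m=m: 2.0 * (x - m), m) for m in (0.0, 3.0, -1.5)]
starts = (5.0, -4.0)

def score(tree, alpha=0.25, iters=25):
    """Total distance to the known minimizers after running x_{i+1} = g_tree(x_i)."""
    total = 0.0
    for grad, minimizer in problems:
        for x in starts:
            for _ in range(iters):
                x = evaluate(tree, x, grad, alpha)
                if not np.isfinite(x) or abs(x) > 1e8:  # discard diverging candidates
                    return np.inf
            total += abs(x - minimizer)
    return total

best = min(candidate_trees(), key=score)
print(best)  # recovers a gradient-descent-type update, x - a*grad f(x) (up to operand order)
```

Enumeration is tractable here only because the alphabet and depth are tiny; the point of the MINLP formulation in this work is precisely to search such superstructures with global optimality guarantees instead of exhaustive or stochastic search.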

References:

[1]. Kingma, D.P. and Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[2]. Wächter, A. and Biegler, L.T., 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming, 106, pp.25-57.

[3]. Kim, D. and Fessler, J.A., 2016. Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159, pp.81-107.

[4]. Chen, X., Liang, C., Huang, D., Real, E., Wang, K., Pham, H., Dong, X., Luong, T., Hsieh, C.J., Lu, Y. and Le, Q.V., 2024. Symbolic discovery of optimization algorithms. Advances in Neural Information Processing Systems, 36.

[5]. Acevedo, N., Rey, C., Contreras-Bolton, C. and Parada, V., 2020. Automatic design of specialized algorithms for the binary knapsack problem. Expert Systems with Applications, 141, p.112908.

[6]. Schmidhuber, J., 1992. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Neural Computation, 4(1), pp.131-139.

[7]. Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M.W., Pfau, D., Schaul, T., Shillingford, B. and De Freitas, N., 2016. Learning to learn by gradient descent by gradient descent. Advances in Neural Information Processing Systems, 29.

[8]. Fawzi, A., Balog, M., Huang, A., Hubert, T., Romera-Paredes, B., Barekatain, M., Novikov, A., R Ruiz, F.J., Schrittwieser, J., Swirszcz, G. and Silver, D., 2022. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature, 610(7930), pp.47-53.

[9]. Romera-Paredes, B., Barekatain, M., Novikov, A., Balog, M., Kumar, M.P., Dupont, E., Ruiz, F.J., Ellenberg, J.S., Wang, P., Fawzi, O. and Kohli, P., 2024. Mathematical discoveries from program search with large language models. Nature, 625(7995), pp.468-475.

[10]. Drori, Y. and Teboulle, M., 2014. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1), pp.451-482.

[11]. Mitsos, A., Najman, J. and Kevrekidis, I.G., 2018. Optimal deterministic algorithm generation. Journal of Global Optimization, 71, pp.891-913.

[12]. Kim, D. and Fessler, J.A., 2021. Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of Optimization Theory and Applications, 188(1), pp.192-219.

[13]. Grimmer, B., 2023. Provably faster gradient descent via long steps. arXiv preprint arXiv:2307.06324.