Feedforward neural networks, or multi-layer perceptrons (MLPs), are among the most widely used surrogate models for process optimization (1). Their popularity stems largely from the universal approximation property: given a nonlinear activation function and a sufficiently large number of neurons, an MLP can approximate any continuous input/output relationship to arbitrary accuracy (2). Hence, significant research has been devoted to improving the effectiveness of (globally) optimizing over trained multi-layer perceptrons.
For the tanh activation function, one of the most widely used activations, Schweidtmann and Mitsos employed convex and concave envelopes generated via McCormick-based relaxations in a reduced-space formulation to optimize over trained networks effectively (3). For ReLU activations, several mixed-integer formulations and bounds-tightening strategies have been proposed to improve the effectiveness of optimizing over ReLU networks (4–7). Further, for other less commonly used activations, convex and concave envelopes have been proposed for faster convergence to global optimality (8,9). Despite these advances, solving problems that embed large and deep networks, with multiple layers of hundreds of neurons each, remains challenging. For instance, Schweidtmann and Mitsos reported that solution times for MLPs with tanh activations increase drastically with the number of layers (3). In a similar study of ReLU networks optimized using GUROBI (v10), MLPs with six layers of 500 neurons each remained unsolvable (10).
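To make these formulations concrete, consider a single ReLU neuron y = max(0, wᵀx + b) whose pre-activation is bounded by L ≤ wᵀx + b ≤ U with L < 0 < U. The standard big-M encoding used in (4,6) introduces a binary variable z (the symbols below are generic; in practice L and U come from a bounds-tightening step):

\[
y \ge w^\top x + b, \qquad
y \le w^\top x + b - L\,(1 - z), \qquad
y \le U z, \qquad
y \ge 0, \qquad
z \in \{0, 1\}.
\]

The tightness of the continuous relaxation depends directly on L and U, which motivates the bounds-tightening procedures of (6) as well as the stronger formulations of (5) and (7).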
Several applications in chemical engineering may require such large networks to approximate an input/output relationship to a high level of accuracy. In such cases, deterministic global optimization over these surrogate models may be intractable. Recently, a new class of machine learning models named Kolmogorov-Arnold Networks (KANs) has been proposed (11). KANs are based on the Kolmogorov-Arnold representation theorem (12): instead of fixed activation functions on nodes combined with learnable linear weights, KANs place learnable univariate activation functions on the edges of the network (11). Consequently, KANs potentially require fewer parameters than MLPs to approximate an input/output relationship to a given accuracy. Hence, in this work, we evaluate whether KANs offer a suitable alternative to MLPs for deterministic global optimization.
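For reference, the Kolmogorov-Arnold representation theorem states that any continuous function f of n variables on a bounded domain can be written as a finite composition of continuous univariate functions and addition (12):

\[
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
\]

where the \(\phi_{q,p}\) and \(\Phi_q\) are continuous univariate functions. KANs stack this structure to arbitrary width and depth, with each univariate function parametrized as a spline (11).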
We consider a mixed-integer nonlinear programming (MINLP) formulation of a KAN and test it on standard optimization test functions, namely the Rosenbrock and peaks functions. We consider multiple instances of the Rosenbrock function with the number of inputs varying from three to ten. We also train MLPs with ReLU activations on the same datasets used for training the KANs. For the ReLU activation, we consider the big-M (6), partition-based (7), and complementarity (13) formulations. We optimize over both KANs and MLPs using state-of-the-art deterministic global mixed-integer solvers, namely BARON (14), SCIP (15), and GUROBI (16). We observe that KANs are more attractive than MLPs as surrogate models for cases with fewer than five inputs, whereas MLPs are more attractive for cases with more than five inputs.
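As an illustration of what embedding a trained ReLU network in a mixed-integer model involves, the following sketch encodes a single hidden layer with the big-M constraints shown above using gurobipy. It is a minimal, hypothetical example: the weights, biases, input box, and readout coefficients are placeholders rather than values from our experiments, and interval arithmetic stands in as the simplest possible bounds-tightening step.

```python
# Minimal sketch (not the authors' implementation): one trained ReLU hidden layer
# embedded in a Gurobi model via the big-M constraints above. All numerical values
# (weights W, biases b, input box, readout c) are hypothetical placeholders.
import numpy as np
import gurobipy as gp
from gurobipy import GRB

W = np.array([[1.0, -2.0], [0.5, 1.5]])   # placeholder trained weights (2 neurons x 2 inputs)
b = np.array([0.1, -0.3])                 # placeholder trained biases
c = np.array([1.0, -1.0])                 # placeholder linear readout of the hidden layer
x_lb, x_ub = -1.0, 1.0                    # input box

# Interval-arithmetic pre-activation bounds: the simplest bounds-tightening step.
L = W.clip(max=0.0).sum(axis=1) * x_ub + W.clip(min=0.0).sum(axis=1) * x_lb + b
U = W.clip(min=0.0).sum(axis=1) * x_ub + W.clip(max=0.0).sum(axis=1) * x_lb + b

m = gp.Model("relu_bigM")
x = m.addVars(2, lb=x_lb, ub=x_ub, name="x")
y = m.addVars(2, lb=0.0, name="y")              # post-activation outputs, y >= 0
z = m.addVars(2, vtype=GRB.BINARY, name="z")    # activation indicators

for i in range(2):
    pre = gp.quicksum(W[i, j] * x[j] for j in range(2)) + b[i]
    m.addConstr(y[i] >= pre)                      # y >= w'x + b
    m.addConstr(y[i] <= pre - L[i] * (1 - z[i]))  # y <= w'x + b - L(1 - z)
    m.addConstr(y[i] <= U[i] * z[i])              # y <= U z

# A simple linear readout stands in for the surrogate output being optimized.
m.setObjective(gp.quicksum(c[i] * y[i] for i in range(2)), GRB.MINIMIZE)
m.optimize()
print([x[j].X for j in range(2)], m.ObjVal)
```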
References:
- Misener R, Biegler L. Formulating data-driven surrogate models for process optimization. Comput Chem Eng. 2023 Nov 1;179:108411.
- Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989 Jan 1;2(5):359–66.
- Schweidtmann AM, Mitsos A. Deterministic Global Optimization with Artificial Neural Networks Embedded. J Optim Theory Appl [Internet]. 2019 Mar 15 [cited 2025 Apr 7];180(3):925–48. Available from: https://link.springer.com/article/10.1007/s10957-018-1396-0
- Fischetti M, Jo J. Deep neural networks and mixed integer linear optimization. Constraints [Internet]. 2018 Jul 1 [cited 2025 Apr 7];23(3):296–309. Available from: https://link.springer.com/article/10.1007/s10601-018-9285-6
- Anderson R, Huchette J, Ma W, Tjandraatmadja C, Vielma JP. Strong mixed-integer programming formulations for trained neural networks. Math Program [Internet]. 2018 Nov 20 [cited 2025 Apr 7];183(1–2):3–39. Available from: https://arxiv.org/abs/1811.08359v2
- Grimstad B, Andersson H. ReLU networks as surrogate models in mixed-integer linear programs. Comput Chem Eng. 2019 Dec 5;131:106580.
- Tsay C, Kronqvist J, Thebelt A, Misener R. Partition-Based Formulations for Mixed-Integer Optimization of Trained ReLU Neural Networks. Adv Neural Inf Process Syst. 2021 Dec 6;34:3068–80.
- Wilhelm ME, Wang C, Stuber MD. Convex and concave envelopes of artificial neural network activation functions for deterministic global optimization. Journal of Global Optimization [Internet]. 2023 Mar 1 [cited 2025 Apr 7];85(3):569–94. Available from: https://link.springer.com/article/10.1007/s10898-022-01228-x
- Carrasco P, Muñoz G. Tightening convex relaxations of trained neural networks: a unified approach for convex and S-shaped activations. 2024 Oct 30 [cited 2025 Apr 7]; Available from: https://arxiv.org/abs/2410.23362v1
- Webinar: Using Trained Machine Learning Predictors in Gurobi [Internet]. [cited 2025 Apr 7]. Available from: https://www.youtube.com/watch?v=jaux5Oo4qHU
- Liu Z, Wang Y, Vaidya S, Ruehle F, Halverson J, Soljačić M, et al. KAN: Kolmogorov-Arnold Networks. 2024 Apr 30 [cited 2025 Apr 7]; Available from: https://arxiv.org/abs/2404.19756v5
- Arnold VI. On the representation of functions of several variables as a superposition of functions of a smaller number of variables. Collected Works [Internet]. 2009 [cited 2025 Apr 7];25–46. Available from: https://link.springer.com/chapter/10.1007/978-3-642-01742-1_5
- Yang D, Balaprakash P, Leyffer S. Modeling Design and Control Problems Involving Neural Network Surrogates. Comput Optim Appl [Internet]. 2021 Nov 20 [cited 2025 Apr 7];83(3):759–800. Available from: https://arxiv.org/abs/2111.10489v1
- Zhang Y, Sahinidis NV. Solving continuous and discrete nonlinear programs with BARON. Comput Optim Appl [Internet]. 2024 Dec 5 [cited 2025 Apr 7];1–39. Available from: https://link.springer.com/article/10.1007/s10589-024-00633-0
- Bolusani S, Besançon M, Bestuzheva K, Chmiela A, Dionísio J, Donkiewicz T, et al. The SCIP Optimization Suite 9.0. 2024 Feb 27 [cited 2025 Apr 7]; Available from: https://arxiv.org/abs/2402.17702v2
- Gurobi Documentation [Internet]. [cited 2025 Apr 7]. Available from: https://docs.gurobi.com/current/