2025 AIChE Annual Meeting

(392ap) Strategies for Solving Mixed Integer Linear Optimization Problems with Embedded ReLU Artificial Neural Networks

Authors

Christoph Plate, Max Planck Institute for Dynamics of Complex Technical Systems
Mirko Hahn, Otto von Guericke University
Caroline Ganzer, Imperial College London
Kai Sundmacher, Max Planck Institute for Dynamics of Complex Technical Systems
Sebastian Sager, Otto von Guericke University
When artificial neural networks (ANNs) are formulated with the ReLU activation function, they can be embedded in mixed-integer linear programming (MILP) problems using a big-M reformulation. This enables optimization over trained surrogate models with established MILP methods. However, these optimization tasks become challenging as the number of neurons and layers increases, and when multiple ANNs are embedded in a single optimization problem. One reason is the big-M coefficients arising in the reformulation, which can grow exponentially with the number of layers.
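
For orientation, the standard big-M encoding of a single ReLU neuron y = max(0, wᵀx + b) with pre-activation bounds L ≤ wᵀx + b ≤ U can be written as follows (generic textbook form in our notation, not taken verbatim from the cited works):

    % big-M encoding of one ReLU neuron y = max(0, w^T x + b),
    % given pre-activation bounds L <= w^T x + b <= U:
    \begin{aligned}
      & y \ge w^\top x + b, \qquad y \ge 0, \\
      & y \le w^\top x + b - L\,(1 - z), \qquad y \le U\,z, \\
      & z \in \{0, 1\}.
    \end{aligned}

Setting z = 1 recovers the affine branch and z = 0 the zero branch; the looser L and U are, the weaker the LP relaxation becomes, which is what the strategies listed below address.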

In this work, we examine four ways to speed up MILP optimization over trained ReLU ANNs:

  1. Optimization-based bound tightening (OBBT)
  2. Novel a posteriori ReLU scaling
  3. Clipped ReLU
  4. Regularization

While OBBT and ReLU scaling can be applied to an already trained ANN, clipped ReLU and regularization must be applied during ANN training. We additionally explore dropout during training and contrast it with regularization.

The standard way of deriving big-M coefficients for the embedded ANN is interval arithmetic (IA), in which the bounds on the inputs of the ANN are propagated forward through the network. The resulting bounds tend to be unnecessarily large, as the dependencies between the activations of the neurons are not taken into account. OBBT has been proposed (Grimstad and Andersson, 2019; Badilla et al., 2023) to reduce the big-M coefficients by solving LPs that determine tighter bounds prior to solving the MILP. Clipped ReLU is hypothesized to reduce big-M coefficients by bounding the neuron activations, though it comes at the expense of an additional binary variable per neuron in the reformulation. As regularization is effective in reducing the magnitude of the weights, it too could aid the subsequent optimization. Further, we propose a novel a posteriori scaling of the ANN that reduces the big-M coefficients (Plate et al., 2025). It is obtained by solving an LP that minimizes the absolute values of the scaled weights and biases and yields optimal scaling factors for each layer of the ANN.
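
As a minimal sketch of the IA baseline (our own illustration, not the authors' implementation), pre-activation bounds can be propagated layer by layer by splitting each weight matrix into its positive and negative parts; the magnitudes of these bounds are exactly the big-M coefficients that OBBT, scaling, clipping, and regularization aim to shrink:

    import numpy as np

    def interval_bounds(weights, biases, x_lo, x_hi):
        """Forward interval-arithmetic propagation of input bounds through a
        fully connected ReLU network (illustrative sketch).

        weights, biases: lists of layer parameters (W_k, b_k);
        x_lo, x_hi: elementwise bounds on the network inputs.
        Returns per-layer pre-activation bounds (the big-M candidates).
        """
        lo, hi = np.asarray(x_lo, float), np.asarray(x_hi, float)
        pre_bounds = []
        for W, b in zip(weights, biases):
            W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
            z_lo = W_pos @ lo + W_neg @ hi + b   # lower pre-activation bound
            z_hi = W_pos @ hi + W_neg @ lo + b   # upper pre-activation bound
            pre_bounds.append((z_lo, z_hi))
            lo, hi = np.maximum(z_lo, 0.0), np.maximum(z_hi, 0.0)  # ReLU image
        return pre_bounds

Because each layer's bounds are built from the worst case of the previous layer's bounds, the intervals typically widen with depth, which is the exponential growth noted above.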

We apply these methods to three benchmark problems – the peaks, Ackley, and Himmelblau functions – and vary the number of layers, the layer size, the level of regularization, and the cutoff value for clipped ReLU. In addition to computational time, we evaluate the big-M coefficients, the number of stable neurons, and the number of linear regions associated with the ANN. We use the Python package OMLT (Ceccon et al., 2022) to facilitate the embedding in Pyomo (Bynum et al., 2021; Hart et al., 2011) and solve the MILPs with Gurobi (Gurobi Optimization, 2024).
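
A minimal sketch of this toolchain is given below (illustrative only; the network object keras_model and the input bounds are placeholders, and the calls follow the OMLT documentation as we understand it rather than the authors' exact scripts):

    import pyomo.environ as pyo
    from omlt import OmltBlock
    from omlt.io.keras import load_keras_sequential
    from omlt.neuralnet import ReluBigMFormulation

    # keras_model: a trained tf.keras Sequential ReLU network (placeholder);
    # second argument is an optional scaling object, third the input bounds.
    net = load_keras_sequential(keras_model, None, {0: (-5.0, 5.0), 1: (-5.0, 5.0)})

    m = pyo.ConcreteModel()
    m.nn = OmltBlock()
    m.nn.build_formulation(ReluBigMFormulation(net))  # big-M MILP encoding of the ANN
    m.obj = pyo.Objective(expr=m.nn.outputs[0], sense=pyo.minimize)

    pyo.SolverFactory("gurobi").solve(m, tee=True)

In this setup, bound-tightening or scaling would be applied to the network (or its bounds) before building the formulation, while clipped ReLU and regularization affect the trained weights themselves.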

Our results suggest that OBBT is effective for all trained networks, reducing big-M coefficients by half compared to IA. ReLU scaling or clipped ReLU alone decreases big-M coefficients to a similar degree. The combination of ReLU scaling and OBBT yields even tighter bounds, at around 16% of the original values. Regularization delivers the largest reduction of the big-M coefficients, by up to two orders of magnitude. However, only OBBT and regularization lower computational time significantly, to around half and to as low as 3% of the original time, respectively. This implies that reducing big-M coefficients alone does not suffice. Analysis of the number of linear regions and the number of stable neurons reveals that regularization improves both, while OBBT changes neither. Adding ReLU scaling yields a further speed-up of about 10% compared to OBBT alone. Importantly, regularization greatly raises the number of instances that can be solved at all within the specified time limit.

References

F. Badilla, M. Goycoolea, G. Muñoz, and T. Serra. Computational Tradeoffs of Optimization-Based Bound Tightening in ReLU Networks. 2023. https://arxiv.org/abs/2312.16699

M. L. Bynum, G. A. Hackebeil, W. E. Hart, C. D. Laird, B. L. Nicholson, J. D. Siirola, J.-P. Watson, and D. L. Woodruff. Pyomo – Optimization Modeling in Python. Springer eBook Collection, volume 67, 3rd edition. 2021.

F. Ceccon, J. Jalving, J. Haddad, A. Thebelt, C. Tsay, C. D. Laird, and R. Misener. OMLT: Optimization & Machine Learning Toolkit. Journal of Machine Learning Research, 23(349), 1–8. 2022.

B. Grimstad and H. Andersson. ReLU Networks as Surrogate Models in Mixed-Integer Linear Programs. Computers & Chemical Engineering, 131:106580. 2019.

Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual. 2024. https://www.gurobi.com

W. E. Hart, J.-P. Watson, and D. L. Woodruff. Pyomo: Modeling and Solving Mathematical Programs in Python. Mathematical Programming Computation, 3(3):219–260. 2011.

C. Plate, M. Hahn, A. Klimek, C. Ganzer, K. Sundmacher, and S. Sager. An analysis of optimization problems involving ReLU neural networks. Under review, 2025. https://arxiv.org/abs/2502.03016