Machine learning (ML) models have emerged as powerful surrogate models in optimization, enabling the integration of data-driven function approximations directly into mathematical programming formulations. This approach is particularly valuable when the underlying functional relationships are fully or partially unknown, highly nonlinear, computationally expensive to evaluate, or, worse, a combination of all of these factors. Regardless of how they are used, these surrogate models are subject to error (due to noise, insufficient data, and fitting procedures). Recent advances in statistical machine learning allow us to characterize these errors with statistically valid confidence intervals. We present a general framework that combines these techniques with optimization methods to address statistically valid model uncertainty in mathematical programs with embedded machine learning models.
The use of ML models within optimization can be categorized into three primary motivations. First, embedding ML surrogates allows for solving inverse problems, wherein optimal inputs are identified to achieve desired outputs constrained by the learned function. Second, ML surrogates can serve as replacements or approximations for first-principles models, particularly when high-fidelity simulations are computationally infeasible or prohibitively costly to evaluate. Third, ML models may facilitate solver compatibility and integration by converting black-box external computations into equation-oriented formulations, thereby enabling the use of deterministic global optimization solvers [1]. A generic formulation with an embedded surrogate is sketched below.
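For concreteness, embedding a trained model $\hat{f}$ yields problems of the generic form (the symbols $g$, $h$, and $X$ are our notation, not tied to any particular application):
\[
\min_{x \in X,\, y} \; g(x, y) \quad \text{s.t.} \quad y = \hat{f}(x), \quad h(x, y) \le 0,
\]
where $g$ is the objective, $h$ collects any additional constraints, and $X$ is the input domain. In the inverse-problem setting, for instance, taking $g(x, y) = \|y - y^\ast\|$ recovers inputs whose predicted output matches a target $y^\ast$.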
Among the most common ML models embedded in optimization are neural networks (NNs) and decision trees. NNs with rectified linear unit (ReLU) activations can be represented as mixed-integer linear programs (MILPs) through big-M formulations [2] (an example encoding is given below), with subsequent research improving their computational tractability via bound tightening [3] and ideal MILP formulations [4]. Decision trees and their ensembles, such as random forests and gradient-boosted decision trees, can also be expressed as MILPs by encoding decision paths as constraints [1]. These representations yield piecewise-linear encodings of nonlinear functions, making them particularly useful in mixed-integer optimization. Applications span diverse domains, including process optimization in chemical engineering, power systems security analysis, and medical decision-making for chemotherapy regimen selection [5]. The growing availability of open-source toolkits such as OMLT [6] has further streamlined the embedding of ML models in optimization, increasing their practical applicability in large-scale, data-driven decision-making problems.
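As an illustration, a single ReLU neuron $y = \max(0, w^\top x + b)$ with known finite bounds $L \le w^\top x + b \le U$ (where $L < 0 < U$) admits the standard big-M encoding
\begin{align*}
y &\ge w^\top x + b, \\
y &\le w^\top x + b - L(1 - z), \\
0 \le y &\le U z, \qquad z \in \{0, 1\},
\end{align*}
where $z = 1$ activates the neuron ($y = w^\top x + b$) and $z = 0$ deactivates it ($y = 0$). Tighter bounds $L$ and $U$ directly strengthen the linear relaxation, which is precisely what the bound-tightening procedures of [3] exploit.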
While surrogate models provide a means of embedding complex functions within optimization problems, their inherent approximation errors introduce significant challenges. Because these models are trained on finite datasets, they may fail to capture the full complexity of the underlying system, leading to discrepancies between the surrogate and the actual system behavior. This issue is particularly problematic in optimization, where solvers exploit the mathematical structure of the problem and drive solutions toward extreme points. Small errors in the surrogate model can thus be magnified, leading to unrealistic or infeasible solutions that do not correspond to the true system optimum [7]. Overfitting further aggravates this issue: a surrogate that fits the training data too closely may fail to generalize to unseen conditions, yielding solutions that are highly sensitive to small changes in input variables. In process optimization and engineering applications, ensuring that surrogate-based solutions remain valid under real-world conditions is crucial, as failing to do so can lead to inefficient, unsafe, or economically unviable operational decisions [8].
Recent advances in statistical machine learning have introduced methods to quantify model uncertainty and characterize errors through statistically valid confidence intervals with a desired coverage level. In this work, we explore how these techniques can be integrated into optimization with embedded ML surrogates to account for model errors and improve solution robustness. We propose a general framework for incorporating uncertainty from model errors into optimization, applicable to both regression and classification surrogates and compatible with any MIP-representable model; a minimal sketch of the underlying idea follows. Using case studies from machine learning and chemical engineering, we demonstrate that our approach yields solutions that are more robust and less sensitive to variations in input parameters.
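As one concrete instance of such a technique, the sketch below uses split conformal prediction to compute a distribution-free prediction half-width from a held-out calibration set. The names `model`, `X_cal`, and `y_cal` are hypothetical placeholders, and the exact procedure used in our framework may differ.

```python
import numpy as np

def conformal_halfwidth(model, X_cal, y_cal, alpha=0.1):
    """Half-width q such that [f_hat(x) - q, f_hat(x) + q] covers the true
    response with probability >= 1 - alpha (marginally, assuming the
    calibration points are exchangeable with future test points)."""
    # Absolute residuals of the fitted surrogate on held-out calibration data.
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    # Finite-sample-corrected quantile level: ceil((n + 1) * (1 - alpha)) / n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(residuals, level, method="higher")
```

An embedded surrogate constraint $\hat{f}(x) \le b$ can then be tightened to $\hat{f}(x) + q \le b$, so that feasibility with respect to the true function holds with the prescribed coverage despite surrogate error.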
References
[1] Ammari, B.L., Johnson, E.S., Stinchfield, G., Kim, T., Bynum, M., Hart, W.E., Pulsipher, J. and Laird, C.D., 2023. Linear model decision trees as surrogates in optimization of engineering applications. Computers & Chemical Engineering, 178, p.108347.
[2] Fischetti, M. and Jo, J., 2018. Deep neural networks and mixed integer linear optimization. Constraints, 23(3), pp.296-309.
[3] Grimstad, B. and Andersson, H., 2019. ReLU networks as surrogate models in mixed-integer linear programs. Computers & Chemical Engineering, 131, p.106580.
[4] Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C. and Vielma, J.P., 2020. Strong mixed-integer programming formulations for trained neural networks. Mathematical Programming, 183, pp.3-39.
[5] Maragno, D., Wiberg, H., Bertsimas, D., Birbil, Ş.İ., den Hertog, D. and Fajemisin, A.O., 2023. Mixed-integer optimization with constraint learning. Operations Research.
[6] Ceccon, F., Jalving, J., Haddad, J., Thebelt, A., Tsay, C., Laird, C.D. and Misener, R., 2022. OMLT: Optimization & machine learning toolkit. Journal of Machine Learning Research, 23(349), pp.1-8.
[7] Biegler, L.T., 2024. The trust region filter strategy: Survey of a rigorous approach for optimization with surrogate models. Digital Chemical Engineering, p.100197.
[8] Misener, R. and Biegler, L., 2023. Formulating data-driven surrogate models for process optimization. Computers & Chemical Engineering, 179, p.108411.