Real-world optimization problems often involve parameters that are inherently uncertain. In contextual settings, these parameters are typically predicted from observed features. Prediction-focused learning trains the predictive model solely to minimize prediction error, without considering how the predictions are used in the downstream optimization. Prediction accuracy alone is insufficient, however, because what ultimately matters is the quality of the decisions made from the predictions. In contrast, decision-focused learning (also known as the Smart “Predict, then Optimize” framework[1]) trains the predictive model to minimize the decision error induced by its parameter predictions.
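To fix notation (a standard formalization of the linear-objective setting of [1]; the symbols $x$, $f_\theta$, and $S$ are introduced here for exposition), the model maps features to a cost estimate, and the decision is the minimizer over the feasible region:

```latex
\hat{c} = f_{\theta}(x), \qquad
w^{*}(\hat{c}) \in \arg\min_{w \in S} \, \hat{c}^{\top} w
```

Prediction-focused learning fits $f_\theta$ by minimizing a prediction loss such as $\|f_\theta(x) - c\|_2^2$ against the realized cost $c$; decision-focused learning instead scores the induced decision $w^{*}(f_\theta(x))$ under $c$.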
Traditional decision-focused methods use empirical regret as the loss function: the excess cost incurred by deviating from the decision that is optimal for a given empirical scenario. This loss, however, has been shown to be highly sensitive to noise and outliers in the data[2]. The sensitivity drives the model to overfit the empirically optimal decisions derived from individual training samples, impairing its generalization to unseen data.
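In the notation above, the empirical regret against a training sample $(x, c)$ is the excess cost of acting on the prediction rather than on the sample's own optimum:

```latex
\ell_{\mathrm{regret}}(\theta; x, c)
= c^{\top} w^{*}\!\bigl(f_{\theta}(x)\bigr) - c^{\top} w^{*}(c)
```

Because the target decision $w^{*}(c)$ is recomputed from each realized cost vector, a single noisy or outlying sample can relocate it entirely, which is the sensitivity documented in [2].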
In this work, we propose an approach that extends the decision-focused learning framework to account for distributional uncertainty. A mixture density network (MDN) models the conditional Gaussian mixture distribution of the uncertain parameters, and this fitted distribution defines an ambiguity set. Solving a Wasserstein distributionally robust optimization (DRO) problem over the ambiguity set yields the optimal decision under the worst-case parameter distribution, and this robust decision replaces the empirically optimal decision as the target when training the prediction model. We compare the proposed method against traditional decision-focused learning, prediction-focused learning, and prediction-focused learning combined with contextual robust optimization (RO)[3]. Experimental results demonstrate that, by modeling the uncertainty distribution and optimizing against its worst case, our method enhances decision stability, reduces the risk of overfitting, and improves generalization.
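Conceptually, the robust decision that replaces the empirical target can be written as a min-max problem (notation ours: $\hat{\mathbb{P}}_x$ is the MDN-fitted conditional mixture, $W$ the Wasserstein distance, and $\varepsilon$ the ambiguity radius):

```latex
w^{\mathrm{rob}}(x) \in \arg\min_{w \in S} \;
\sup_{\mathbb{Q} \,:\, W(\mathbb{Q},\, \hat{\mathbb{P}}_x) \le \varepsilon}
\mathbb{E}_{c \sim \mathbb{Q}}\!\left[ c^{\top} w \right]
```

The prediction model is then trained to match $w^{\mathrm{rob}}(x)$ rather than the sample-specific optimum $w^{*}(c)$.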
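As a minimal sketch of the distribution-modeling component only, the following PyTorch module fits a diagonal-covariance conditional Gaussian mixture by maximum likelihood; the class name, layer sizes, and component count are illustrative assumptions, and the DRO layer above is not shown:

```python
import torch
import torch.nn as nn

class MixtureDensityNetwork(nn.Module):
    """Maps a feature vector x to the parameters of a conditional
    Gaussian mixture (diagonal covariances) over the uncertain costs c."""

    def __init__(self, n_features, n_outputs, n_components=5, hidden=64):
        super().__init__()
        self.n_outputs = n_outputs
        self.n_components = n_components
        self.backbone = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One head per mixture parameter: weights, means, log-scales.
        self.pi_head = nn.Linear(hidden, n_components)
        self.mu_head = nn.Linear(hidden, n_components * n_outputs)
        self.log_sigma_head = nn.Linear(hidden, n_components * n_outputs)

    def forward(self, x):
        h = self.backbone(x)
        log_pi = torch.log_softmax(self.pi_head(h), dim=-1)    # (B, K)
        mu = self.mu_head(h).view(-1, self.n_components, self.n_outputs)
        sigma = torch.exp(self.log_sigma_head(h)).view_as(mu)  # (B, K, d)
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, c):
    """Negative log-likelihood of realized costs c under the mixture;
    minimizing it fits the conditional distribution that defines the
    ambiguity set."""
    # Per-component log-density, summed over output dimensions.
    log_comp = torch.distributions.Normal(mu, sigma).log_prob(
        c.unsqueeze(1)).sum(dim=-1)                            # (B, K)
    # Mix components in log space for numerical stability.
    return -torch.logsumexp(log_pi + log_comp, dim=-1).mean()
```

For example, `model = MixtureDensityNetwork(n_features=10, n_outputs=4)` followed by `loss = mdn_nll(*model(x), c)` for a feature batch `x` of shape (32, 10) and realized costs `c` of shape (32, 4) gives one fitting step's loss.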
[1] Elmachtoub, Adam N., and Paul Grigas. "Smart “Predict, then Optimize”." Management Science 68.1 (2022): 9-26.
[2] Schutte, N. J., K. S. Postek, and N. Yorke-Smith. "Robust Losses for Decision-Focused Learning." Proceedings of the 33rd International Joint Conference on Artificial Intelligence (2024): 4868-4875.
[3] Li, Xianyu, et al. "Data-driven contextual robust optimization based on support vector clustering." Computers & Chemical Engineering (2025): 109004.