2023 AIChE Annual Meeting
(207c) Necessary Optimality-Constrained Bayesian Optimization (NOBO) for Efficiently Learning Complex Control Policies from Closed-Loop Data
First-order BO methods mainly focus on standard acquisition functions and incorporate derivative measurements only indirectly, through the probabilistic surrogate model, to enhance local predictions [5]. However, these methods have drawbacks: the added model complexity can substantially increase training and acquisition-optimization costs, and they may fail when gradient observations are heavily corrupted by noise [6]. In this talk, we propose a computationally efficient approach that simultaneously utilizes performance (zeroth-order) and derivative (first-order) data within a single acquisition optimization subproblem. Our core idea is to impose, at each iteration, a set of black-box constraints that mimic the necessary optimality conditions of the original global optimization problem. The proposed necessary-optimality BO (NOBO) method [7] employs Gaussian process surrogates for the objective's partial derivatives to approximately enforce first-order optimality conditions as black-box constraints in the acquisition function. These constraints define a feasible set that explicitly accounts for the uncertainty in estimating partial derivatives from data and is updated as new data are observed. The feasible set thus narrows the search to regions of the design space that are jointly consistent with both zeroth- and first-order information.
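The idea can be illustrated with a minimal one-dimensional sketch. The code below is not the paper's implementation; it assumes a toy objective f(x) = -(x - 0.3)^2 with noisy gradient observations, a hand-rolled GP posterior, and a simple UCB acquisition. The "necessary-optimality" constraint is approximated by keeping only candidates whose gradient confidence interval contains zero; all names (`gp_posterior`, `beta`, kernel hyperparameters) are illustrative choices.

```python
import numpy as np

# Toy sketch of an optimality-constrained acquisition step, assuming a known
# 1-D objective f(x) = -(x - 0.3)^2 and noisy observations of its gradient.
def rbf(a, b, ls=0.2, sf=1.0):
    """Squared-exponential kernel between 1-D point sets."""
    d = a[:, None] - b[None, :]
    return sf * np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xs, noise=1e-2):
    """Standard GP posterior mean and std at test points Xs."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    v = np.linalg.solve(K, Ks)
    var = np.clip(rbf(Xs, Xs).diagonal() - np.sum(Ks * v, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
f = lambda x: -(x - 0.3) ** 2          # zeroth-order (performance) signal
g = lambda x: -2.0 * (x - 0.3)         # first-order (derivative) signal

# A few noisy zeroth- and first-order observations at random designs.
X = rng.uniform(0.0, 1.0, size=6)
yf = f(X) + 1e-3 * rng.standard_normal(6)
yg = g(X) + 1e-2 * rng.standard_normal(6)

cand = np.linspace(0.0, 1.0, 201)
mu_f, sd_f = gp_posterior(X, yf, cand)   # surrogate for the objective
mu_g, sd_g = gp_posterior(X, yg, cand)   # surrogate for its derivative

beta = 2.0
# Approximate first-order optimality constraint: keep candidates whose gradient
# confidence interval contains zero, i.e. |mu_g| <= beta * sd_g.
feasible = np.abs(mu_g) <= beta * sd_g
ucb = mu_f + beta * sd_f
ucb[~feasible] = -np.inf                 # restrict acquisition to the feasible set
x_next = cand[np.argmax(ucb)]
```

Because the constraint is built from the gradient surrogate's posterior, the feasible set automatically widens where derivative estimates are uncertain and tightens as more closed-loop data arrive.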
We examine the theoretical performance and regret bounds of the proposed algorithm and demonstrate in practice that incorporating these search-space-restricting constraints leads to faster convergence than conventional BO. We further validate these performance gains on a reinforcement learning (RL) benchmark based on the linear quadratic regulator (LQR) [8], where the reward function's derivatives can be estimated directly from closed-loop data using the policy gradient theorem.
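To make the last point concrete, the sketch below estimates the derivative of the closed-loop cost with respect to a scalar feedback gain using a REINFORCE-style (policy gradient theorem) estimator, in the spirit of the LQR benchmark discussed in [8]. All constants (dynamics, cost weights, horizon, exploration noise) are illustrative choices, not values from the talk; a finite-difference check on the noiseless cost verifies the estimated gradient's sign.

```python
import numpy as np

# Scalar LQR: dynamics x+ = a*x + b*u, stage cost q*x^2 + r*u^2,
# linear policy u = -k*x with Gaussian exploration for gradient estimation.
rng = np.random.default_rng(1)
a, b, q, r = 0.9, 1.0, 1.0, 0.1
T, N, sigma = 15, 5000, 0.3          # horizon, number of rollouts, exploration std
k = 0.2                              # current (deliberately suboptimal) gain

x = np.ones(N)                       # all rollouts start at x0 = 1
ret = np.zeros(N)                    # per-rollout accumulated cost
score = np.zeros(N)                  # per-rollout sum of d log pi(u|x) / dk
for _ in range(T):
    eps = rng.standard_normal(N)
    u = -k * x + sigma * eps         # Gaussian exploration policy N(-k*x, sigma^2)
    ret += q * x**2 + r * u**2
    score += -(eps * x) / sigma      # score function of the Gaussian policy w.r.t. k
    x = a * x + b * u

# Baseline-subtracted policy-gradient estimate of dJ/dk from closed-loop data.
grad_pg = np.mean((ret - ret.mean()) * score)

def cost(kk):
    """Deterministic (noiseless) closed-loop cost for gain kk."""
    xx, J = 1.0, 0.0
    for _ in range(T):
        uu = -kk * xx
        J += q * xx**2 + r * uu**2
        xx = a * xx + b * uu
    return J

# Finite-difference check: the estimated gradient should have the correct sign.
grad_fd = (cost(k + 1e-4) - cost(k - 1e-4)) / 2e-4
```

Derivative estimates of this kind are exactly the noisy first-order observations that NOBO's optimality constraints are designed to exploit.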
References:
[1] Shahriari, Bobak, et al. "Taking the human out of the loop: A review of Bayesian optimization." Proceedings of the IEEE 104.1 (2015): 148-175.
[2] Paulson, Joel A., Georgios Makrygiorgos, and Ali Mesbah. "Adversarially robust Bayesian optimization for efficient auto-tuning of generic control structures under uncertainty." AIChE Journal 68.6 (2022): e17591.
[3] Frazier, Peter I. "A tutorial on Bayesian optimization." arXiv preprint arXiv:1807.02811 (2018).
[4] Shekhar, Shubhanshu, and Tara Javidi. "Significance of gradient information in Bayesian optimization." International Conference on Artificial Intelligence and Statistics. PMLR, 2021.
[5] Wu, Jian, et al. "Bayesian optimization with gradients." Advances in Neural Information Processing Systems 30 (2017).
[6] Penubothula, Santosh, Chandramouli Kamanchi, and Shalabh Bhatnagar. "Novel first order Bayesian optimization with an application to reinforcement learning." Applied Intelligence 51 (2021): 1565-1579.
[7] Makrygiorgos, Georgios, Joel A. Paulson, and Ali Mesbah. "No-Regret Bayesian Optimization with Gradients using Local Optimality-based Constraints: Application to Closed-loop Policy Search." 62nd IEEE Conference on Decision and Control (CDC). IEEE, 2023.
[8] Recht, Benjamin. "A tour of reinforcement learning: The view from continuous control." Annual Review of Control, Robotics, and Autonomous Systems 2 (2019): 253-279.