2021 Annual Meeting
(176a) Development of Algorithms for Reinforcement Learning Augmented Model Predictive Control
In this work, two novel RL-based MPC algorithms are presented. The first directly combines the RL action-value function with MPC by using it as the MPC objective, thereby maximizing the expected reward over the controller's prediction horizon. This approach is attractive because it combines the adaptability of RL with the explicit constraint handling of MPC, although it still requires traditional optimization methods to be used online. In this controller, the state-action-reward-state-action with eligibility traces, SARSA(λ), RL algorithm is used to update the action-value function from the temporal-difference error. To ensure exploration, the proposed policy is ε-MPC: the control move provided by the MPC is taken with probability ε; otherwise, a random control move is selected. To maintain stability under exploration, the explored control moves are drawn from a stable set of action trajectories constructed a priori.
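The following is a minimal sketch of the SARSA(λ) update with ε-MPC exploration described above, assuming a linear action-value approximation, a toy surrogate plant, and a discretized stable action set; all names, dimensions, and numerical values are illustrative assumptions rather than the authors' implementation, and the MPC solve is replaced by a stand-in that maximizes the learned Q over the stable action set.

```python
# Sketch of SARSA(lambda) with epsilon-MPC exploration (assumptions throughout).
import numpy as np

rng = np.random.default_rng(0)

n_features, n_actions = 8, 5
w = np.zeros((n_actions, n_features))   # linear action-value weights
e = np.zeros_like(w)                    # eligibility traces
alpha, gamma, lam, eps = 0.05, 0.95, 0.9, 0.9

# A priori "stable" action set from which exploratory moves are drawn.
stable_actions = list(range(n_actions))

def features(x):
    # Placeholder state featurization (assumed).
    return np.tanh(np.linspace(-1.0, 1.0, n_features) * x)

def q(x, a):
    return w[a] @ features(x)

def mpc_action(x):
    # Stand-in for the MPC solve: picks the action maximizing the learned Q.
    # The actual controller would optimize the Q-based objective over the
    # prediction horizon subject to constraints.
    return max(stable_actions, key=lambda a: q(x, a))

def eps_mpc(x):
    # epsilon-MPC: take the MPC move with probability eps, otherwise explore
    # a random move from the stable action set.
    if rng.random() < eps:
        return mpc_action(x)
    return int(rng.choice(stable_actions))

def plant_step(x, a):
    # Toy surrogate plant and reward, for illustration only.
    x_next = 0.9 * x + 0.1 * (a - 2) + 0.01 * rng.standard_normal()
    reward = -(x_next - 1.0) ** 2
    return x_next, reward

x, a = 0.0, eps_mpc(0.0)
for _ in range(200):
    x_next, r = plant_step(x, a)
    a_next = eps_mpc(x_next)
    delta = r + gamma * q(x_next, a_next) - q(x, a)   # TD error
    e *= gamma * lam                                   # decay traces
    e[a] += features(x)                                # accumulate trace
    w += alpha * delta * e                             # SARSA(lambda) update
    x, a = x_next, a_next
```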
The second controller applies actor-critic RL inspired by MPC. While the actor-critic structure does not require a predetermined policy, MPC can be leveraged to improve learning performance. First, the agent's value function and parameterized policy are represented as two (optionally deep) recurrent neural networks. Random initialization of the policy would clearly be unacceptable for application to process systems. However, similar to explicit MPC, the optimal control moves of the MPC can be computed offline and used to initialize the policy via supervised learning. This algorithm also uses an MPC-like model to perform policy rollouts over a given horizon, improving the rate of convergence of the policy and state-value approximators.
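A brief sketch of the explicit-MPC-style policy initialization described above, using supervised learning to fit the actor to control moves pre-computed offline by the MPC; for brevity a small feed-forward network stands in for the recurrent networks mentioned in the text, and the dataset, dimensions, and hyperparameters are illustrative assumptions.

```python
# Sketch: initialize the actor by imitating offline MPC solutions (assumptions throughout).
import torch
import torch.nn as nn

state_dim, action_dim = 4, 1

policy = nn.Sequential(
    nn.Linear(state_dim, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, action_dim),
)

# Placeholder dataset: sampled states and the corresponding optimal MPC moves
# solved offline (random stand-ins here; the real data would come from MPC solves).
states = torch.randn(1024, state_dim)
mpc_moves = torch.randn(1024, action_dim)

opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(policy(states), mpc_moves)   # supervised imitation of the MPC policy
    loss.backward()
    opt.step()

# The pretrained network then serves as the actor's initialization before
# actor-critic updates (e.g., with model-based rollouts over a horizon).
```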
These RL-augmented MPC algorithms are applied to a classic nonlinear chemical reactor as well as to the challenging control of load and of main steam temperature and pressure for a supercritical pulverized coal power plant. Both episodic and continuing cases are considered, demonstrating the flexibility of the algorithms under simple modifications. The results show that, compared to traditional linear and nonlinear MPC, the RL-MPC algorithms improve control performance, especially when the system repeatedly faces similar control tasks. The study also identifies where reductions in computational time would be needed for real-life application of these algorithms.