2024 AIChE Annual Meeting

(711i) An Implementable Reinforcement Learning Approach for Online Adaptation of Model Predictive Control

Authors

Hassanpour, H. - Presenter, McMaster University
Kumar, A., Linde
Mhaskar, P., McMaster University
Model predictive control (MPC) is an advanced model-based control strategy that can handle multi-variable interactions and physical system constraints [1]. Its success depends heavily on the model used to predict the process behavior. Because first-principles models are challenging to develop and maintain, and because high-quality, information-rich data for detailed data-driven modeling are often unavailable, simple step-response models are most commonly used in practice to design and implement MPC algorithms. Offset-free MPC formulations have also been developed to handle plant-model mismatch [2]. However, when plant operating conditions change, such linear MPCs can suffer performance degradation because plant-model mismatch grows in dynamic regions not covered by the original step tests. Maintaining closed-loop performance then requires efficient re-identification strategies, which are usually expensive to implement because they disrupt production for several days or weeks.
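
To make the above concrete, the following is a minimal sketch (not taken from the abstract; coefficient values, horizon, and function names are illustrative) of output prediction with a SISO step-response model plus a constant output-disturbance estimate, the standard offset-free correction for plant-model mismatch:

```python
import numpy as np

def predict_outputs(S, du_hist, du_plan, y_meas):
    """Predict future outputs from a truncated step-response model.

    S       : step-response coefficients S_1..S_N from a single step test
    du_hist : past input moves, oldest first (moves older than N samples are
              neglected, assuming the process has settled within N samples)
    du_plan : planned future input moves over the prediction window
    y_meas  : current measured output, used to estimate the disturbance
    """
    N = len(S)
    # model output at the current time from the last N input moves
    y_model = float(np.dot(S[::-1], du_hist[-N:]))
    # constant output-disturbance estimate (offset-free correction)
    d_hat = y_meas - y_model
    preds = []
    moves = list(du_hist[-N:])
    for du in du_plan:
        moves = moves[1:] + [du]   # shift the move window one step forward
        preds.append(float(np.dot(S[::-1], moves)) + d_hat)
    return np.array(preds)

# illustrative first-order-like step response with unit gain
S = 1.0 - np.exp(-np.arange(1, 31) / 8.0)
du_hist = [0.0] * 29 + [0.5]                  # one recent input step of 0.5
print(predict_outputs(S, du_hist, du_plan=[0.0] * 5, y_meas=0.10))
```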

Reinforcement learning (RL) has shown promising potential in several process control applications [3]. An RL agent interacts with the process to find an optimal policy that maximizes a predefined reward function. However, standard model-free RL techniques are not implementable in practical settings because the random exploration the agent must perform to find the optimal policy poses safety and economic risks. Several RL-based controllers have therefore been developed by pre-training an RL agent on a surrogate or detailed model of the process [4]. For situations where detailed models are not available, a pre-training strategy has been proposed that leverages existing MPCs designed using simple step-response models [5]. The pre-trained agent is then deployed for online control, improving closed-loop performance relative to the nominal MPC.
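
The core of such a pre-training strategy can be sketched as follows (hypothetical data shapes and a simple linear policy, not the authors' implementation): an initial policy is fitted by imitating the input moves an existing step-response-based MPC produced in routine closed-loop operation, so the agent never has to explore randomly on the plant before refinement.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical logged closed-loop data: states (e.g. setpoint error, recent
# outputs and inputs) and the MPC moves taken in those states
states = rng.normal(size=(500, 4))
mpc_moves = states @ np.array([0.8, -0.3, 0.1, 0.05]) + 0.01 * rng.normal(size=500)

# behaviour-cloning step: least-squares fit of state -> MPC input move
theta, *_ = np.linalg.lstsq(states, mpc_moves, rcond=None)

def pretrained_policy(state):
    """Initial policy imitating the MPC; later refined by online RL."""
    return float(state @ theta)

print(pretrained_policy(states[0]), mpc_moves[0])
```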

The RL approaches discussed above aim to design an implementable RL controller. In most practical situations, however, it is preferable to retain already established control methods such as MPC. As mentioned earlier, the performance of such controllers in the presence of unmeasured disturbances relies heavily on the model parameters and on the tuning of the MPC optimization problem (prediction and control horizons and penalty weights). These tuning parameters are usually adjusted by trial and error, which can lead to suboptimal control performance. RL has been used to tune the prediction and control horizons systematically [6] and to estimate the parameters of a class of nonlinear systems [7]. While these approaches are valuable, their implementation depends on the availability of a good first-principles model. As discussed, most industrial MPCs are implemented using simple step-response linear models, so developing a practically implementable approach for online model tuning of such MPCs in the presence of disturbances remains a valuable task.

Motivated by the above considerations, an RL-based approach is proposed for the online tuning of a representative offset-free MPC. The RL agent fine-tunes the nominal model parameters (identified from a step test) under changes in plant operating conditions. A twin-delayed deep deterministic policy gradient (TD3) algorithm, a class of actor-critic methods, is employed to tune the model parameters by exploring small regions of the parameter space. The effectiveness of the proposed approach is demonstrated on two illustrative examples: a single-input single-output (SISO) system and a multiple-input multiple-output (MIMO) system. The results show that the proposed RL technique fine-tunes the model parameters to achieve better closed-loop performance than both the nominal MPC and an MPC with re-identification.
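
A minimal sketch of the action interface in such a tuning loop is shown below (parameter names, nominal values, and ranges are illustrative assumptions, not the authors' code): the TD3 agent outputs bounded corrections to the nominal step-response model parameters, the MPC is run with the adjusted model, and the reward penalizes the resulting tracking error.

```python
import numpy as np

NOMINAL = {"gain": 1.0, "tau": 8.0}   # from the nominal step test (illustrative)
RANGES  = {"gain": 0.2, "tau": 2.0}   # small exploration region around nominal

def action_to_parameters(action):
    """Map a TD3 action in [-1, 1]^2 to model parameters near the nominal values."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
    return {
        "gain": NOMINAL["gain"] + RANGES["gain"] * a[0],
        "tau":  NOMINAL["tau"]  + RANGES["tau"]  * a[1],
    }

def reward(y_traj, y_sp):
    """Negative sum of squared tracking errors over the evaluation window."""
    return -float(np.sum((np.asarray(y_traj) - y_sp) ** 2))

params = action_to_parameters([0.3, -0.5])   # e.g. gain = 1.06, tau = 7.0
print(params, reward([0.9, 0.95, 1.02], y_sp=1.0))
```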

References

[1] Mayne, D.Q., Rawlings, J.B., Rao, C.V. and Scokaert, P.O., 2000. Constrained model predictive control: Stability and optimality. Automatica, 36(6), pp.789-814.

[2] Pannocchia, G. and Rawlings, J.B., 2003. Disturbance models for offset-free model-predictive control. AIChE Journal, 49(2), pp.426-437.

[3] Nian, R., Liu, J. and Huang, B., 2020. A review on reinforcement learning: Introduction and applications in industrial process control. Computers & Chemical Engineering, 139, p.106886.

[4] Solinas, F.M., Macii, A., Patti, E. and Bottaccioli, L., 2024. An online reinforcement learning approach for HVAC control. Expert Systems with Applications, 238, p.121749.

[5] Hassanpour, H., Wang, X., Corbett, B. and Mhaskar, P., 2024. A practically implementable reinforcement learning‐based process controller design. AIChE Journal, 70(1), p.e18245.

[6] Hedrick, E., Hedrick, K., Bhattacharyya, D., Zitney, S.E. and Omell, B., 2022. Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit. Computers & Chemical Engineering, 160, p.107727.

[7] Alhazmi, K., Albalawi, F. and Sarathy, S.M., 2022. A reinforcement learning-based economic model predictive control framework for autonomous operation of chemical reactors. Chemical Engineering Journal, 428, p.130993.