2025 AIChE Annual Meeting
(243e) Safe Reinforcement Learning Control Via Linear MPC Pre-Training for Industrial Chemical Processes
The MPC algorithm requires an accurate process model to select the appropriate control action [1]. There are two broad approaches to building such a model: (1) first-principles modeling and (2) data-driven modeling. Developing and maintaining first-principles models is highly resource-intensive, while building a nonlinear data-driven model requires informative data and continual re-tuning, which is both challenging and costly. Moreover, implementing nonlinear MPC (NMPC) adds significant computational burden, because it requires solving a nonconvex optimization problem online, which is difficult to solve to optimality. The practical and far more common alternative is linear MPC, which employs a linear state-space model and typically achieves acceptable closed-loop performance. However, a linear model may fail to capture the process well enough to control it effectively under disturbances or in complex, nonlinear systems [2].
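To ground the discussion, the following is a minimal sketch of the kind of linear MPC assumed throughout, written in Python with cvxpy; the model matrices, weights, horizon, and input bound are hypothetical placeholders, not the models from the case studies.

import numpy as np
import cvxpy as cp

# Hypothetical two-state, one-input linear state-space model: x+ = A x + B u
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.1],
              [0.5]])
nx, nu = 2, 1

N = 10                 # prediction horizon
Q = np.eye(nx)         # state-tracking weight
R = 0.1 * np.eye(nu)   # input weight
u_max = 1.0            # actuator/safety bound (illustrative)

x = cp.Variable((nx, N + 1))
u = cp.Variable((nu, N))
x0 = cp.Parameter(nx)

cost = 0
constraints = [x[:, 0] == x0]
for t in range(N):
    cost += cp.quad_form(x[:, t + 1], Q) + cp.quad_form(u[:, t], R)
    constraints += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
                    cp.abs(u[:, t]) <= u_max]

problem = cp.Problem(cp.Minimize(cost), constraints)

# Receding horizon: solve at the measured state and apply only the first move.
x0.value = np.array([1.0, -0.5])
problem.solve()
u_apply = u[:, 0].value

Because the model is linear and the cost quadratic, this is a convex quadratic program that off-the-shelf solvers handle reliably online; replacing the linear dynamics with a nonlinear model is exactly what makes NMPC nonconvex.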
Model-free reinforcement learning (RL) has emerged as a promising alternative to NMPC. To find an optimal policy, an RL agent must explore during training, typically through random actions. Such exploration is not feasible in a chemical plant, where maintaining product quality and ensuring safety are the top priorities. An alternative is to pre-train the agent offline, so that only restricted exploration, which does not compromise plant objectives, is needed online. Nonetheless, the challenges associated with MPC persist, as pre-training still demands an accurate system model. Since a linear MPC is commonly already available, why not start there? Can the limited model information it embeds be used to pre-train an RL agent before it is applied to the real plant, thereby avoiding random exploration?
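A minimal sketch of what restricted exploration might look like, assuming the agent perturbs a pre-trained policy's action only within a small, clipped neighborhood; the delta and bounds below are hypothetical tuning choices, not values from the study.

import numpy as np

def restricted_explore(policy_action, u_min, u_max, delta=0.05, rng=None):
    """Perturb the pre-trained policy's action by a small bounded amount,
    then clip to actuator/safety limits, so online exploration never strays
    far from behavior the plant is known to tolerate."""
    rng = rng or np.random.default_rng()
    noise = rng.uniform(-delta, delta, size=np.shape(policy_action))
    return np.clip(policy_action + noise, u_min, u_max)

# Example: explore within +/-0.05 of the nominal action, inside [-1, 1]
u = restricted_explore(np.array([0.3]), u_min=-1.0, u_max=1.0)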
Our group previously leveraged an existing linear MPC to design an RL controller [3]. In the proposed method, the linear MPC optimization formulation is used to pre-train the agent. The RL agent, having imitated the behavior of the linear MPC, is then deployed on the real plant, where it safely learns the system's nonlinear dynamics through restricted exploration that avoids safety hazards. Building on this work, the current study demonstrates the applicability of the approach to large-scale industrial chemical processes. To broaden the algorithm's applicability, we also incorporate output and safety constraints. The efficacy of the safe RL method is evaluated through two case studies: (1) control of a polymerization reactor [4], and (2) control of a distillation column, an ethylene (C2) splitter, simulated in Aspen Dynamics. An industrial step-response-based MPC is used for pre-training, and real-time simulations demonstrate the effectiveness of the RL-based control strategy.
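One plausible reading of the pre-training step, sketched below as supervised imitation (behavior cloning): the existing linear MPC is queried as an expert over the operating region, and a policy network is regressed onto its state-to-action map before any plant interaction. The mpc_action() oracle is a hypothetical stand-in for the MPC of the earlier sketch, and the network and sampling choices are illustrative; the actual method in [3] pre-trains using the linear MPC optimization formulation itself.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def mpc_action(x):
    # Hypothetical expert: a fixed linear feedback standing in for the
    # first move returned by the linear MPC.
    K = np.array([[0.5, 0.3]])
    return -(K @ x)

# Sample states over the expected operating region and query the expert
states = rng.uniform(-1.0, 1.0, size=(5000, 2))
actions = np.array([mpc_action(x).ravel()[0] for x in states])

# Policy network pre-trained offline to imitate the MPC
policy = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
policy.fit(states, actions)

# The cloned policy initializes the RL agent, which is then refined on the
# plant with restricted exploration as sketched above.
u0 = policy.predict(states[:1])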
References
[1] Wahid, A. and Prasetyo, A.P., 2018, March. A Comparative study between MPC and PI controller to control vacuum distillation unit for producing LVGO, MVGO, and HVGO. In IOP Conference Series: Materials Science and Engineering (Vol. 334, No. 1, p. 012020). IOP Publishing.
[2] Khather, S.I., Ibrahim, M.A. and Abdullah, A.I., 2023. Review and Performance Analysis of Nonlinear Model Predictive Control – Current Prospects, Challenges and Future Directions. Journal Européen des Systèmes Automatisés, 56(4).
[3] Hassanpour, H., Wang, X., Corbett, B. and Mhaskar, P., 2024. A practically implementable reinforcement learning-based process controller design. AIChE Journal, 70(1), p.e18245.
[4] Bustos, G.A., Ferramosca, A., Godoy, J.L. and González, A.H., 2016. Application of model predictive control suitable for closed-loop re-identification to a polymerization reactor. Journal of Process Control, 44, pp.1-13.