2024 AIChE Annual Meeting
(732e) Synergistic Integration of Reinforcement Learning with Conventional Process Control
In this work, we propose a control structure that augments existing conventional process control (CPC) methods with a reinforcement learning (RL) agent implemented in parallel. Because RL typically suffers from slow learning rates and high exploration requirements, the existing conventional process controller (e.g., PID, MPC) continues to compute its own control action, which is used to accelerate the learning of the RL agent [5]. A weighted sum of the RL and CPC control actions is computed and applied to the plant [6]; the resulting states and actions are then used to supplement the RL agent’s learning. The proposed algorithm avoids direct actuation by a naive RL agent, which may yield unacceptable performance and may even be unsafe under worst-case scenarios. Algorithms are developed for an adaptive weighting function based on measures of instantaneous and historical performance. The performance of both the RL and CPC methods is assessed over a moving horizon with time-decaying weights, so that more recent actions count for more than older ones. In addition, short-term performance trends are derived to allow for rapid transitions. In this way, the RL agent can take over control when its performance exceeds that of the CPC method; if the RL agent’s performance begins to deteriorate, the conventional controller again assumes full control before performance degrades significantly.
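As a minimal illustration of this parallel arrangement, the sketch below blends the RL and CPC actions with a weight adapted from time-decayed tracking-error measures and a short-term trend. The class name, decay factor, trend window, and update rule are assumptions made for illustration only, not the exact weighting function developed in this work.

import numpy as np

class AdaptiveBlender:
    """Blend RL and CPC actions with a weight adapted from recent tracking performance.
    All names, gains, and thresholds are illustrative assumptions, not the paper's exact rule."""

    def __init__(self, decay=0.95, trend_window=10, gain=0.5):
        self.decay = decay            # time-decay factor of the moving performance horizon
        self.trend_window = trend_window
        self.gain = gain              # how aggressively authority shifts toward the better controller
        self.rl_score = 0.0           # discounted cumulative cost attributed to the RL action
        self.cpc_score = 0.0          # discounted cumulative cost attributed to the CPC action
        self.rl_recent = []           # short history of RL errors for trend estimation
        self.w = 0.0                  # fraction of control authority given to RL (0 = pure CPC)

    def update(self, err_rl, err_cpc):
        """Update performance measures from the instantaneous tracking errors
        attributed to each controller's proposed action."""
        self.rl_score = self.decay * self.rl_score + abs(err_rl)
        self.cpc_score = self.decay * self.cpc_score + abs(err_cpc)
        self.rl_recent.append(abs(err_rl))
        if len(self.rl_recent) > self.trend_window:
            self.rl_recent.pop(0)
        # Short-term trend: positive slope means the RL agent's error is currently rising.
        trend = (np.polyfit(range(len(self.rl_recent)), self.rl_recent, 1)[0]
                 if len(self.rl_recent) > 2 else 0.0)
        # Shift weight toward RL when its discounted cost is lower and not rising sharply.
        advantage = (self.cpc_score - self.rl_score) / (self.cpc_score + self.rl_score + 1e-8)
        self.w = float(np.clip(self.w + self.gain * (advantage - max(trend, 0.0)), 0.0, 1.0))

    def blend(self, u_rl, u_cpc):
        """Weighted sum of the RL and CPC actions that is actually sent to the plant."""
        return self.w * np.asarray(u_rl) + (1.0 - self.w) * np.asarray(u_cpc)

In use, the blended action would be applied to the plant, and the resulting transition (state, blended action, reward, next state) stored in the RL agent’s replay buffer, so that learning continues even while the CPC retains most of the control authority.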
The algorithm is demonstrated on a dynamic process model of a solid oxide cell (SOC) plant for H2 and power production [7]. The RL algorithm used is the twin-delayed deep deterministic policy gradient (TD3), applied to temperature regulation at the outlet of the SOC stack. A notable advantage of TD3 is its ability to handle continuous action spaces, which allows a direct one-to-one comparison with the control actions of the conventional method. For temperature regulation of the SOC system, the CPC is a series of PID controllers arranged in cascade loops. Because of the complex dynamics associated with SOC mode-switching operation between hydrogen and power production, the performance of the PID controllers can be poor, whereas the actor-critic structure of the RL algorithm is intended to capture the nonlinear dynamics accurately. For this case study, the RL agent is proposed to augment, and eventually phase out, the cascaded PID loops. Episodic learning for the RL-CPC arrangement consists of a series of hydrogen production set-point changes; mode switching from maximum hydrogen production to maximum power production and back to maximum hydrogen production is considered as well. Although learning is episodic, the states are continuous across episodes, providing a consistent measure of performance improvement.
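For context on the CPC baseline, the fragment below sketches a cascaded PID arrangement of the kind described above, where an outer temperature loop sets the reference of a faster inner loop. The loop pairing (stack outlet temperature driving an air-flow set-point, which in turn drives a blower command), the gains, the limits, and the units are hypothetical and do not correspond to the actual tuning of the SOC plant model.

class PID:
    """Textbook PI(D) controller; gains and signal names are illustrative only."""
    def __init__(self, kp, ki, kd=0.0, u_min=-1e9, u_max=1e9):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.u_min, self.u_max = u_min, u_max
        self.integral = 0.0
        self.prev_err = None

    def step(self, setpoint, measurement, dt):
        err = setpoint - measurement
        self.integral += err * dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / dt
        self.prev_err = err
        u = self.kp * err + self.ki * self.integral + self.kd * deriv
        return min(max(u, self.u_min), self.u_max)

# Cascade: the outer temperature loop sets the reference for a faster inner loop
# (e.g., cathode air flow); the inner loop drives the actuator (blower command).
outer = PID(kp=0.8, ki=0.05, u_min=0.0, u_max=2.0)   # stack outlet temperature -> air-flow set-point
inner = PID(kp=2.0, ki=0.5, u_min=0.0, u_max=1.0)    # air flow -> blower command

def cpc_action(T_sp, T_out, flow_meas, dt=1.0):
    """One sampling step of the cascaded PID structure (hypothetical signals and units)."""
    flow_sp = outer.step(T_sp, T_out, dt)
    return inner.step(flow_sp, flow_meas, dt)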
Specific contributions of this work include:
- An algorithm is developed for the parallel implementation of RL alongside conventional process control, allowing for transition of control from CPC to RL based on current and past performance. By leveraging short-term and long-term projections of control performance, this algorithm facilitates effective switching without degrading control.
- Online training and implementation of a direct RL algorithm is demonstrated for process control of systems with complex, nonlinear, continuous dynamics. Degradation of RL performance throughout training is shown to be limited to the level expected of the in-place conventional control.
- It is observed that the RL-CPC algorithm can learn from, and surpass, the sub-optimal policy demonstrated by the in-place conventional controller, eventually arriving at a policy superior to that of the conventional method.
- It is observed that the RL-CPC arrangement arrives at an optimal policy faster than traditional online RL methods, with limited performance degradation.
- It is demonstrated that when the RL agent encounters an unknown operating condition that degrades control performance, the control system reverts to the in-place conventional controller, thereby limiting potential error and poor performance, as illustrated in the sketch following this list.
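To make the reversion behavior in the last bullet concrete, the fragment below sketches one possible degradation guard layered on the blending sketch shown earlier. The averaging test, the margin, and the function name are hypothetical and are not the degradation criterion used in this work.

def revert_if_degraded(blender, err_rl_recent, cpc_baseline_err, margin=1.5):
    """Hand full control authority back to the CPC if the RL agent's recent tracking
    error grows well beyond the CPC's historical baseline.
    `blender` is the AdaptiveBlender sketched earlier; `margin` and the mean-error
    test are illustrative assumptions only."""
    if len(err_rl_recent) == 0:
        return
    mean_rl_err = sum(abs(e) for e in err_rl_recent) / len(err_rl_recent)
    if mean_rl_err > margin * cpc_baseline_err:
        blender.w = 0.0  # full authority returns to the conventional controller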
[1] T. P. Lillicrap et al., “Continuous control with deep reinforcement learning,” 4th Int. Conf. Learn. Represent. ICLR 2016 - Conf. Track Proc., Sep. 2016, [Online]. Available: http://arxiv.org/abs/1509.02971
[2] S. Fujimoto, H. Van Hoof, and D. Meger, “Addressing Function Approximation Error in Actor-Critic Methods,” in 35th International Conference on Machine Learning, ICML 2018, Feb. 2018, pp. 2587–2601. [Online]. Available: http://arxiv.org/abs/1802.09477
[3] J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” J. Mach. Learn. Res., vol. 16, pp. 1437–1480, 2015.
[4] O. Dogru et al., “Reinforcement Learning in Process Industries: Review and Perspective,” IEEE/CAA J. Autom. Sin., vol. 11, no. 2, pp. 1–19, 2024, doi: 10.1109/JAS.2024.124227.
[5] J. A. Clouse, “On Integrating Apprentice Learning and Reinforcement Learning,” University of Massachusetts, 1996.
[6] M. T. Rosenstein and A. G. Barto, “Reinforcement learning with supervision by a stable controller,” Proc. Am. Control Conf., vol. 5, pp. 4517–4522, 2004, doi: 10.1109/ACC.2004.182663.
[7] D. A. Allan et al., “NMPC for Setpoint Tracking Operation of a Solid Oxide Electrolysis Cell System,” Found. Comput. Aided Process Oper. / Chem. Process Control (FOCAPO/CPC 2023), pp. 1–6, 2023, [Online]. Available: https://www.netl.doe.gov/projects/files/NMPCforSetpointTrackingOperatio…