2025 AIChE Annual Meeting

(259b) Control-Informed Reinforcement Learning

Authors

Calvin Tsay, Imperial College London
Antonio del Rio Chanona, Imperial College London

This work proposes a control-informed reinforcement learning (CIRL) framework that integrates proportional-integral-derivative (PID) control components into deep reinforcement learning (RL) policies for chemical process control [4]. The integration of established control theory with data-driven RL addresses challenges in industrial process control: effectively managing nonlinear dynamics while maintaining operational stability and reducing sample complexity.

Process industries traditionally rely on PID controllers for their reliability, interpretability, and well-established tuning methods [1]. Despite decades of development in PID control technology, significant challenges persist. The manual effort required for controller tuning remains substantial, particularly when process conditions change. Furthermore, PID controllers often struggle to provide adequate performance for highly nonlinear and time-varying systems without extensive retuning or gain scheduling [2]. While deep RL shows promise for complex control tasks by learning optimal policies directly from interactions with the environment, it typically requires prohibitively large numbers of samples and fails to leverage existing control knowledge [3], limiting practical industrial adoption. Previous approaches have explored using RL to tune PID gains or implementing gain scheduling strategies, but these methods often treat gain tuning as a separate optimization problem rather than an integrated learning task.

Our CIRL framework embeds PID control structures directly into deep RL policy architectures, enabling neural networks to learn adaptive PID gain scheduling while preserving the inherent stability properties of feedback control. Unlike methods that treat gain tuning as a separate optimization problem, our approach adapts the PID gains continuously through an integrated neural network, allowing the controller to respond to changing operating conditions in real time. The CIRL agent consists of a deep neural network that maps observed states to PID gain parameters, followed by a PID controller layer that computes control actions from the error signal and the learned gains. For regulatory control problems, we adopt a reward function that balances tracking performance and control effort, similar to MPC objective functions. The implementation uses a velocity-form PID controller layer to ensure smooth transitions when gains change, and is trained via evolutionary optimization strategies that do not require differentiation through the control structure. We evaluate CIRL on a multivariable continuous stirred-tank reactor (CSTR) system with nonlinear reaction kinetics and two controlled variables: concentration and temperature.
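A minimal sketch of such a policy is shown below for a single control loop; the layer sizes, gain parameterization, control bounds, and sampling time Ts are illustrative assumptions rather than values from the paper. A small feedforward network maps the observed state to non-negative PID gains, and a velocity-form PID layer converts the tracking error into a bounded control move.

# Illustrative CIRL-style policy: a neural network outputs state-dependent
# PID gains, and a velocity-form PID layer computes the control increment.
# All sizes and bounds here are placeholder assumptions.
import numpy as np

class CIRLPolicy:
    def __init__(self, n_obs, n_hidden=32, ts=1.0, u_bounds=(0.0, 1.0), u_init=0.5):
        rng = np.random.default_rng(0)
        # Two-layer network parameters (kept as plain arrays so they can be
        # flattened into a single vector for evolutionary search).
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_obs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(3, n_hidden))
        self.b2 = np.zeros(3)
        self.ts = ts
        self.u_bounds = u_bounds
        self.e_prev, self.e_prev2, self.u_prev = 0.0, 0.0, u_init

    def gains(self, obs):
        # State-dependent gains [Kp, Ki, Kd]; softplus keeps them non-negative.
        h = np.tanh(self.W1 @ obs + self.b1)
        return np.log1p(np.exp(self.W2 @ h + self.b2))

    def act(self, obs, error):
        kp, ki, kd = self.gains(obs)
        # Velocity-form PID: the gains only scale the control *increment*,
        # so changing them between steps does not cause bumps in u.
        du = (kp * (error - self.e_prev)
              + ki * self.ts * error
              + kd * (error - 2.0 * self.e_prev + self.e_prev2) / self.ts)
        u = np.clip(self.u_prev + du, *self.u_bounds)
        self.e_prev2, self.e_prev, self.u_prev = self.e_prev, error, u
        return u

Because the control action is produced by a plain forward pass, the flattened network parameters can be optimized with an evolutionary strategy using only episode returns, consistent with the gradient-free training described above; a multivariable version would simply output one gain set per control loop.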

Comprehensive simulation studies demonstrate that CIRL significantly outperforms both conventional model-free deep RL and PID controllers. Comparison of the learning curves shows that CIRL achieves higher performance with significantly fewer training samples than pure RL approaches, exhibiting faster convergence and lower variance during training. When tested on setpoint tracking tasks, CIRL shows improved tracking of setpoint trajectories and exhibits robustness when encountering previously unseen operating points outside the training distribution. Our experiments include challenging scenarios where the system operates near its upper operational limits, where steep gradients in the process dynamics make control particularly difficult. In these regions, CIRL adaptively decreases the proportional gain to maintain stability, demonstrating its ability to learn appropriate gain scheduling for different operating regimes. Additionally, the embedded control structure enhances robustness against measured disturbances, with test metrics showing improved disturbance rejection compared to pure RL approaches. This improved robustness stems from CIRL's integrated PID control structure, which continually measures and responds to the error between the setpoint and the actual system output, allowing it to adapt to unexpected disturbances in real time.

The CIRL framework represents a significant advancement in bridging classical control engineering with modern machine learning approaches. By combining the interpretability and reliability of PID control with the adaptability and nonlinear modeling capacity of deep RL, CIRL offers a sample-efficient, robust approach for controlling complex industrial systems. This work establishes a practical pathway for implementing advanced control techniques in process industries while maintaining operational stability and interpretability. The enhanced sample efficiency and robustness to unmodeled disturbances make CIRL particularly suitable for real-world deployment in chemical and biochemical processes. The ability to adaptively tune PID gains in response to changing operating conditions eliminates the need for manual retuning or complex gain scheduling schemes, potentially reducing operational costs and improving control performance across varying process conditions.

[1] Dale E Seborg et al. Process dynamics and control. John Wiley & Sons, 2016.

[2] Shuichi Yahagi and Itsuro Kajiwara. "Noniterative Data-Driven Gain-Scheduled Controller Design Based on Fictitious Reference Signal". In: IEEE Access 11 (2023), pp. 55883–55894. doi: 10.1109/ACCESS.2023.3278798.

[3] Rui Nian, Jinfeng Liu, and Biao Huang. "A review on reinforcement learning: Introduction and applications in industrial process control". In: Computers & Chemical Engineering 139 (2020), p. 106886.

[4] Maximilian Bloor et al. "Control-Informed Reinforcement Learning for Chemical Processes". In: Industrial & Engineering Chemistry Research 64.9 (2025), pp. 4966–4978. doi: 10.1021/acs.iecr.4c03233.