2020 Virtual AIChE Annual Meeting
(108b) Fast-Convergence of Deep Reinforcement Learning Controller: Application to a Continuous Stirred Tank Reactor
Despite its success, a DRL controller has a few limitations, including the requirement for large amounts of data, high computational loads, and the need for careful selection and initialization of hyperparameters to achieve fast convergence. Additionally, one glaring limitation of a DRL controller, as with many other RL methods, is the long training time required before it can deliver satisfactory control performance [5]. To overcome this challenge, we propose to train the actor and the critic offline using historical process data before deploying them for online control. The actor network, which approximates the policy function, is trained offline on past states and control actions until convergence within the training region is achieved. The critic network, which approximates the action-value function, is trained offline on rewards calculated from a pre-defined reward function until convergence is achieved. Once trained offline, the learned actor-critic pair serves as the starting point for the DRL controller. This pre-trained DRL controller is implemented to track concentration and temperature set-points for a continuous stirred tank reactor (CSTR) process, and we demonstrate that it adapts and learns to track set-points outside the training region faster than a randomly initialized DRL controller. We also compare the control performance of this pre-trained DRL controller against a model-predictive controller in tracking a set-point.
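The offline pre-training idea above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it uses synthetic logged data and simple linear least-squares fits in place of deep actor and critic networks, purely to show the two supervised pre-training steps (actor on state-action pairs, critic on logged rewards).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical process data: states (e.g. concentration,
# temperature deviations), logged control actions, and rewards from a
# pre-defined reward function (here, negative squared tracking error).
# All values are synthetic, for illustration only.
N = 500
states = rng.normal(size=(N, 2))
actions = 1.5 * states[:, :1] - 0.5 * states[:, 1:] + 0.05 * rng.normal(size=(N, 1))
rewards = -np.sum(states**2, axis=1, keepdims=True)

# Offline actor pre-training: regress historical actions on states
# (behavioural-cloning style), here with ordinary least squares.
W_actor, *_ = np.linalg.lstsq(states, actions, rcond=None)
policy = lambda s: s @ W_actor

# Offline critic pre-training: fit Q(s, a) to the logged rewards using
# a linear model on the concatenated state-action features.
sa = np.hstack([states, actions])
W_critic, *_ = np.linalg.lstsq(sa, rewards, rcond=None)
q_value = lambda s, a: np.hstack([s, a]) @ W_critic

# The fitted actor/critic parameters would initialise the DRL
# controller's networks before online learning and set-point tracking.
fit_err = float(np.mean((policy(states) - actions) ** 2))
```

In the paper's setting the linear fits would be replaced by deep networks trained to convergence within the training region, but the data flow is the same: the actor learns from (state, action) pairs and the critic from (state, action, reward) tuples.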
Literature cited:
[1] Sutton, R.S., Barto, A.G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
[2] Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M. Playing Atari with Deep Reinforcement Learning. arXiv Preprint, arXiv:13125602, 2013.
[3] Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., Riedmiller, M. Deterministic policy gradient algorithms. In ICML, 2014.
[4] Spielberg, S., Tulsyan, A., Lawrence, N.P., Loewen, P.D., Bhushan Gopaluni, R. Toward self-driving processes: A deep reinforcement learning approach to control. AIChE J, 65(10), 2019.
[5] Shin, J., Badgwell, T.A., Liu, K.-H., Lee, J.H. Reinforcement Learning - Overview of recent progress and implications for process control. Computer Aided Chemical Engineering, 44:71-85, 2018.