2023 AIChE Annual Meeting
(471a) Real-Time Chemical Production Rescheduling Via Explorative Reinforcement Learning Considering Nervousness
Therefore, reinforcement learning (RL)-based scheduling methodologies have been proposed that recast rescheduling from an optimization problem into a generative model learning problem, so that changes in the environment can be handled instantly [4, 5]. RL is a machine learning approach in which an agent learns to maximize a cumulative reward through continuous interaction with its environment. It does not require a separate data-collection stage and, owing to its stochastic action policy, can operate in dynamic and uncertain environments. Moreover, a well-trained RL policy network can handle real-time disturbances by taking optimal actions on the basis of the updated input state. This ability to make optimal decisions quickly and flexibly under uncertainty makes RL useful for real-time optimization in dynamic scheduling systems [6].
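As an illustrative sketch only (not the implementation used in this work), the following Python snippet shows how a trained policy network can roll out a schedule by repeatedly reading the updated state and selecting the next action; the environment interface (env.reset, env.step), the state encoding, and the network sizes are assumed placeholders.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy policy network: maps a scheduling state vector to action logits."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def run_episode(env, policy: PolicyNet) -> float:
    """Roll out one schedule: at each decision point the policy reads the
    updated state (which may already reflect a disturbance) and samples the
    next task assignment; step rewards accumulate into the return."""
    state = env.reset()                      # hypothetical scheduling-env API
    total_reward, done = 0.0, False
    while not done:
        logits = policy(torch.as_tensor(state, dtype=torch.float32))
        action = torch.distributions.Categorical(logits=logits).sample()
        state, reward, done = env.step(action.item())
        total_reward += reward               # cumulative reward the agent maximizes
    return total_reward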
Herein, we propose an RL model for a single-stage scheduling problem that can instantly respond to unexpected changes in the environment. We applied action masking to filter out infeasible actions in advance, so that only feasible actions are considered in the action space. In addition, we applied an intrinsic curiosity module to encourage the agent to explore unvisited regions of the environment over long horizons [7]. A backstepping method is developed for the inference phase to obtain more explorative schedules. To validate the performance of the RL model, we used the scheduling problem data given by Harjunkoski and Grossmann [8] for training and evaluation. In the static scheduling environment, our RL model achieved over 95% of the cost objective within a short execution time, indicating performance comparable to that of conventional scheduling methods. Furthermore, several case studies confirmed that our RL scheduling model can generate alternative schedules in real time under various disruptions. The rescheduling results are presented together with the cost versus nervousness Pareto curve, allowing the decision maker to choose the point that best matches the desired objective.
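For illustration, and with the same caveat that this is a sketch rather than the authors' code, the snippet below shows one common way to realize the two components named above: action masking via -inf logits so infeasible actions receive zero probability, and a curiosity-style intrinsic reward from the prediction error of a forward dynamics model in the spirit of Pathak et al. [7]; all dimensions and tensors are placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_masked_action(logits: torch.Tensor, feasible: torch.Tensor) -> int:
    """Action masking: infeasible actions get -inf logits, so after the softmax
    their probability is exactly zero and they can never be sampled."""
    masked = logits.masked_fill(~feasible, float("-inf"))
    return torch.distributions.Categorical(logits=masked).sample().item()

class ForwardModel(nn.Module):
    """Curiosity-style forward model: predicts the next state feature from the
    current feature and action; its prediction error is paid out as an
    intrinsic reward that drives the agent toward poorly predicted states [7]."""
    def __init__(self, feat_dim: int, n_actions: int):
        super().__init__()
        self.n_actions = n_actions
        self.net = nn.Sequential(
            nn.Linear(feat_dim + n_actions, 128), nn.ReLU(),
            nn.Linear(128, feat_dim),
        )

    def intrinsic_reward(self, feat, action, next_feat, scale: float = 0.1):
        one_hot = F.one_hot(action, self.n_actions).float()
        pred = self.net(torch.cat([feat, one_hot], dim=-1))
        return scale * F.mse_loss(pred, next_feat, reduction="none").mean(dim=-1)

In such a setup, the intrinsic reward would be added to the scheduling reward at each step, and the feasibility mask would be rebuilt from the current resource and timing constraints before every action.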
[1] Lee, H. and C.T. Maravelias, Combining the advantages of discrete- and continuous-time scheduling models: Part 1. Framework and mathematical formulations. Computers & Chemical Engineering, 2018. 116: p. 176-190.
[2] Ave, G.D., et al., An Explicit Online Resource-Task Network Scheduling Formulation to Avoid Scheduling Nervousness, in Computer Aided Chemical Engineering, A.A. Kiss, et al., Editors. 2019, Elsevier. p. 61-66.
[3] Atadeniz, S.N. and S.V. Sridharan, Effectiveness of nervousness reduction policies when capacity is constrained. International Journal of Production Research, 2020. 58(13): p. 4121-4137.
[4] Luo, S., Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning. Applied Soft Computing, 2020. 91: p. 106208.
[5] Zhou, T., et al., Multi-agent reinforcement learning for online scheduling in smart factories. Robotics and Computer-Integrated Manufacturing, 2021. 72: p. 102202.
[6] Hubbs, C.D., et al., A deep reinforcement learning approach for chemical production scheduling. Computers & Chemical Engineering, 2020. 141: p. 106982.
[7] Pathak, D., et al., Curiosity-driven Exploration by Self-supervised Prediction, in Proceedings of the 34th International Conference on Machine Learning, P. Doina and T. Yee Whye, Editors. 2017, PMLR: Proceedings of Machine Learning Research. p. 2778-2787.
[8] Harjunkoski, I. and I.E. Grossmann, Decomposition techniques for multistage scheduling problems using mixed-integer and constraint programming methods. Computers & Chemical Engineering, 2002. 26(11): p. 1533-1552.