2025 AIChE Annual Meeting

(60g) Hybrid Deep Reinforcement Learning Agent for Online Scheduling and Control for Chemical Batch Plants

Authors

Daniel Rangel Martinez - Presenter, University of Waterloo
Luis Ricardez-Sandoval, University of Waterloo
The integration of decision variables from the scheduling and control tasks is of interest since it may offer an opportunity to increase profits and enhance sustainability in chemical processing plants. Moreover, the growing amount of sensory data that can be collected and processed is a major area of opportunity for approaching this integration in real time. Nevertheless, a challenge in using this information for decision-making is the asynchronous way in which it is generated, i.e., information is produced at different time scales for different processes. Thus, the need for reactive methods that can handle the data generated by the process and use it to avoid infeasible scheduling and control decisions has been of interest to the field of Process System Integration [1], [2]. The usual approach to integrating the scheduling and control tasks consists of decomposing the problem, which reduces its complexity [3]. The control problem can then be handled as a dynamic optimization problem, while the scheduling problem can be approached as a Mixed Integer Linear Programming (MILP) problem. A disadvantage of this method is its asynchronous treatment of the problem: while one task is solved, the variables of the other task are fixed, which might reduce the quality of the solution and increase the computational burden [4]. Data-driven methods are an alternative for finding and implementing optimal scheduling and control decisions simultaneously and in real time.

In this work, the integration of the scheduling and control tasks is achieved through a hybrid agent that performs both tasks online. The agent is generated with a Deep Reinforcement Learning (DRL) method by training its decision-making process on simulations of the plant. The hybrid agent holds a policy, modelled with an Artificial Neural Network (ANN), that takes as input the scheduling and control conditions of the plant. These conditions are processed by the policy and mapped simultaneously to scheduling and control decisions while following a user-defined objective (reward) function. Examples of these decisions include the activation or deactivation of a process (scheduling task) and the flow rate, processing time, and temperature control actions (control task). One of the challenges of this approach is the asynchronous decision-making inherent to the integration, i.e., scheduling and control decisions are not required at every time interval. To handle this situation, a masking technique is integrated into the environment: the mask restricts the action space to the feasible decisions that can be implemented in the environment at a given time. A contribution of this work is that the integrated scheduling and control problem is approached as a Partially Observable Markov Decision Process (POMDP). This formulation augments the information provided to the agent and, consequently, gives it a perception of the evolution of the process, which is useful for tracking the development and effects of unexpected events and for reacting accordingly. The agent uses a set of Recurrent Neural Networks (RNNs) to correlate (i.e., gain insights from) a sequence of states from different time intervals of the plant instance, i.e., the recent history of the process is considered when making decisions. The DRL method presented in this work provides different exploration techniques for both discrete and continuous decisions, which consist of dynamic hyperparameters that change during the training of the agent. After the policy is trained, the agent can be deployed online in the process and provide decisions in a fraction of a second. The proposed method offers several attractive features for the integration of the two tasks: a) information from both tasks is considered when making a decision; b) the agent responds quickly at implementation time, enabling online scheduling and control of the process; and c) the agent can react to multiple unexpected events in the environment, e.g., disturbances or parametric uncertainty. The Proximal Policy Optimization (PPO) method, considered a state-of-the-art DRL algorithm, is used to train the agent. The framework for training the DRL agent was built with PyTorch, version 2.1.0, and the environment was built with the Gym toolkit, version 0.26.2.
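As a minimal sketch only, the snippet below illustrates how a recurrent hybrid policy with action masking could be written in PyTorch: an LSTM encodes the recent history of observations, one head produces masked discrete scheduling logits, and another head produces a Gaussian over continuous control moves. The layer sizes, history length, and variable names are illustrative assumptions, not the architecture or settings used in this work.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical, Normal

class HybridRecurrentPolicy(nn.Module):
    """Illustrative actor: an LSTM encodes the recent plant history, and two
    heads output (i) masked discrete scheduling logits and (ii) a Gaussian
    over continuous control actions. Dimensions are placeholder assumptions."""

    def __init__(self, obs_dim, n_sched_actions, n_ctrl_actions, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.sched_head = nn.Linear(hidden, n_sched_actions)   # discrete logits
        self.ctrl_mean = nn.Linear(hidden, n_ctrl_actions)     # continuous means
        self.ctrl_log_std = nn.Parameter(torch.zeros(n_ctrl_actions))

    def forward(self, obs_history, action_mask):
        # obs_history: (batch, time, obs_dim); action_mask: (batch, n_sched_actions)
        _, (h, _) = self.encoder(obs_history)
        feat = h[-1]                                            # last hidden state
        logits = self.sched_head(feat)
        # Masking: infeasible scheduling decisions get -inf logits, so they
        # receive zero probability at the current time interval.
        logits = logits.masked_fill(~action_mask, float("-inf"))
        sched_dist = Categorical(logits=logits)
        ctrl_dist = Normal(self.ctrl_mean(feat), self.ctrl_log_std.exp())
        return sched_dist, ctrl_dist

# Example usage with dummy data (shapes and mask are hypothetical)
policy = HybridRecurrentPolicy(obs_dim=8, n_sched_actions=5, n_ctrl_actions=2)
history = torch.randn(1, 6, 8)                            # last 6 plant observations
mask = torch.tensor([[True, True, False, False, True]])   # feasible decisions only
sched_dist, ctrl_dist = policy(history, mask)
print(sched_dist.sample(), ctrl_dist.sample())
```

In a PPO training loop, the two distributions would be sampled during exploration and their log-probabilities combined in the policy-gradient objective; that loop is omitted here for brevity.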

The proposed framework was tested on a State Task Network adapted from the literature [4]. The process has four tasks (i.e., first reaction, filtering, second reaction, separation) that the agent can activate. The first and second reactions have two and one manipulated variables, respectively. The scheduling decision selects one of the four tasks to activate, while the control decisions set the manipulated variables, e.g., the cold feed flow rate to a reactor. The concentrations of the two inlet feed streams to the first and second reactions represent disturbances that are assumed to follow a uniform distribution. The agent must specify scheduling and control decisions considering the concentrations of these streams and the current state of the plant, i.e., the equipment occupancy and the state of the processes taking place. Results showed that the agent could adjust the manipulated variables according to the reactant concentrations, and that the scheduling decisions accommodated multiple processes within the horizon to maximize the number of batches completed in the plant. The approach shows potential for building online policy models that can simultaneously consider scheduling and control decisions in the presence of external perturbations.
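For illustration only (the actual plant model follows the State Task Network of [4] and is not reproduced here), a Gym 0.26-style environment skeleton with a hybrid discrete/continuous action space and uniformly distributed feed-concentration disturbances could look as follows. All dimensions, bounds, the state update, and the reward are placeholder assumptions rather than the case-study model.

```python
import numpy as np
import gym
from gym import spaces

class BatchPlantEnvSketch(gym.Env):
    """Toy skeleton (not the State Task Network of [4]) with one discrete
    scheduling decision, continuous control moves, and uniformly distributed
    feed-concentration disturbances."""

    def __init__(self):
        super().__init__()
        # 4 tasks + "do nothing"; 2 manipulated variables scaled to [-1, 1]
        self.action_space = spaces.Dict({
            "schedule": spaces.Discrete(5),
            "control": spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32),
        })
        # Placeholder plant state: occupancy flags and feed concentrations
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(6,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.observation_space.sample()
        return self.state, {}

    def step(self, action):
        # Disturbance: inlet concentrations drawn from a uniform distribution
        feed_conc = self.np_random.uniform(0.4, 0.6, size=2).astype(np.float32)
        # A real model would integrate the batch dynamics here; this sketch
        # only perturbs the state and returns a dummy reward.
        self.state = np.clip(self.state + 0.01 * np.concatenate(
            [action["control"], feed_conc, np.zeros(2, dtype=np.float32)]), 0.0, 1.0)
        reward = float(action["schedule"] != 0)   # dummy: reward activating a task
        terminated, truncated = False, False
        return self.state, reward, terminated, truncated, {}

# Example usage
env = BatchPlantEnvSketch()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```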

References

[1] J. Zhuge and M. G. Ierapetritou, “An integrated framework for scheduling and control using fast model predictive control,” AIChE Journal, vol. 61, no. 10, pp. 3304–3319, 2015, doi: 10.1002/aic.14914.

[2] D. Rangel-Martinez and L. A. Ricardez-Sandoval, “Data-driven techniques for optimal and sustainable process integration of chemical and manufacturing systems,” in Optimization in Chemical Engineering: Deterministic, Meta-Heuristic and Data-Driven Techniques, Walter de Gruyter GmbH & Co KG, 2025, pp. 215–255.

[3] Y. I. Valdez-Navarro and L. A. Ricardez-Sandoval, “A Novel Back-off Algorithm for Integration of Scheduling and Control of Batch Processes under Uncertainty,” Ind. Eng. Chem. Res., vol. 58, no. 48, pp. 22064–22083, Dec. 2019, doi: 10.1021/acs.iecr.9b04963.

[4] H. U. Rodríguez Vera and L. A. Ricardez-Sandoval, “Integration of Scheduling and Control for Chemical Batch Plants under Stochastic Uncertainty: A Back-Off Approach,” Ind. Eng. Chem. Res., vol. 61, no. 12, pp. 4363–4378, Mar. 2022, doi: 10.1021/acs.iecr.1c04386.