2024 AIChE Annual Meeting
(732f) A Reinforcement Learning Framework for the Simultaneous Generation, Design and Control of Chemical Process Flowsheets
This study introduces a framework capable of simultaneously generating, designing, and controlling chemical process flowsheets using RL. A Proximal Policy Optimization (PPO) agent interacts with the environment. The agent's primary goal is to simultaneously generate, design, and control a chemical process flowsheet that optimizes a user-defined objective function, which also accounts for closed-loop process dynamic variability, e.g., disturbance-rejection tracking errors, while adhering to process and equipment design and operating constraints. Objectives and constraints are enforced through a reward-shaping strategy. The agent interacts with an environment composed of neural networks (NNs), identified a priori, that approximate the dynamic models resembling the closed-loop operation of the unit operations included in the integrated problem. The nature of the PPO agent depends on the action space of the integrated problem, i.e., whether discrete, continuous, or hybrid decisions are considered. Hence, multiple unit operations of different kinds can be handled by the PPO agent. For instance, if the problem involves more than one unit operation, a hybrid PPO agent can be designed that simultaneously makes discrete decisions (e.g., the type of units in the flowsheet) and continuous decisions (e.g., a unit operation's set-point). In this approach, the agent is tasked with interacting with the environment, i.e., adding unit operations to the flowsheet, until the specified design and control goals are achieved.

The dataset used to train the NNs is generated through a systematic process. Initially, a Latin hypercube design is generated over all the model inputs, e.g., inlet stream variables, design variables, and control variables. Every combination in this preliminary database is then evaluated with the dynamic process model to gather the corresponding outlet variables, e.g., species concentrations and temperature of the outlet streams, along with a control tracking parameter and constraint tracking parameters. The input/output data gathered from these simulations is used to fit a set of NNs (i.e., surrogate models) that are then embedded in the environment. A primary challenge in this methodology was converting a transient profile into a single explanatory value usable by the RL agent, given that all output variables of the dynamic process models are time-dependent. To address this issue, performance metrics borrowed from control theory were adopted in this work. Control tracking performance was assessed through the Integral of the Squared Error (ISE). The constraint tracking parameter was also assessed using the ISE; however, when translated into sub-reward terms, it was accounted for as a penalty only if it exceeded a pre-specified threshold. Ending the agent–environment interaction whenever even minor violations of the constraint tracking parameter occur (resembling a constraint programming approach) may lead to suboptimal learning rates; hence, the thresholded-penalty strategy was selected to encourage exploration by the agent while promoting constraint satisfaction. The integration of these metrics proved an effective way to capture closed-loop tracking errors and constraint violations in a form that the RL agent can readily use.
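As a minimal sketch of how a transient closed-loop profile can be condensed into the scalar metrics described above, the snippet below computes a trapezoidal ISE for control tracking, an ISE-type constraint metric, and a thresholded penalty of the kind used in the reward-shaping strategy. Function names, weights, and the illustrative responses are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def control_ise(t, y, y_sp):
    """Integral of the Squared Error (ISE) between a transient profile y(t)
    and its set-point y_sp, approximated with the trapezoidal rule."""
    e2 = (np.asarray(y) - y_sp) ** 2
    return float(np.sum(0.5 * (e2[1:] + e2[:-1]) * np.diff(t)))

def constraint_metric(t, y, y_max):
    """ISE-type metric for a path constraint y(t) <= y_max; only the
    violating portion of the profile contributes."""
    viol = np.maximum(np.asarray(y) - y_max, 0.0)
    return float(np.sum(0.5 * (viol[1:] ** 2 + viol[:-1] ** 2) * np.diff(t)))

def shaped_reward(tracking_ise, constr_metric, threshold=1e-3,
                  w_track=1.0, w_constr=100.0):
    """Sub-reward assembly: the tracking ISE always contributes, while the
    constraint metric is penalized only above a pre-specified threshold,
    so minor violations do not stall exploration."""
    reward = -w_track * tracking_ise
    if constr_metric > threshold:
        reward -= w_constr * constr_metric
    return reward

# Illustrative closed-loop responses (placeholders, not simulation results)
t = np.linspace(0.0, 10.0, 501)
c_out = 0.05 + 0.02 * np.exp(-t)           # outlet concentration profile
T_out = 380.0 + 30.0 * np.exp(-0.5 * t)    # reactor temperature profile

r = shaped_reward(control_ise(t, c_out, 0.05),
                  constraint_metric(t, T_out, 400.0))
```

Penalizing the constraint metric only beyond a threshold, rather than ending the episode on any violation, keeps the reward signal informative during training, which is the rationale given above for this choice.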
The RL methodology was applied to a case study aimed at the simultaneous design and control of a process flowsheet for a first-order reaction subject to a step disturbance in the main inlet stream. The only unit operation available to the RL agent was the continuous stirred-tank reactor (CSTR), whose thermodynamic parameters and dynamic model were adapted from the literature [6]. The case study was implemented on a PC with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 32 GB of RAM, using Python as the programming language. The main libraries were torch, for the development of the neural networks, and gym, for creating the RL environment. The objective of the agent was to simultaneously design and control the number of reactors required to achieve 95% conversion of the reactant, smoothly reject the inlet flow disturbance, and keep the reaction temperature below 400 K to prevent a runaway reaction in any of the CSTRs included in the process flowsheet. Each CSTR in the flowsheet included a PI controller that regulated the outlet concentration by adjusting the jacket temperature. The dataset used to train the NNs embedded in the environment was generated from a Latin hypercube of 60,000 points over all the inlet variables of the reactor model, i.e., inlet concentration and temperature, set-point concentration, volume, and the controller's proportional gain and integral time. For this case study, the most computationally demanding part was the identification of the surrogate NN models, i.e., simulating the combinations included within the Latin hypercube design space. Once the dataset was completed with simulations of the dynamic CSTR, the outlet concentration and temperature, the ISE, and the temperature constraint tracking were fitted using three regression NNs and one classification NN, respectively. The classification NN determined whether the operating temperature of the CSTR was within the feasible limits. The resulting NN architectures were then embedded within the RL environment.

The PPO agent was trained for 100,000 steps over the course of one hour. The flowsheet obtained by the agent consisted of two reactors, which successfully dampened the inlet flow disturbance and achieved 95% conversion of the reactant in the second reactor. The outcomes achieved by the RL agent were compared against a model-based optimization methodology. Although the proposed framework may take longer to solve, it may be attractive for more complex scenarios involving multiple units of different kinds modelled using rigorous thermodynamic correlations and dynamic conservation balance equations. In such instances, approximating these models through NNs and subsequently optimizing them using the proposed RL methodology offers a clear advantage over model-based optimization methods. This approach provides flexibility and adaptability for intricate problems where traditional modeling approaches may fall short, showcasing the potential of RL for effectively addressing the simultaneous generation, design, and control of chemical flowsheets.
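As a rough illustration of the dataset-generation step described above, the sketch below builds a Latin hypercube design over the six reactor inputs using scipy.stats.qmc. The variable bounds are placeholders chosen for exposition (they are not the values used in the study), and the simulation loop that produces the NN training targets is only indicated in comments.

```python
import numpy as np
from scipy.stats import qmc

# Placeholder bounds (illustrative only) for the six sampled inputs:
# [inlet concentration, inlet temperature, set-point concentration,
#  reactor volume, controller gain Kc, integral time tau_I]
lower = np.array([0.5, 300.0, 0.02, 0.5, 1.0, 0.5])
upper = np.array([1.5, 350.0, 0.10, 5.0, 20.0, 10.0])

sampler = qmc.LatinHypercube(d=6, seed=0)
unit_samples = sampler.random(n=60_000)          # samples in [0, 1)^6
design = qmc.scale(unit_samples, lower, upper)   # scaled to physical bounds

# Each row of `design` would then be simulated with the dynamic CSTR model to
# record the outlet concentration and temperature, the control ISE, and the
# temperature-constraint label, which serve as training targets for the three
# regression NNs and the classification NN embedded in the RL environment.
```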
References
[1] Ricardez-Sandoval, L. A., Douglas, P. L., & Budman, H. M. (2011). A methodology for the simultaneous design and control of large-scale systems under process parameter uncertainty. Computers & Chemical Engineering, 35(2), 307–318. https://doi.org/10.1016/j.compchemeng.2010.05.010
[2] Koller, R. W., Ricardez-Sandoval, L. A., & Biegler, L. T. (2018). Stochastic back-off algorithm for simultaneous design, control, and scheduling of multiproduct systems under uncertainty. AIChE Journal, 64(7), 2379–2389. https://doi.org/10.1002/AIC.16092
[3] Rafiei, M., & Ricardez-Sandoval, L. A. (2020). Integration of design and control for industrial scale applications under uncertainty: a trust region approach. Computers & Chemical Engineering, 141, 107006. https://doi.org/10.1016/J.COMPCHEMENG.2020.107006
[4] Sachio, S., del-Rio Chanona, A. E., & Petsagkourakis, P. (2021). Simultaneous Process Design and Control Optimization using Reinforcement Learning. IFAC-PapersOnLine, 54(3), 510–515. https://doi.org/10.1016/J.IFACOL.2021.08.293
[5] Mendiola-Rodriguez, T. A., & Ricardez-Sandoval, L. A. (2022). Robust control for anaerobic digestion systems of Tequila vinasses under uncertainty: A Deep Deterministic Policy Gradient Algorithm. Digital Chemical Engineering, 3, 100023. https://doi.org/10.1016/J.DCHE.2022.100023
[6] Seborg, D. E., Edgar, T. F., & Mellichamp, D. A. (2004). Process Dynamics and Control (2nd ed., pp. 34–36, 94–95). Wiley.