2024 AIChE Annual Meeting

(732f) A Reinforcement Learning Framework for the Simultaneous Generation, Design and Control of Chemical Process Flowsheets

Authors

Reynoso-Donzelli, S. - Presenter, University of Waterloo
Ricardez-Sandoval, L., University of Waterloo
The integration and optimization of chemical process design and control systems holds major significance in engineering since it has the potential to improve the economic and sustainable operation of chemical process systems. The inherent complexities and challenges that arise when integrating design and control have prompted the search for more streamlined solutions. Traditionally, a sequential approach has been used to tackle optimal process design and control: the steady-state problem that determines equipment specifications and operating conditions is addressed first, followed by a controllability analysis that assesses the dynamic feasibility of the system in closed loop. Despite its apparent effectiveness, this approach may not yield an optimal, or even feasible, transient operation since process control decisions are not considered at the design stage. Hence, there is a need for an integrated approach to design and control that ensures the resulting plant flowsheets meet design objectives consistently across both the steady-state and transient phases of the plant in the presence of external disturbances [1]. A key challenge in chemical engineering applications involves the selection of integer process design decisions, e.g., the number of trays in a distillation column (or the number of reactors) needed to achieve a particular product purity. Coupling integer design decisions with time-dependent process variables (e.g., the system's states) gives rise to a complex problem known as mixed-integer dynamic optimization (MIDO). Significant progress has been made in this field, showcasing promising outcomes [1]–[3]. Despite these efforts, current applications are limited to a relatively small number of processing units due to the high computational costs associated with the solution of integrated problems. Amid the rise of machine learning techniques in process systems engineering, numerous challenges are currently being addressed through model-free optimization approaches, prominently utilizing Reinforcement Learning (RL) techniques. This trend extends to problems involving the integration of process design and control. Recognizing the compelling outcomes presented in [4]–[5], this study delves into the application of RL techniques for solving integrated design and control problems that involve integer (process design/flowsheet) decisions.

This study introduces a framework capable of simultaneously generating, designing, and controlling chemical process flowsheets using RL. A Proximal Policy Optimization (PPO) agent is used to interact with the environment. The agent's primary goal is to simultaneously generate, design, and control a chemical process flowsheet that optimizes a user-defined objective function, which also considers the process dynamic variability in closed loop, e.g., disturbance rejection tracking errors, while adhering to process and equipment operation and design constraints. Objectives and constraints are enforced using a reward shaping strategy. The agent interacts with an environment composed of different neural networks (NNs), which are identified a priori and approximate the dynamic models that describe the closed-loop operation of the unit operations included in the integrated problem. The nature of the PPO agent depends on the action space of the integrated problem, i.e., whether discrete, continuous, or hybrid decisions are considered. Hence, multiple unit operations of different kinds can be considered by the PPO agent. For instance, if the problem involves more than one unit operation, a hybrid PPO agent capable of simultaneously making discrete decisions (e.g., the type of units in a flowsheet) and continuous decisions (e.g., a unit operation's set-point) can be designed. In this approach, the agent is tasked with interacting with the environment, i.e., adding unit operations to the flowsheet, until the specified design and control goals are achieved. The dataset used to train the NNs is generated through a systematic process. Initially, a Latin hypercube is generated encompassing all the model inputs, e.g., inlet stream variables, design variables, and control variables. Subsequently, every combination within this preliminary database is evaluated with the dynamic process model to gather the corresponding outlet parameters and variables, e.g., species concentration and temperature of the outlet streams, along with a control tracking parameter and constraint tracking parameters. Following this evaluation, the input/output data gathered from these simulations is used to fit a set of NNs (i.e., surrogate models) that are then integrated into the environment. A primary challenge in this methodology involved converting a transient profile into a single explanatory value usable by the RL agent, given that all output variables in the dynamic process models are time-dependent. To address this issue, performance metrics borrowed from control theory were adopted in this work. The control tracking performance was assessed through the Integral of the Squared Error (ISE). The constraint tracking parameter was also assessed using the ISE; however, when translated into sub-reward terms, it was accounted for as a penalty only if it fell outside a pre-specified threshold. Terminating the interaction between the agent and the environment upon even minor violations of the constraint tracking parameter (resembling a constraint programming approach) may result in suboptimal learning rates for the agent; hence, this threshold-based strategy was selected to encourage exploration by the agent and promote constraint satisfaction. The integration of these metrics proved an effective way to capture tracking errors and constraint violations in closed loop, facilitating their utilization by the RL agent.
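As an illustration of how a closed-loop transient can be collapsed into the scalar terms described above, the sketch below computes an ISE-based tracking term and a thresholded constraint penalty. The function and parameter names (shaped_reward, w_track, w_penalty, tol) and the use of the clipped temperature excursion as the constraint tracking signal are illustrative assumptions, not the exact formulation used in this work.

```python
import numpy as np


def ise(error, dt):
    """Integral of the Squared Error over a uniformly sampled trajectory."""
    return float(np.sum(np.asarray(error) ** 2) * dt)


def shaped_reward(setpoint, y, y_constrained, y_limit, dt,
                  w_track=1.0, w_penalty=10.0, tol=1e-2):
    """Collapse a closed-loop transient into a scalar reward (hypothetical weights).

    setpoint      : controlled-variable set point (scalar)
    y             : sampled controlled output, e.g. outlet concentration
    y_constrained : sampled constrained output, e.g. reactor temperature
    y_limit       : upper bound on the constrained output, e.g. 400 K
    """
    ise_track = ise(setpoint - np.asarray(y), dt)
    violation = np.maximum(np.asarray(y_constrained) - y_limit, 0.0)
    ise_constraint = ise(violation, dt)
    # penalize the constraint term only when it exceeds the threshold, so that
    # small excursions do not dominate the reward and discourage exploration
    penalty = w_penalty * ise_constraint if ise_constraint > tol else 0.0
    return -(w_track * ise_track + penalty)


# example: a decaying tracking error and a brief temperature excursion near 400 K
t = np.linspace(0.0, 10.0, 501)
r = shaped_reward(0.95, 0.95 - 0.2 * np.exp(-t),
                  395.0 + 10.0 * np.exp(-t), 400.0, t[1] - t[0])
```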

The RL methodology was applied to a case study aimed at the simultaneous design and control of a process flowsheet for a first-order reaction subject to a step disturbance in the main inlet stream. The only unit operation available to the RL agent was the continuous stirred tank reactor (CSTR), whose thermodynamic parameters and dynamic model were adapted from the literature [6]. This case study was implemented on a PC with an Intel® Core™ i7-3770 CPU @ 3.40 GHz and 32 GB of RAM, using Python as the programming language. The main libraries used were torch for the development of the neural networks and gym for creating the RL environment. The objective of the agent was to simultaneously design and control the number of reactors required to achieve 95% conversion of the reactant, smoothly reject the inlet flow disturbance, and ensure that the reaction temperature remained below 400 K to prevent a runaway reaction within any of the CSTRs included in the process flowsheet. Each CSTR considered in the flowsheet included a PI controller that regulated the outlet concentration by adjusting the jacket's temperature. The dataset used to train the NNs embedded in the environment was generated using a Latin hypercube of 60,000 points over all the inlet variables of the reactor's model, i.e., inlet concentration and temperature, set-point concentration, volume, and the controller's proportional gain and integral time. For this case study, the most computationally demanding part was the identification of the surrogate NN models, i.e., the simulation of the combinations included within the Latin hypercube design space. Once the dataset was completed with simulations of the dynamic CSTR, the outlet concentration and temperature, the ISE, and the temperature constraint tracking parameter were fitted with three regression NNs and one classification NN, respectively. The classification NN determined whether the operating temperature of the CSTR was within the feasible limits. The resulting NN architectures were then embedded within the RL environment. The PPO agent was trained for 100,000 steps over the course of one hour. The flowsheet obtained by the agent consisted of two reactors, which successfully dampened the inlet flow disturbance and achieved 95% conversion of the reactant in the second reactor. Outcomes achieved by the RL agent were compared against a model-based optimization methodology. Although the solution process for the proposed framework may take longer, this methodology is attractive for more complex scenarios involving multiple units of different kinds modelled using rigorous thermodynamic correlations and dynamic conservation balance equations. In such instances, approximating these models through NNs and subsequently optimizing them using the proposed RL methodology presents a highly attractive advantage over model-based optimization methods. This approach offers flexibility and adaptability in tackling intricate problems where traditional modeling approaches may fall short, showcasing the potential of RL in effectively addressing the simultaneous generation, design, and control of chemical flowsheets.
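For concreteness, the sketch below mirrors the dataset-generation and surrogate-identification step using scipy's Latin hypercube sampler and a small torch regressor. The input bounds, network sizes, and the class name Surrogate are placeholders and not the values or architectures used in the study.

```python
import torch
import torch.nn as nn
from scipy.stats import qmc

# Six model inputs for the CSTR surrogate: inlet concentration and temperature,
# set-point concentration, reactor volume, controller gain and integral time.
# The bounds below are placeholders, not the ranges used in the study.
l_bounds = [0.5, 300.0, 0.05, 0.5, 1.0, 10.0]
u_bounds = [2.0, 370.0, 0.95, 5.0, 50.0, 500.0]

sampler = qmc.LatinHypercube(d=6, seed=0)
X = qmc.scale(sampler.random(n=60_000), l_bounds, u_bounds)
# Each row of X would be simulated with the dynamic CSTR model (not shown) to
# obtain the outlet concentration/temperature, ISE, and a feasibility label,
# which are then used to fit the regression and classification surrogates.


class Surrogate(nn.Module):
    """Small fully connected network standing in for one environment surrogate."""

    def __init__(self, n_in=6, n_out=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_out),
        )

    def forward(self, x):
        return self.net(x)


outlet_conc_nn = Surrogate()  # e.g. one of the three regression surrogates
y_hat = outlet_conc_nn(torch.tensor(X[:5], dtype=torch.float32))
```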
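A minimal environment skeleton, assuming a hypothetical FlowsheetEnv class with a dictionary (hybrid) action space, could then wrap the trained surrogates as follows; the surrogate call signature, feed conditions, and reward constants are illustrative assumptions rather than the formulation used in this work. The Dict action space is one way to expose the discrete "stop / add unit" choice and the continuous design variables to a hybrid PPO agent.

```python
import numpy as np
import gym
from gym import spaces


class FlowsheetEnv(gym.Env):
    """Sketch of the flowsheet-generation environment: at each step the agent
    either stops or appends a CSTR, picking its normalized design/control
    variables; a surrogate callable stands in for the trained NNs."""

    def __init__(self, surrogate, max_units=5, target_conversion=0.95):
        super().__init__()
        self.surrogate = surrogate              # (obs, design) -> (conc, temp, ise)
        self.max_units = max_units
        self.target_conversion = target_conversion
        # hybrid action space: discrete "stop / add unit" plus continuous decisions
        self.action_space = spaces.Dict({
            "add_unit": spaces.Discrete(2),
            "design": spaces.Box(0.0, 1.0, shape=(4,), dtype=np.float32),
        })
        # observation: current stream concentration, temperature, units placed
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,),
                                            dtype=np.float32)

    def reset(self):
        self.n_units = 0
        self.feed_conc = 1.0                    # placeholder feed conditions
        self.obs = np.array([self.feed_conc, 350.0, 0.0], dtype=np.float32)
        return self.obs

    def step(self, action):
        if action["add_unit"] == 0 or self.n_units >= self.max_units:
            # episode ends: terminal reward depends on meeting the conversion target
            conversion = 1.0 - self.obs[0] / self.feed_conc
            reward = 1.0 if conversion >= self.target_conversion else -1.0
            return self.obs, reward, True, {}
        self.n_units += 1
        conc, temp, ise = self.surrogate(self.obs, action["design"])
        self.obs = np.array([conc, temp, self.n_units], dtype=np.float32)
        return self.obs, -ise, False, {}        # negative ISE as per-unit reward


# stub surrogate for illustration: halves the concentration, returns a fixed ISE
env = FlowsheetEnv(lambda obs, design: (0.5 * obs[0], obs[1], 0.1))
obs = env.reset()
obs, reward, done, info = env.step({"add_unit": 1,
                                    "design": np.full(4, 0.5, np.float32)})
```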

References

[1] Ricardez-Sandoval, L. A., Douglas, P. L., & Budman, H. M. (2011). A methodology for the simultaneous design and control of large-scale systems under process parameter uncertainty. Computers & Chemical Engineering, 35(2), 307–318. https://doi.org/10.1016/j.compchemeng.2010.05.010

[2] Koller, R. W., Ricardez-Sandoval, L. A., & Biegler, L. T. (2018). Stochastic back-off algorithm for simultaneous design, control, and scheduling of multiproduct systems under uncertainty. AIChE Journal, 64(7), 2379–2389. https://doi.org/10.1002/aic.16092

[3] Rafiei, M., & Ricardez-Sandoval, L. A. (2020). Integration of design and control for industrial scale applications under uncertainty: a trust region approach. Computers & Chemical Engineering, 141, 107006. https://doi.org/10.1016/j.compchemeng.2020.107006

[4] Sachio, S., del-Rio Chanona, A. E., & Petsagkourakis, P. (2021). Simultaneous Process Design and Control Optimization using Reinforcement Learning. IFAC-PapersOnLine, 54(3), 510–515. https://doi.org/10.1016/j.ifacol.2021.08.293

[5] Mendiola-Rodriguez, T. A., & Ricardez-Sandoval, L. A. (2022). Robust control for anaerobic digestion systems of Tequila vinasses under uncertainty: A Deep Deterministic Policy Gradient Algorithm. Digital Chemical Engineering, 3, 100023. https://doi.org/10.1016/j.dche.2022.100023

[6] Seborg, D. E., T. F. Edgar, and D. A. Mellichamp, Process Dynamics and Control, 2nd Edition, Wiley, 2004, pp. 34–36 and 94–95.