2025 AIChE Annual Meeting

(326f) Application of Safe Reinforcement Learning to Chemical Engineering Problems

Chemical engineers often face sequential decision-making problems under uncertainty, such as optimizing plant operations, allocating resources, or planning capacity expansions. A common approach is to formulate stochastic programming models that approximate the uncertain parameters with a finite set of scenarios (Li & Grossmann, 2021). However, the resulting models are typically large-scale and the underlying problems NP-hard: incorporating more scenarios improves accuracy and robustness but also increases computational difficulty. This inherent tradeoff makes it challenging to generate high-quality solutions quickly.
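As a minimal sketch of the scenario-based formulation described above, the following toy two-stage capacity-planning model in Pyomo shows how each scenario adds recourse variables and constraints; the model, parameter values, and names are illustrative assumptions, not the models studied in this work.

# Minimal sketch (assumption): toy two-stage stochastic capacity-planning model
from pyomo.environ import (ConcreteModel, Var, Objective, Constraint,
                           NonNegativeReals, maximize)

scenarios = {"low": 80.0, "base": 100.0, "high": 130.0}   # demand realizations
prob = {"low": 0.25, "base": 0.50, "high": 0.25}          # scenario probabilities
price, invest_cost = 5.0, 2.0

m = ConcreteModel()
m.capacity = Var(within=NonNegativeReals)                   # first-stage decision
m.sales = Var(list(scenarios), within=NonNegativeReals)     # second-stage recourse

# Expected revenue over scenarios minus the investment cost
m.obj = Objective(
    expr=sum(prob[s] * price * m.sales[s] for s in scenarios)
         - invest_cost * m.capacity,
    sense=maximize,
)
# Sales in each scenario are limited by installed capacity and by demand
m.cap_limit = Constraint(list(scenarios), rule=lambda m, s: m.sales[s] <= m.capacity)
m.demand_limit = Constraint(list(scenarios), rule=lambda m, s: m.sales[s] <= scenarios[s])
# Every added scenario enlarges the deterministic equivalent, which is why
# solution time grows quickly with the size of the scenario set.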

An alternative way to address these problems is to model them as reinforcement learning (RL) environments (Sutton & Barto, 2018). RL provides a structured approach to sequential decision-making by allowing an agent to learn through interaction with an environment (Hedrick et al., 2022; Reynoso-Donzelli & Ricardez-Sandoval, 2025; Shin & Lee, 2019). The agent takes actions, observes the outcomes, and receives rewards based on performance, gradually improving its policy over time while exploring different possibilities and adapting to changing conditions. Once trained, an RL agent can make decisions rapidly, typically in polynomial time, and can handle a larger number of scenarios during offline training without a significant proportional increase in computational cost.
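To make the action-observation-reward loop concrete, the sketch below implements a toy single-product inventory environment in the Gymnasium API; the dynamics, prices, and class name are illustrative assumptions rather than the environments developed in this work.

# Minimal sketch (assumption): toy inventory environment in the Gymnasium API
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class InventoryEnv(gym.Env):
    """Agent chooses an order quantity each period; demand is sampled randomly."""

    def __init__(self, horizon=12, capacity=100.0):
        self.horizon, self.capacity = horizon, capacity
        self.observation_space = spaces.Box(0.0, capacity, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(0.0, capacity, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.stock = 0, 50.0
        return np.array([self.stock], dtype=np.float32), {}

    def step(self, action):
        order = float(np.clip(action[0], 0.0, self.capacity - self.stock))
        demand = float(self.np_random.uniform(10.0, 40.0))    # uncertain demand
        sold = min(self.stock + order, demand)
        self.stock = self.stock + order - sold
        reward = 5.0 * sold - 1.0 * order - 0.5 * self.stock  # revenue minus costs
        self.t += 1
        terminated = self.t >= self.horizon
        return np.array([self.stock], dtype=np.float32), reward, terminated, False, {}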

However, compared with optimization models, deep RL methods that rely on neural networks struggle with hard and complex constraints: a neural network policy does not inherently guarantee constraint satisfaction, so conventional deep RL algorithms often produce infeasible actions, which hinders their effectiveness. This challenge is particularly relevant in chemical engineering, where problems are governed by constraints such as mass balances, property balances, and nonlinear stoichiometric relationships. To overcome these limitations, Safe Reinforcement Learning (SafeRL) has emerged as a promising alternative. SafeRL incorporates explicit safety constraints into the learning process, guiding agents toward behavior that minimizes risk and avoids unsafe actions (García & Fernández, 2015). By integrating constraint-aware mechanisms, SafeRL enables agents to learn policies that respect both performance and safety requirements, thereby producing more reliable, risk-aware decisions. The field is rapidly gaining attention because it addresses the critical need for feasibility and safety in settings where conventional RL methods struggle.
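One common way to expose constraints to a SafeRL algorithm is to report a nonnegative per-step cost alongside the reward, which the algorithm then keeps below a budget. The sketch below extends the toy inventory environment from above with a hypothetical storage limit; the "cost" entry in the info dictionary, the limit, and the class name are assumptions for illustration only.

# Minimal sketch (assumption): per-step constraint cost for a SafeRL agent
class SafeInventoryEnv(InventoryEnv):
    def __init__(self, horizon=12, capacity=100.0, safe_level=80.0):
        super().__init__(horizon, capacity)
        self.safe_level = safe_level  # hypothetical safe storage limit

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        # Constraint cost: how far the inventory exceeds the safe storage level;
        # a constrained-RL algorithm drives the expected cumulative cost toward
        # a prescribed budget while still maximizing reward.
        info["cost"] = max(0.0, float(obs[0]) - self.safe_level)
        return obs, reward, terminated, truncated, info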

In this work, we develop RL environments tailored to key chemical engineering problems, including the multi-period blending problem, multi-echelon supply chains, capacity expansion, unit commitment, the air separation unit, and state-task and resource-task networks. We then deploy several SafeRL algorithms to train agents within these environments, evaluating how effectively they manage the various constraints and benchmarking their performance against full-space stochastic programming models. For implementation, we leverage OmniSafe (Ji et al., 2024), an infrastructural framework that accelerates SafeRL research. OmniSafe offers a robust benchmark of SafeRL algorithms and an out-of-the-box modular toolkit, enabling a systematic comparison of different approaches in constraint-rich settings.
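As a rough sketch of the training workflow, based on OmniSafe's documented Agent interface, the snippet below trains a Lagrangian PPO agent on one of OmniSafe's built-in benchmark environments; using it with the chemical engineering environments described above would additionally require registering them through OmniSafe's custom-environment API, and the algorithm and environment id shown here are taken from OmniSafe's examples rather than from this work.

# Minimal sketch (assumption): training a SafeRL agent with OmniSafe
import omnisafe

agent = omnisafe.Agent("PPOLag", "SafetyPointGoal1-v0")  # algorithm name, environment id
agent.learn()  # offline training of the constrained policy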

García, J., & Fernández, F. (2015). A Comprehensive Survey on Safe Reinforcement Learning. Journal of Machine Learning Research, 16(42), 1437–1480. http://jmlr.org/papers/v16/garcia15a.html

Hedrick, E., Hedrick, K., Bhattacharyya, D., Zitney, S. E., & Omell, B. (2022). Reinforcement learning for online adaptation of model predictive controllers: Application to a selective catalytic reduction unit. Computers & Chemical Engineering, 160, 107727. https://doi.org/10.1016/j.compchemeng.2022.107727

Ji, J., Zhou, J., Zhang, B., Dai, J., Pan, X., Sun, R., Huang, W., Geng, Y., Liu, M., & Yang, Y. (2024). OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research. Journal of Machine Learning Research, 25(285), 1–6. http://jmlr.org/papers/v25/23-0681.html

Li, C., & Grossmann, I. E. (2021). A Review of Stochastic Programming Methods for Optimization of Process Systems Under Uncertainty. Frontiers in Chemical Engineering, 2. https://doi.org/10.3389/fceng.2020.622241

Reynoso-Donzelli, S., & Ricardez-Sandoval, L. A. (2025). An integrated reinforcement learning framework for simultaneous generation, design, and control of chemical process flowsheets. Computers & Chemical Engineering, 194, 108988. https://doi.org/10.1016/j.compchemeng.2024.108988

Shin, J., & Lee, J. H. (2019). Multi-timescale, multi-period decision-making model development by combining reinforcement learning and mathematical programming. Computers & Chemical Engineering, 121, 556–573. https://doi.org/10.1016/j.compchemeng.2018.11.020

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.