2025 AIChE Annual Meeting

(259f) Physics-Informed Model-Based Policy Optimization of Koopman Economic NMPC Policies

Authors

Alexander Mitsos - Presenter, RWTH Aachen University
Manuel Dahmen, FZ Jülich
Data-driven dynamic models present a promising avenue for rendering (economic) nonlinear model predictive control ((e)NMPC) tractable [1] for complex processes where (i) no mechanistic model is available, or (ii) a mechanistic model is available but cannot be embedded in a real-time capable (e)NMPC policy [2]. While system identification (SI) is the most common approach to training data-driven dynamic models, it focuses narrowly on maximizing average prediction accuracy. Reinforcement learning (RL) offers an alternative or complementary route to SI: it can tune (e)NMPC policies for optimal performance in a specific control task by adjusting the dynamic model [3,4,5] or parameters in the policy's objective function or constraints [5,6], e.g., state bounds. However, standard RL algorithms are notoriously sample-inefficient, which hinders their use when the number of interactions with the control system available for learning is limited [7].
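For concreteness, the kind of parameterized (e)NMPC policy that such RL tuning targets can be sketched as follows (the notation is ours, chosen for illustration, and not taken from the cited works): at each sampling instant, given the current state estimate $\hat{x}$, the policy solves

\begin{align*}
\min_{u_0,\dots,u_{N-1}} \; & \sum_{k=0}^{N-1} \ell(x_k, u_k) \\
\text{s.t.} \; & x_{k+1} = f_\theta(x_k, u_k), \quad k = 0,\dots,N-1, \\
& g_\phi(x_k, u_k) \le 0, \quad k = 0,\dots,N-1, \\
& x_0 = \hat{x},
\end{align*}

and applies the first optimal input $u_0^\star$ to the process. Here, $f_\theta$ is the data-driven dynamic model, $g_\phi$ collects constraints with tunable parameters $\phi$ (e.g., state bounds), and $\ell$ is an economic stage cost. SI fits $\theta$ by minimizing prediction error on recorded data, whereas RL adjusts $\theta$ and/or $\phi$ to maximize closed-loop performance in the control task.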

We present a novel approach [8] for sample-efficient RL-based (e)NMPC policy learning in process control that combines a model-based RL algorithm [9] with our previously published method [4] for turning Koopman (e)NMPC policies into automatically differentiable policies. Applied to an eNMPC case study based on a literature model of a continuous stirred-tank reactor (CSTR) [10], the approach achieves superior control performance and higher sample efficiency than two benchmarks: data-driven eNMPC policies whose models are obtained by system identification without subsequent RL tuning, and neural network controllers trained with model-based RL [8]. Moreover, incorporating partial prior knowledge of the system dynamics via physics-informed learning [11,12] increases sample efficiency further [8].
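To make the building blocks of this combination more tangible, the following is a minimal Python/PyTorch sketch; it is our illustration of a Koopman surrogate with linear latent dynamics and of a physics-informed training loss that penalizes disagreement with a (partially) known ODE right-hand side, not the implementation from [4] or [8]. All names, layer sizes, and the ode_rhs argument are placeholder assumptions.

```python
import torch
import torch.nn as nn

class KoopmanSurrogate(nn.Module):
    """Lifts the state into a latent space with linear dynamics,
    z_{k+1} = A z_k + B u_k, keeping the embedded optimal control problem tractable."""
    def __init__(self, n_x, n_u, n_z):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_x, 64), nn.Tanh(), nn.Linear(64, n_z))
        self.A = nn.Linear(n_z, n_z, bias=False)  # latent state-transition matrix
        self.B = nn.Linear(n_u, n_z, bias=False)  # latent input matrix
        self.decoder = nn.Linear(n_z, n_x)        # map latent predictions back to states

    def forward(self, x, u):
        z_next = self.A(self.encoder(x)) + self.B(u)
        return self.decoder(z_next)

def physics_informed_loss(model, x, u, x_next, ode_rhs, dt, weight=1.0):
    """One-step prediction loss plus a residual term that exploits partial prior
    knowledge of the dynamics (ode_rhs), in the spirit of [11,12]."""
    x_pred = model(x, u)
    data_loss = ((x_pred - x_next) ** 2).mean()
    # explicit-Euler residual against the (partially) known right-hand side
    physics_residual = x_pred - (x + dt * ode_rhs(x, u))
    return data_loss + weight * (physics_residual ** 2).mean()

# Shape-only usage example on synthetic tensors (no real CSTR data):
model = KoopmanSurrogate(n_x=2, n_u=1, n_z=8)
x, u, x_next = torch.randn(32, 2), torch.randn(32, 1), torch.randn(32, 2)
loss = physics_informed_loss(model, x, u, x_next,
                             ode_rhs=lambda x, u: torch.zeros_like(x), dt=0.1)
loss.backward()
```

Embedding such a surrogate in an automatically differentiable (e)NMPC layer, as in [4], is what allows a model-based RL algorithm in the style of [9] to propagate closed-loop performance signals back into the surrogate parameters.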

Our work integrates modern methods that increase the sample efficiency of dynamic model learning [11,12] and of RL [9] into the learning of task-optimal data-driven (e)NMPC policies. It is thus a step toward making RL-based learning of predictive controllers feasible for complex real-world process control problems where no simulator of the environment is available a priori and learning by interacting with the process is expensive.

[1] Tang, W., & Daoutidis, P. (2022). Data-driven control: Overview and perspectives. In 2022 American Control Conference (ACC), 1048-1064.

[2] McBride, K., & Sundmacher, K. (2019). Overview of surrogate modeling in chemical process engineering. Chemie Ingenieur Technik, 91(3), 228-239.

[3] Chen, B., Cai, Z., & Bergés, M. (2019). GNU-RL: A precocial reinforcement learning solution for building HVAC control using a differentiable MPC policy. In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, 316-325.

[4] Mayfrank, D., Mitsos, A., & Dahmen, M. (2024). End-to-end reinforcement learning of Koopman models for economic nonlinear model predictive control. Computers & Chemical Engineering, 190, 108824.

[5] Gros, S., & Zanon, M. (2019). Data-driven economic NMPC using reinforcement learning. IEEE Transactions on Automatic Control, 65(2), 636-648.

[6] Brandner, D., Talis, T., Esche, E., Repke, J. U., & Lucia, S. (2023). Reinforcement learning combined with model predictive control to optimally operate a flash separation unit. Computer Aided Chemical Engineering, 52, 595-600.

[7] Gopaluni, R. B., Tulsyan, A., Chachuat, B., Huang, B., Lee, J. M., Amjad, F., Damarla, S. K., Kim, J. W., & Lawrence, N. P. (2020). Modern machine learning tools for monitoring and control of industrial processes: A survey. IFAC-PapersOnLine, 53(2), 218-229.

[8] Mayfrank, D., Velioglu, M., Mitsos, A., & Dahmen, M. (2025). Sample-Efficient Reinforcement Learning of Koopman eNMPC. arXiv preprint arXiv:2503.18787.

[9] Janner, M., Fu, J., Zhang, M., & Levine, S. (2019). When to trust your model: Model-based policy optimization. Advances in Neural Information Processing Systems, 32, 12498-12509.

[10] Flores-Tlacuahuac, A., & Grossmann, I. E. (2006). Simultaneous cyclic scheduling and control of a multiproduct CSTR. Industrial & Engineering Chemistry Research, 45(20), 6698-6712.

[11] Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.

[12] Antonelo, E. A., Camponogara, E., Seman, L. O., Jordanou, J. P., de Souza, E. R., & Hübner, J. F. (2024). Physics-informed neural nets for control of dynamical systems. Neurocomputing, 579, 127419.