(13b) Leveraging Qualitative Information to Improve Hybrid Models
2025 AIChE Annual Meeting, Computing and Systems Technology Division, Session 10E: Data-Driven and Hybrid Modeling for Decision Making I
Mechanistic modeling (also referred to as first-principles modeling) exploits detailed or phenomenological understanding of physics, chemistry, and biology to obtain a mathematical representation of a manufacturing process. Data-driven modeling (i.e., machine learning and artificial intelligence), on the other hand, leverages large datasets to provide a model for a given purpose, e.g., prediction of product quality. Data-driven models have become increasingly common in the chemical process industry [1,2]. Mechanistic and data-driven modeling offer somewhat complementary benefits and drawbacks. Hybrid modeling [3] combines the two paradigms to take the best of both worlds: compared to purely data-driven models, hybrid models offer better interpretability, improved extrapolation performance, and reduced data requirements for training, while maintaining a reasonably simple development workflow compared to purely mechanistic models.
Hybrid modeling and, more broadly, the integration of process knowledge and data can be achieved in several ways: using mechanistic and data-driven elements as “blocks” in a single model [4]; augmenting the dataset for data-driven modeling by designing physically meaningful variables [5]; modifying the structure of data-driven models to reflect physical laws (e.g., conservation of mass) [6]; and using mechanistic models as constraints in the data-driven model training [7].
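As an illustration of the first approach (mechanistic and data-driven "blocks" in a single model), the sketch below couples a mechanistic batch mass balance, dC/dt = -r(C), with a rate term fitted from data. It is a minimal sketch and not the method of this study: the function names (`fit_linear_rate`, `simulate_batch`), the explicit Euler integration, and the one-parameter least-squares fit standing in for a machine learning regressor are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def fit_linear_rate(concs: Sequence[float], rates: Sequence[float]) -> Callable[[float], float]:
    """Data-driven block (hypothetical): least-squares fit of r = k * C through the origin.

    A stand-in for any regressor trained on measured (concentration, rate) pairs.
    """
    k = sum(c * r for c, r in zip(concs, rates)) / sum(c * c for c in concs)
    return lambda c: k * c


def simulate_batch(c0: float, rate_block: Callable[[float], float],
                   dt: float, n_steps: int) -> List[float]:
    """Mechanistic block: explicit Euler integration of the mass balance dC/dt = -r(C).

    The balance structure is fixed by first principles; only r(C) is learned from data.
    """
    c = c0
    trajectory = [c]
    for _ in range(n_steps):
        # Concentration cannot become negative, so the update is clipped at zero.
        c = max(0.0, c - dt * rate_block(c))
        trajectory.append(c)
    return trajectory


# Illustrative (made-up) rate measurements, used only to exercise the code.
rate = fit_linear_rate([1.0, 2.0], [0.5, 1.0])
profile = simulate_batch(c0=1.0, rate_block=rate, dt=0.1, n_steps=2)
```

Because the rate term enters only through the `rate_block` argument, the same mass balance could be reused with a different fitted model without changing the mechanistic part.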
All the aforementioned approaches to hybrid modeling rely on the underlying assumption that a mechanistic model is available or can be easily derived from the available data. If this is not the case, a mechanistic model needs to be derived before, or simultaneously with, the inclusion of the data-driven model. However, this could entail a significant investment of time and resources, plus the need for tailored parameter identification procedures [8]. An alternative solution is to exploit another form of domain knowledge: qualitative information.
Qualitative domain knowledge is often available at little to no cost, as it only requires an understanding of the fundamentals of the process being modeled. Examples include bounds on variables, monotonicity of the relationships among variables, and information on causality. The use of qualitative information has shown significant benefits in process monitoring [9,10]. However, this valuable form of knowledge remains overlooked in the hybrid modeling literature, with very few studies considering it [11,12]. Yet qualitative information has the potential to improve the quality of models derived from process data, especially when data are sparse or offer only partial coverage of the domain of interest, as is typically the case for data from industrial processes [13].
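One possible way to encode two of the examples above (bounds and monotonicity) is as soft penalty terms added to a data-fitting loss. The sketch below is an assumption-laden illustration, not the formulation used in this study: the function names, the squared-violation form of the penalties, the evaluation grid, and the `weight` parameter are all hypothetical.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]


def monotonicity_penalty(model: Model, grid: Sequence[float]) -> float:
    """Sum of squared violations of 'the prediction is non-decreasing' on a sorted grid."""
    preds = [model(x) for x in grid]
    return sum(max(0.0, a - b) ** 2 for a, b in zip(preds, preds[1:]))


def bound_penalty(model: Model, grid: Sequence[float], lower: float, upper: float) -> float:
    """Sum of squared excursions of the prediction outside [lower, upper] on the grid."""
    preds = [model(x) for x in grid]
    return sum(max(0.0, lower - p) ** 2 + max(0.0, p - upper) ** 2 for p in preds)


def regularized_loss(model: Model, xs: Sequence[float], ys: Sequence[float],
                     grid: Sequence[float], lower: float, upper: float,
                     weight: float = 1.0) -> float:
    """Mean squared error on the data plus weighted qualitative-knowledge penalties.

    The grid may extend beyond the range of the data, so that the qualitative
    knowledge also shapes the model where no measurements are available.
    """
    mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    penalty = monotonicity_penalty(model, grid) + bound_penalty(model, grid, lower, upper)
    return mse + weight * penalty
```

A training routine would minimize `regularized_loss` over the model parameters. With `weight` set to zero the loss reduces to the ordinary mean squared error, so the contribution of the qualitative terms can be assessed by comparing models fitted with and without them.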
In this study, we explore the potential of including qualitative knowledge in the hybrid modeling paradigm. We first provide several examples of qualitative information typically available in chemical engineering modeling tasks. We then discuss possible ways to use qualitative knowledge to improve data-driven models. Finally, we present two examples of hybrid models of chemical processes, namely a pH neutralization process and a catalytic reaction process, to illustrate the benefits of including qualitative information.
References
[1] Reis MS, Saraiva PM. Data-centric process systems engineering: A push towards PSE 4.0. Computers & Chemical Engineering. 2021; 155: 107529.
[2] Romagnoli JA, Briceno-Mena LA, Manee V. AI in Chemical Engineering. CRC Press 2025.
[3] von Stosch M, Oliveira R, Peres J, Feyo de Azevedo S. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Computers & Chemical Engineering. 2014; 60: 86–101.
[4] Oliveira R. Combining first principles modelling and artificial neural networks: A general framework. Computers & Chemical Engineering. 2004; 28: 755–766.
[5] Severson KA, Attia PM, Jin N, Perkins N, Jiang B, Yang Z, Chen MH, Aykol M, Herring PK, Fraggedakis D, Bazant MZ, Harris SJ, Chueh WC, Braatz RD. Data-driven prediction of battery cycle life before capacity degradation. Nature Energy. 2019; 4: 383–391.
[6] Beucler T, Pritchard M, Rasp S, Ott J, Baldi P, Gentine P. Enforcing analytic constraints in neural networks emulating physical systems. Physical Review Letters. 2021; 126: 098302.
[7] Raissi M, Perdikaris P, Karniadakis GE. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics. 2019; 378: 686–707.
[8] Yang A, Martin E, Morris J. Identification of semi-parametric hybrid process models. Computers & Chemical Engineering. 2011; 35: 63–70.
[9] Chiang LH, Braatz RD. Process monitoring using causal map and multivariate statistics: fault detection and identification. Chemometrics and Intelligent Laboratory Systems. 2003; 65: 159–178.
[10] Reis MS, Gins G, Rato TJ. Incorporation of process-specific structure in statistical process monitoring: A review. Journal of Quality Technology. 2019; 51: 407–421.
[11] Muralidhar N, Islam MR, Marwah M, Karpatne A, Ramakrishnan N. Incorporating prior domain knowledge into deep neural networks. In: IEEE International Conference on Big Data (Big Data). 2018: 36–45.
[12] Daw A, Karpatne A, Watkins W, Read J, Kumar V. Physics-guided neural networks (PGNN): An application in lake temperature modeling. arXiv. 2021.
[13] Thebelt A, Wiebe J, Kronqvist J, Tsay C, Misener R. Maximizing information from chemical engineering data sets: Applications to machine learning. Chemical Engineering Science. 2022; 252: 117469.
Funding Acknowledgements
Financial support from The Dow Chemical Company is acknowledged.