2024 AIChE Annual Meeting
(458b) Physics-Informed Automated Discovery of Kinetic Models
- Introduction
Kinetic models are essential for the design, optimization, and control of chemical processes, but they are difficult to develop. Traditional white-box models are accurate, but constructing them is time-consuming and requires considerable expertise. More flexible alternatives, such as hybrid or black-box models, need large datasets and lack interpretability, which poses risks in safety-critical applications.
Symbolic regression, a machine learning approach that derives mathematical models directly from data without predefined equations, is gaining popularity for its ability to uncover patterns and facilitate scientific discoveries, particularly in kinetic modeling [1]. Yet, existing symbolic regression methods like SINDy and ALAMO face limitations, such as reliance on accurate prior knowledge and vulnerability to noisy data.
The ADoK-S and ADoK-W frameworks introduced by de Carvalho Servia et al. [2] advanced the field but highlighted the need for further improvements, including expert knowledge integration and uncertainty quantification. This study enhances the ADoK-S framework with these features, proposing physics-informed ADoK-S (PI-ADoK-S). The importance of these improvements is twofold. Firstly, providing an avenue for prior knowledge injection may reduce the enormous search space inherent to symbolic regression – reducing the experimental burden and creating a speed-up in scientific discovery – whilst also ensuring physically reasonable models. Secondly, providing measures of the uncertainty of a model’s predictions offers valuable information regarding the reliability of the model.
- Methodology
Developed in response to the limitations of conventional modelling methods, ADoK-S was proposed as an alternative to SINDy and ALAMO that addresses some of their shortcomings. The ADoK-S framework is composed of three main components: a genetic programming (GP) approach for generating candidate models, a sequential optimization method for estimating model parameters, and an Akaike Information Criterion (AIC)-based model selection procedure.
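As an illustration of the model selection component, the following is a minimal sketch of an AIC comparison between candidate models. It assumes independent Gaussian residuals and the least-squares form AIC = 2k + n ln(RSS/n); the exact likelihood and AIC variant used in ADoK-S are not stated in this abstract, and the function names and toy numbers are hypothetical.

```python
import numpy as np

def aic_least_squares(y_obs, y_pred, n_params):
    """AIC for a least-squares fit, ASSUMING i.i.d. Gaussian residuals."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = y_obs.size
    rss = float(np.sum((y_obs - y_pred) ** 2))  # residual sum of squares
    return 2 * n_params + n * np.log(rss / n)

def select_model(y_obs, candidates):
    """candidates maps a model name to (predictions, number of parameters).
    Returns the name with the lowest AIC and all scores."""
    scores = {name: aic_least_squares(y_obs, pred, k)
              for name, (pred, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Toy data (hypothetical): two models with identical fit quality,
# one using more parameters than the other.
y_obs = np.array([1.0, 0.8, 0.6, 0.5])
candidates = {
    "simple": (y_obs + 0.1, 1),
    "complex": (y_obs + 0.1, 3),
}
best, scores = select_model(y_obs, candidates)
```

With equal residuals, the criterion prefers the model with fewer parameters, which is the trade-off between fit and complexity that AIC-based selection is meant to encode.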
In the original ADoK-S, kinetic rate models are derived from reaction rate estimates. To find the most appropriate kinetic model, the procedure starts by identifying optimal concentration profiles. From these, rate estimates are obtained by numerical differentiation. Kinetic rate models are then proposed from the rate estimates and compared against the original dataset. When the results are unsatisfactory, ADoK-S uses a closed-loop strategy: Model-Based Design of Experiments (MBDoE) identifies optimally discriminating experiments, whose data are then used to refine the models that ADoK-S outputs. This process is repeated until resources are exhausted or the required accuracy is reached.
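The rate estimation step can be sketched as follows. This is only an illustration, assuming a smooth concentration profile is already available; `fitted_profile` is a hypothetical stand-in for a profile identified by the framework, and finite differences via `numpy.gradient` are one possible choice of numerical differentiation, not necessarily the one used in ADoK-S.

```python
import numpy as np

def fitted_profile(t):
    """Hypothetical smooth concentration model standing in for a fitted profile."""
    return 2.0 * np.exp(-0.5 * t)

# Evaluate the profile on a fine time grid (in hours).
t = np.linspace(0.0, 10.0, 201)
conc = fitted_profile(t)

# Second-order finite differences give dC/dt, i.e. the rate estimates
# that candidate rate models are subsequently fitted to.
rate_estimates = np.gradient(conc, t, edge_order=2)
```

Differentiating a smooth fitted profile rather than the raw measurements avoids amplifying measurement noise in the rate estimates.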
Despite these achievements, ADoK-S has drawbacks, most notably its inability to incorporate prior knowledge and its lack of any means to quantify and propagate parametric uncertainty. These limitations prompted the development of PI-ADoK-S, an improved version of ADoK-S that addresses them. By introducing methods for determining prediction uncertainty and directly incorporating prior knowledge, PI-ADoK-S considerably improves the reliability and application potential of the framework. Additional details about PI-ADoK-S and its procedure, along with visual aids, are provided in the attachment.
- Mathematical Constraint Inclusion
The literature reports conflicting results on adding mathematical constraints to symbolic regression. Some studies, such as Kronberger et al. [3], report that constraints, especially in low-noise situations, may increase prediction errors because they cause slower convergence and/or decreased diversity. However, for noisy or small datasets, Haider et al. [4] and Błądek and Krawiec [5] demonstrate the advantages of constraints, which enhance model accuracy without causing significant side effects. These contradictions reflect the difficulty of incorporating prior knowledge into symbolic regression effectively; PI-ADoK-S seeks to address this difficulty.
By penalizing constraint violations in the evaluation of model performance, PI-ADoK-S ensures that the search for rate models is neither too lenient, which can result in implausible models, nor too stringent, which can result in suboptimal models. The approach is purposefully straightforward. If a model satisfies the constraints, its performance metric is simply its prediction error. If a model fails to meet a constraint, its performance metric is penalized according to the degree of constraint violation, with the strength of the penalty controlled by a hyperparameter. This hyperparameter allows the modeler to fine-tune the algorithm for their specific application.
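A minimal sketch of such a penalized performance metric is given below. It assumes a sum-of-squares prediction error and an additive penalty that is linear in the total violation; the actual functional form used in PI-ADoK-S is not specified in this abstract, and `penalized_fitness` and `penalty_weight` are hypothetical names.

```python
import numpy as np

def penalized_fitness(y_obs, y_pred, violations, penalty_weight=10.0):
    """Prediction error plus a penalty scaled by the total constraint violation.

    violations: non-negative magnitudes, one per constraint (0 = satisfied).
    penalty_weight: hyperparameter controlling how strongly violations count
    (ASSUMED additive and linear; the original form may differ).
    """
    error = float(np.sum((np.asarray(y_obs) - np.asarray(y_pred)) ** 2))
    total_violation = float(np.sum(np.clip(violations, 0.0, None)))
    return error + penalty_weight * total_violation

# Toy numbers (hypothetical): same predictions, with and without a violation.
y_obs = np.array([1.0, 0.7, 0.5])
y_pred = np.array([1.0, 0.8, 0.5])
feasible = penalized_fitness(y_obs, y_pred, violations=[0.0, 0.0])
infeasible = penalized_fitness(y_obs, y_pred, violations=[0.0, 0.2])
```

A feasible model is scored by its prediction error alone, while an infeasible one is ranked lower by an amount the modeler can tune: a small weight tolerates mild violations, and a large weight approaches a hard constraint.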
- Uncertainty Quantification
There are many approaches for quantifying uncertainty, ranging from straightforward approximations such as sigma points and Laplace approximations to more intricate sampling algorithms such as Hamiltonian Monte Carlo and Metropolis-Hastings (MH). The choice hinges on striking a balance between accuracy and computational efficiency. We selected the MH algorithm because accurate propagation of uncertainty in kinetic models is of utmost importance in safety-critical applications. MH does not require gradient information and performs well with complex, non-linear distributions, which makes it flexible enough to handle high-dimensional parameter spaces and complex models. Although MH requires careful tuning and is computationally costly, its convergence properties and adaptability make it a reliable option for quantifying uncertainty in complex systems. See Chib and Greenberg [6] for additional information on MH.
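To make the sampling idea concrete, the following is a minimal random-walk Metropolis sketch (MH with a symmetric Gaussian proposal). It is not the PI-ADoK-S implementation: the toy first-order decay model, the flat prior on a positive rate constant, the noise level, and the step size are all assumptions chosen for illustration.

```python
import numpy as np

def metropolis_hastings(log_post, theta0, n_samples=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns the chain and acceptance rate."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    logp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    accepted = 0
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_post(proposal)
        # Symmetric proposal: the Hastings ratio reduces to the posterior ratio.
        if np.log(rng.uniform()) < logp_new - logp:
            theta, logp = proposal, logp_new
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_samples

# Toy problem (hypothetical): infer k in C(t) = exp(-k t) from noisy data.
t_obs = np.linspace(0.0, 10.0, 15)
sigma = 0.05
y_obs = np.exp(-0.5 * t_obs) + np.random.default_rng(1).normal(0.0, sigma, t_obs.size)

def log_post(theta):
    k = theta[0]
    if k <= 0:  # flat prior restricted to positive rate constants
        return -np.inf
    resid = y_obs - np.exp(-k * t_obs)
    return -0.5 * np.sum(resid ** 2) / sigma ** 2

chain, acceptance_rate = metropolis_hastings(log_post, [1.0], step=0.05)
posterior = chain[1000:, 0]  # discard burn-in
k_mean, k_std = posterior.mean(), posterior.std()
```

The retained samples approximate the posterior of the parameter, so its spread can be propagated through the model to obtain prediction uncertainty bands. The step size is the main tuning knob: too small and the chain explores slowly, too large and most proposals are rejected.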
- Case Study
The case study used for the performance analysis of PI-ADoK-S is the catalytic hydrodealkylation of toluene to benzene, in which toluene (C6H5CH3) and hydrogen gas (H2) are converted to benzene (C6H6) and methane (CH4). The kinetic rate model that describes the evolution of the concentrations of C6H5CH3, H2, C6H6 and CH4 through time can be found in Fogler [7].
The kinetic parameters of the rate model are denoted Ki, where i ∈ {A, B, C}. The computational experiments are run with the following initial conditions (in molar units): (CT(t = 0), CH(t = 0), CB(t = 0), CM(t = 0)) ∈ {(1, 8, 2, 3), (5, 8, 0, 0.5), (5, 3, 0, 0.5), (1, 3, 0, 3), (1, 8, 2, 0.5)}, where CT, CH, CB and CM represent the concentrations of the reactants toluene and hydrogen and of the products benzene and methane, respectively. For each experiment, the concentrations of the reactants and products are recorded 15 times, at evenly spaced intervals between t0 = 0 h and tf = 10 h. The kinetic parameters were defined as KA = 2 M⁻¹ h⁻¹, KB = 9 M⁻¹ and KC = 5 M⁻¹.
Gaussian noise is added to the in-silico measurements so that they approximate the response of a real chemical system. The noise has zero mean and a standard deviation of 0.2 for each of CT, CH, CB and CM.
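The in-silico experiments described above could be generated along the following lines. This is a sketch under a stated assumption: the rate law r = KA·CT·CH / (1 + KB·CB + KC·CT) is assumed here because it is dimensionally consistent with the units of the parameters, but the actual expression is the one given in Fogler [7] and is not reproduced in this abstract. The fixed-step Runge-Kutta integrator, the random seed, and all function names are likewise illustrative choices.

```python
import numpy as np

K_A, K_B, K_C = 2.0, 9.0, 5.0  # M^-1 h^-1, M^-1, M^-1 (from the case study)

INITIAL_CONDITIONS = [  # (C_T, C_H, C_B, C_M) at t = 0, in M
    (1, 8, 2, 3), (5, 8, 0, 0.5), (5, 3, 0, 0.5), (1, 3, 0, 3), (1, 8, 2, 0.5),
]

def rhs(c):
    """dC/dt for toluene + H2 -> benzene + methane (1:1:1:1 stoichiometry)."""
    c_t, c_h, c_b, _ = c
    # ASSUMED rate law; consult Fogler [7] for the actual expression.
    r = K_A * c_t * c_h / (1.0 + K_B * c_b + K_C * c_t)
    return np.array([-r, -r, r, r])

def simulate(c0, t_final=10.0, n_samples=15, substeps=200):
    """Integrate with classical RK4 and record n_samples evenly spaced points."""
    t = np.linspace(0.0, t_final, n_samples)
    h = (t[1] - t[0]) / substeps
    c = np.asarray(c0, dtype=float)
    out = [c.copy()]
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            k1 = rhs(c)
            k2 = rhs(c + 0.5 * h * k1)
            k3 = rhs(c + 0.5 * h * k2)
            k4 = rhs(c + h * k3)
            c = c + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(c.copy())
    return t, np.array(out)

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
datasets = []
for c0 in INITIAL_CONDITIONS:
    t, clean = simulate(c0)
    # Zero-mean Gaussian noise with standard deviation 0.2 on every species.
    noisy = clean + rng.normal(0.0, 0.2, size=clean.shape)
    datasets.append((t, clean, noisy))
```

Each entry of `datasets` holds the 15 sampling times, the noise-free trajectory, and the noisy measurements for one experiment. Note that with a standard deviation of 0.2, noisy readings of low concentrations can become negative.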
- Results and Discussion
The primary goal was to determine whether incorporating constraints could improve the performance of the PI-ADoK-S algorithm. Constraints were applied at two key stages: derivative estimation and rate model estimation. For derivative estimation, constraints ensured that concentration models adhered to initial conditions, simulated equilibrium, avoided negative concentrations, and maintained that reactants’ (products’) concentrations always decreased (increased). In rate model estimation, constraints guaranteed the correct sign and trend for the rates of reactant consumption and product formation.
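As an illustration of how some of the derivative-estimation constraints might be quantified, the sketch below measures the violation of three of them for a single species: agreement with the initial condition, non-negativity, and monotonic decrease (reactant) or increase (product). The equilibrium constraint is omitted because its form is not specified here. The function name, the violation measures, and the example profile are hypothetical.

```python
import numpy as np

def profile_violations(c_pred, c0, is_reactant):
    """Non-negative violation magnitudes for one predicted concentration profile."""
    c_pred = np.asarray(c_pred, dtype=float)
    steps = np.diff(c_pred)
    # Reactants should never increase; products should never decrease.
    wrong_way = np.clip(steps, 0.0, None) if is_reactant else np.clip(-steps, 0.0, None)
    return {
        "initial_condition": float(abs(c_pred[0] - c0)),
        "negativity": float(np.sum(np.clip(-c_pred, 0.0, None))),
        "monotonicity": float(np.sum(wrong_way)),
    }

# Hypothetical reactant profile that rises once and ends below zero.
bad = profile_violations([1.0, 0.6, 0.7, -0.1], c0=1.0, is_reactant=True)
# A well-behaved reactant profile violates nothing.
good = profile_violations([1.0, 0.6, 0.4, 0.3], c0=1.0, is_reactant=True)
```

Each measure is zero when the corresponding constraint holds and grows with the size of the violation, so the values can be summed or weighted when scoring a candidate model.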
An initial benchmarking study comparing the unconstrained ADoK-S with the constrained PI-ADoK-S showed that PI-ADoK-S required 43.75% fewer experiments to discover the true model, highlighting its efficiency. Despite this improvement, the stochasticity inherent in GP means that additional trials (currently underway) are needed to draw stronger conclusions. Uncertainty quantification revealed a slight bias in the parameter distribution, potentially due to dataset noise or issues with parameter identifiability (see attachment). This suggests that the model could fit the data within the uncertainty bounds through multiple parameter combinations, pointing to the difficulty of accurately capturing system dynamics under uncertainty and limited data.
- Conclusions
This study introduces PI-ADoK-S, an advancement of the ADoK-S algorithm that integrates mathematical constraints for prior knowledge injection and an uncertainty quantification method based on the MH algorithm. The inclusion of constraints streamlines the GP search, reducing the experimental burden by 43.75% in the initial benchmark, though further testing is required for robust validation. The uncertainty quantification enhances the understanding of how reliable a given model's predictions are, which can be used to guide future investigative efforts. These improvements position PI-ADoK-S as a significant upgrade over ADoK-S, in terms of both efficiency and depth of analysis, and as a valuable tool for experts in kinetic modeling and chemical process research.
References
[1] P. Neumann, L. Cao, D. Russo, V. S. Vassiliadis, A. A. Lapkin, 2020, A new formulation for symbolic regression to identify physico-chemical laws from experimental data, Chemical Engineering Journal, 387, 123412
[2] M. Á. de Carvalho Servia, I. O. Sandoval, K. K. Hii, K. Hellgardt, D. Zhang, E. A. del Rio Chanona, 2023, The Automated Discovery of Kinetic Rate Models – Methodological Frameworks, arXiv
[3] G. Kronberger, F. O. de Franca, B. Burlacu, C. Haider, M. Kommenda, 2022, Shape-Constrained Symbolic Regression – Improving Extrapolation with Prior Knowledge, Evolutionary Computation, 30, 1, 75-98
[4] C. Haider, F. O. de Franca, B. Burlacu, G. Kronberger, 2023, Shape-constrained multi-objective genetic programming for symbolic regression, Applied Soft Computing, 132, 109855
[5] I. Błądek, K. Krawiec, 2019, Solving symbolic regression problems with formal constraints, GECCO ’19: Proceedings of the Genetic and Evolutionary Computation Conference, 977-984
[6] S. Chib, E. Greenberg, 1995, Understanding the Metropolis-Hastings Algorithm, The American Statistician, 49, 4, 327-335
[7] H. S. Fogler, 2016, Elements of Chemical Reaction Engineering, 5th Edition