2024 AIChE Annual Meeting
(578d) Variational Inference for Semi-Parametric “Hybrid” Models
The Bayesian approach to solving inverse problems (i.e., Bayesian inversion) helps regularize any ill-posedness present.[1] Bayesian inversion estimates the unknown parameters of a mathematical model by updating the practitioner’s prior knowledge of the parameters with the measured data through Bayes’ theorem. The solution is a conditional probability distribution of the parameters given the data, called the posterior. The posterior quantifies how plausible each set of parameters is in light of the observed data, providing a more robust and well-posed solution.[2, 3] Nevertheless, computing the posterior exactly is often infeasible, and one must resort to approximate inference.
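In symbols, with θ denoting the unknown parameters and d the measured data, the posterior follows from Bayes’ theorem (standard notation, independent of any particular model):

```latex
p(\theta \mid d)
  = \frac{p(d \mid \theta)\, p(\theta)}{\int p(d \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}
  \;\propto\; p(d \mid \theta)\, p(\theta),
```

where p(θ) is the prior, p(d | θ) is the likelihood, and the normalizing constant (the evidence) is the high-dimensional integral whose intractability is what typically forces the turn to approximate inference.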
Historically, the dominant paradigm for approximate inference has been Markov Chain Monte Carlo (MCMC). Landmark developments in this space include the Metropolis-Hastings[4, 5] and Gibbs sampling[6] algorithms and their application to Bayesian statistics.[7] More recent developments include Hamiltonian Monte Carlo,[8] the No-U-Turn Sampler (NUTS),[9] and the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm.[10, 11] Although robust, standard Monte Carlo integration converges at a rate of only O(N^{-1/2}) in the number of samples,[12] which can make the cost prohibitive for large datasets, complex models, and real-time decision-making. Additionally, without carefully tuned hyperparameters, sampling-based algorithms perform sub-optimally on multimodal or degenerate posterior distributions, and convergence is not always easy to assess.[13] Thus, there is a need for new posterior-approximation paradigms suited to expensive computational models and non-identifiable inference tasks.
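As a point of reference for the sampling-based paradigm, the sketch below implements a bare-bones random-walk Metropolis sampler in Python; the Gaussian toy posterior, step size, and function names are illustrative assumptions and do not reflect the tuned samplers cited above.

```python
import numpy as np

def random_walk_metropolis(log_post, theta0, n_samples=10_000, step=0.5, rng=None):
    """Toy random-walk Metropolis sampler; log_post returns the unnormalized log-posterior."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    logp = log_post(theta)
    chain = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)  # symmetric Gaussian proposal
        logp_prop = log_post(proposal)
        if np.log(rng.uniform()) < logp_prop - logp:               # accept with probability min(1, ratio)
            theta, logp = proposal, logp_prop
        chain[i] = theta
    return chain

# Illustrative target: a standard bivariate normal "posterior"
samples = random_walk_metropolis(lambda t: -0.5 * np.sum(t**2), theta0=[0.0, 0.0])
print(samples.mean(axis=0), samples.std(axis=0))
```

Even on this toy target, the sampler’s accuracy improves only with the square root of the chain length, which is the scaling limitation noted above.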
Variational inference (VI) is a promising alternative to sampling-based approaches such as MCMC. VI uses numerical optimization to find the member of a parametric family of distributions closest to the desired posterior, with closeness typically measured by the Kullback-Leibler (KL) divergence.[14] Popular scalable approaches to VI include mean-field and structured VI,[15, 16] yet these approaches cannot guarantee accurate approximation of complicated densities. Normalizing flow VI balances this tradeoff between accurate but costly MCMC and fast but restrictive VI approaches. The main idea behind normalizing flows is to transform a simple base distribution into a (usually more complex) target distribution via a composition of diffeomorphisms. The primary advantage of normalizing flows is their ability to recover multimodal target distributions with complex dependencies.[17] In prior work, Wang et al.[18] and Cobian et al.[19] combined normalizing flow VI with an adaptively trained surrogate (NoFAS) to mitigate the computational cost of sampling posterior distributions generated by expensive computational models. They also proposed an adaptive annealing strategy that simplifies sampling from complicated posterior distributions induced, for example, by strong dependence among the inputs.
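A minimal sketch of the normalizing-flow VI recipe is given below: samples from a Gaussian base distribution are pushed through a stack of invertible maps, the change-of-variables formula tracks the flow’s density, and the reverse KL divergence to an unnormalized target is minimized by stochastic optimization. The RealNVP-style coupling layers, the toy “banana” target, and all training settings are assumptions chosen for illustration; this is not the NoFAS or LINFA implementation.

```python
import math
import torch

torch.manual_seed(0)

class AffineCoupling(torch.nn.Module):
    """RealNVP-style coupling layer: invertible, with a cheap log-determinant."""
    def __init__(self, dim, flip):
        super().__init__()
        self.flip = flip
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim // 2, 64), torch.nn.Tanh(),
            torch.nn.Linear(64, dim),            # outputs a scale and a shift
        )

    def forward(self, z):
        z1, z2 = z.chunk(2, dim=-1)
        if self.flip:
            z1, z2 = z2, z1
        s, t = self.net(z1).chunk(2, dim=-1)
        s = torch.tanh(s)                        # bounded log-scale for numerical stability
        x2 = z2 * torch.exp(s) + t
        x = torch.cat([x2, z1] if self.flip else [z1, x2], dim=-1)
        return x, s.sum(-1)                      # log|det J| = sum of log-scales

def log_target(x):
    """Unnormalized 'banana' log-density standing in for an intractable posterior."""
    x1, x2 = x[:, 0], x[:, 1]
    return -0.5 * x1**2 - 2.0 * (x2 - x1**2) ** 2

dim = 2
flows = torch.nn.ModuleList([AffineCoupling(dim, flip=i % 2 == 1) for i in range(6)])
opt = torch.optim.Adam(flows.parameters(), lr=1e-3)

for step in range(3000):
    z = torch.randn(512, dim)                              # base samples z ~ N(0, I)
    log_q = -0.5 * (z ** 2).sum(-1) - 0.5 * dim * math.log(2 * math.pi)
    x = z
    for flow in flows:
        x, log_det = flow(x)
        log_q = log_q - log_det                            # change-of-variables formula
    loss = (log_q - log_target(x)).mean()                  # reverse KL(q || target), up to a constant
    opt.zero_grad(); loss.backward(); opt.step()
```

The same objective underlies surrogate-accelerated variants: when evaluating the target density requires an expensive forward model, NoFAS replaces it with an adaptively trained surrogate inside the loss.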
This work extends the NoFAS framework to semi-parametric Kennedy and O’Hagan models.[20] Kennedy and O’Hagan models are “hybrid” models composed of a parametric phenomenological model and a nonparametric discrepancy term that quantifies the bias between the phenomenological model and experimental data. Thus, we provide a variational approach to inverting phenomenological models that are misspecified in functional form. We demonstrate the performance of the proposed approach on globally and locally non-identifiable Langmuir adsorption models and compare the results with MCMC. Moreover, we explore the tradeoff between scalability and accuracy when approximating posterior distributions concentrated around a lower-dimensional manifold embedded within the parameter space. Finally, we implement this extension in the Python Library for Inference with Normalizing Flow and Annealing (LINFA).[21]
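For reference, the Kennedy–O’Hagan decomposition in its common additive form writes each observation as a parametric prediction plus a discrepancy and measurement noise; the Gaussian-process prior on the discrepancy and the standard Langmuir isotherm below are illustrative choices consistent with the original formulation,[20] not the exact specification used in this work:

```latex
y_i = \eta(x_i, \theta) + \delta(x_i) + \varepsilon_i,
\qquad \delta(\cdot) \sim \mathcal{GP}\big(0,\, k(\cdot,\cdot)\big),
\qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2),
\qquad \eta(C;\, q_{\max}, K) = \frac{q_{\max} K C}{1 + K C}.
```

Here η is the phenomenological model with parameters θ (for Langmuir adsorption, the saturation capacity q_max and affinity constant K), δ is the nonparametric discrepancy that absorbs misspecification in functional form, and ε_i is measurement noise. Non-identifiability can arise, for instance, at low coverage, where the isotherm is nearly linear and the data inform only the product q_max·K.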
[1] H. W. Engl and R. Ramlau, “Regularization of Inverse Problems,” in Encyclopedia of Applied and Computational Mathematics (B. Engquist, ed.), pp. 1233–1241, Berlin, Heidelberg: Springer, 2015.
[2] J. Kaipio and E. Somersalo, “Statistical Inversion Theory,” in Statistical and Computational Inverse Problems, vol. 160 of Applied Mathematical Sciences, pp. 49–114, New York, NY: Springer, 2005.
[3] M. Iglesias and A. M. Stuart, “UQ and a Model Inverse Problem,” SIAM News, vol. 47, no. 6, 2014.
[4] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, “Equation of State Calculations by Fast Computing Machines,” The Journal of Chemical Physics, vol. 21, no. 6, pp. 1087–1092, 1953.
[5] W. K. Hastings, “Monte Carlo sampling methods using Markov chains and their applications,” Biometrika, vol. 57, no. 1, pp. 97–109, 1970.
[6] S. Geman and D. Geman, “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, no. 6, pp. 721–741, 1984.
[7] A. E. Gelfand and A. F. M. Smith, “Sampling-Based Approaches to Calculating Marginal Densities,” Journal of the American Statistical Association, vol. 85, no. 410, pp. 398–409, 1990.
[8] R. Neal, “MCMC Using Hamiltonian Dynamics,” in Handbook of Markov Chain Monte Carlo (S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng, eds.), ch. 6, Chapman and Hall/CRC, 2011.
[9] M. D. Hoffman and A. Gelman, “The No-U-Turn Sampler: adaptively setting path lengths in Hamiltonian Monte Carlo,” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1593–1623, 2014.
[10] J. A. Vrugt, C. J. F. ter Braak, C. G. H. Diks, B. A. Robinson, J. M. Hyman, and D. Higdon, “Accelerating Markov Chain Monte Carlo Simulation by Differential Evolution with Self-Adaptive Randomized Subspace Sampling,” International Journal of Nonlinear Sciences and Numerical Simulation, vol. 10, no. 3, pp. 273–290, 2009.
[11] J. A. Vrugt, “Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation,” Environmental Modelling & Software, vol. 75, pp. 273–316, 2016.
[12] R. E. Caflisch, “Monte Carlo and quasi-Monte Carlo methods,” Acta Numerica, vol. 7, pp. 1–49, 1998.
[13] V. Roy, “Convergence diagnostics for Markov chain Monte Carlo,” Annual Review of Statistics and Its Application, vol. 7, pp. 387–412, 2020.
[14] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe, “Variational Inference: A Review for Statisticians,” Journal of the American Statistical Association, vol. 112, no. 518, pp. 859–877, 2017.
[15] L. K. Saul and M. I. Jordan, “Exploiting Tractable Substructures in Intractable Networks,” in Advances in Neural Information Processing Systems (D. Touretzky, M. Mozer, and M. Hasselmo, eds.), vol. 8, MIT Press, 1995.
[16] D. Barber and W. Wiegerinck, “Tractable Variational Structures for Approximating Graphical Models,” in Advances in Neural Information Processing Systems (M. Kearns, S. Solla, and D. Cohn, eds.), vol. 11, MIT Press, 1998.
[17] S. Premchandar, S. Bhattacharya, and T. Maiti, “Normalizing Flows Aided Variational Inference: A Useful Alternative to MCMC?,” Notices of the American Mathematical Society, vol. 70, no. 7, pp. 1059–1060, 2023.
[18] Y. Wang, F. Liu, and D. E. Schiavazzi, “Variational inference with NoFAS: Normalizing flow with adaptive surrogate for computationally expensive models,” Journal of Computational Physics, vol. 467, p. 111454, 2022.
[19] E. R. Cobian, J. D. Hauenstein, F. Liu, and D. E. Schiavazzi, “AdaAnn: Adaptive annealing scheduler for probability density approximation,” International Journal for Uncertainty Quantification, vol. 13, no. 3, 2023.
[20] M. C. Kennedy and A. O’Hagan, “Bayesian calibration of computer models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 63, no. 3, pp. 425–464, 2001.
[21] Y. Wang, E. R. Cobian, J. Lee, F. Liu, J. D. Hauenstein, and D. E. Schiavazzi, “LINFA: a Python library for variational inference with normalizing flow and annealing,” Journal of Open Source Software, vol. 9, no. 96, p. 6309, 2024.