2006 AIChE Annual Meeting
(642d) Practical Challenges in Bayesian Modeling and Elicitation of Probabilistic Information
However, several practical issues prevent the wide use of Bayesian Latent Variable Regression (BLVR). The original BLVR (Nounou et al., 2002) solves an optimization problem by nonlinear programming (NLP), which is time consuming, particularly when the number of variables is large. This optimization-based BLVR also yields only a point estimate and cannot readily provide uncertainty information. To overcome these problems, a sampling-based BLVR (Chen et al., 2006) was recently developed. It solves the optimization problem with Markov chain Monte Carlo (MCMC) (Gamerman, 1997), which makes it a more practical Bayesian modeling method that can handle high-dimensional data sets. BLVR usually assumes that all variables are stochastic. Because this assumption may not hold for some discrete variables, especially those from designed experiments, a BLVR modeling procedure was also developed for hybrid data sets that contain both continuous and discrete variables. This procedure models the continuous and discrete variables under assumptions appropriate to each. With these advancements, Bayesian modeling methods are much easier to apply to practical problems.
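To illustrate why a sampling-based approach yields uncertainty information where an optimizer yields only a point estimate, the following is a minimal sketch of a random-walk Metropolis sampler. It is not the BLVR algorithm of Chen et al.: the one-parameter regression model, the known noise level, the Gaussian prior, and all names (log_posterior, metropolis, summarize) are assumptions chosen for illustration.

```python
import math
import random


def log_posterior(beta, xs, ys, noise_sd=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior for the slope in y = beta * x + noise.

    Assumed toy model: Gaussian noise with known noise_sd and a
    Gaussian prior on beta (both are illustrative assumptions).
    """
    log_prior = -0.5 * ((beta - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - beta * x) / noise_sd) ** 2
                  for x, y in zip(xs, ys))
    return log_prior + log_lik


def metropolis(xs, ys, n_samples=5000, burn_in=1000, step=0.2, seed=0):
    """Draw posterior samples of beta with a random-walk Metropolis chain."""
    rng = random.Random(seed)
    beta = 0.0
    current = log_posterior(beta, xs, ys)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        if i >= burn_in:
            samples.append(beta)
    return samples


def summarize(samples):
    """Return the posterior mean and an approximate 95% credible interval."""
    ordered = sorted(samples)
    n = len(ordered)
    mean = sum(ordered) / n
    return mean, ordered[int(0.025 * n)], ordered[int(0.975 * n)]


# Simulated data with a true slope of 2.0 (illustrative only).
data_rng = random.Random(1)
xs = [i / 10 for i in range(1, 21)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]

samples = metropolis(xs, ys)
mean, lower, upper = summarize(samples)
print(f"posterior mean {mean:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

The same set of samples provides both the point estimate and the interval, so uncertainty information comes at no extra cost once the chain has been run.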
Nevertheless, applying traditional methods is still more convenient than applying Bayesian methods, and the question remains: when should Bayesian methods be used? That is, when is developing a Bayesian model instead of a conventional model worth the extra effort? This presentation will demonstrate via theoretical and empirical arguments that Bayesian modeling can perform better when the amount of data available for modeling is small. This makes intuitive sense: because BLVR incorporates prior information, it can still produce good modeling results from a small amount of data, whereas traditional methods often fail in this situation. When a large amount of data is available, the effect of prior information is smaller. The signal-to-noise ratio also affects the performance of BLVR. When the signal-to-noise ratio of the output variable is much smaller than that of the input variables, BLVR performs much better than traditional methods.
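The diminishing influence of the prior can be seen in the simplest conjugate case. The sketch below assumes a normal prior on an unknown mean and normally distributed observations with known noise variance. This is a textbook model used here only for illustration, not the BLVR model itself, and the function names are hypothetical.

```python
def posterior_mean_weight(n, noise_var, prior_var):
    """Weight placed on the prior mean in the normal-normal posterior mean.

    Assumes n independent observations with known variance noise_var
    and a normal prior with variance prior_var.
    """
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    return prior_precision / (prior_precision + data_precision)


def posterior_mean(sample_mean, n, noise_var, prior_mean, prior_var):
    """Posterior mean as a precision-weighted average of prior and data."""
    w = posterior_mean_weight(n, noise_var, prior_var)
    return w * prior_mean + (1.0 - w) * sample_mean


# The prior weight shrinks as the number of observations grows.
for n in (2, 20, 200):
    w = posterior_mean_weight(n, noise_var=4.0, prior_var=1.0)
    print(f"n = {n:3d}: weight on prior = {w:.3f}")
```

In this toy model the weight on the prior also grows with the noise variance, which is consistent with the observation that prior information matters most when the data are few or noisy.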
Another challenge in applying Bayesian modeling methods is obtaining information about the prior and likelihood distributions. Such information often has to be obtained from experts, who may communicate it in a non-probabilistic manner. For example, the range of variation or the values of the first and second moments may be known a priori. Maximum Entropy (ME) methods (Jaynes, 1968) have been developed in other areas for the elicitation of prior distributions. These approaches can be adapted to obtain prior and likelihood distributions for BLVR from the available information. These distributions can also be obtained via an empirical approach called empirical Bayes (Carlin and Louis, 2000). In this approach, parameters of the prior and likelihood distributions can be estimated even when no information is available beyond the current data set itself. Noninformative priors can also be used, such as the well-known Jeffreys prior (Jeffreys, 1961). With the help of these techniques, prior and likelihood distributions can be elicited in a rigorous manner.
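As a rough illustration of these two elicitation routes, the sketch below maps two kinds of expert knowledge to their standard maximum entropy distributions (a known range gives a uniform density; a known mean and variance on the real line give a normal density) and estimates prior hyperparameters from data by the method of moments. The normal-means model, the assumption of a known noise variance, and the function names are illustrative assumptions, not the procedure used for BLVR.

```python
import statistics


def max_entropy_prior(lower=None, upper=None, mean=None, variance=None):
    """Return (family, parameters) of the maximum entropy density.

    Covers only two standard cases: a known range, or a known mean
    and variance on the real line. Other combinations are not handled.
    """
    has_range = lower is not None and upper is not None
    has_moments = mean is not None and variance is not None
    if has_moments and not has_range:
        return ("normal", {"mean": mean, "sd": variance ** 0.5})
    if has_range and not has_moments:
        return ("uniform", {"lower": lower, "upper": upper})
    raise ValueError("this sketch needs either a range or two moments")


def empirical_bayes_normal(observations, noise_var):
    """Method-of-moments estimates of the prior mean and variance.

    Assumed model: each observation is theta_i + noise, with
    theta_i ~ Normal(mu, tau2) and noise of known variance noise_var,
    so the marginal variance of the observations is tau2 + noise_var.
    """
    mu_hat = statistics.mean(observations)
    marginal_var = statistics.variance(observations)
    tau2_hat = max(0.0, marginal_var - noise_var)  # clip at zero
    return mu_hat, tau2_hat


print(max_entropy_prior(lower=0.0, upper=2.0))
print(max_entropy_prior(mean=1.0, variance=4.0))
print(empirical_bayes_normal([1.0, 2.0, 3.0, 4.0, 5.0], noise_var=0.5))
```

The clipping at zero reflects a common practical choice: when the observed spread is smaller than the assumed noise variance, the method-of-moments estimate of the prior variance would otherwise be negative.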
This presentation will discuss a variety of practical case studies on Bayesian modeling, including a simulated high-dimensional data set, an industrial high-throughput screening data set, and other modeling tasks based on laboratory data. Illustrative examples of the elicitation of prior distributions will also be presented.
References:
Carlin, B.P. and Louis, T.A. (2000), Bayes and Empirical Bayes Methods for Data Analysis, Chapman & Hall/CRC.
Chen, H., Bakshi, B.R. and Goel, P.K. (2006), Sampling-based Bayesian Latent Variable Regression, Chemical Process Control 7, Alberta, Canada.
Gamerman, D. (1997), Markov Chain Monte Carlo, Chapman & Hall.
Jaynes, E.T. (1968), Prior Probabilities, IEEE Transactions on Systems Science and Cybernetics, 4(3): 227-241.
Jeffreys, H. (1961), Theory of Probability, Oxford University Press.
Nounou, M.N., Bakshi, B.R., Goel, P.K. and Shen, X. (2002), Process Modeling by Bayesian Latent Variable Regression, AIChE Journal, 48(8): 1775-1793.