2019 AIChE Annual Meeting
(344d) Nonlinear Dynamic Feature Extraction Based on Gaussian Process Dynamical Models for JIT-Based Adaptive Soft Sensors
In industrial process plants, product quality must be monitored to assure both product quality and process safety. Soft sensors are models constructed between difficult-to-measure variables (y-variables) and easy-to-measure variables (X-variables). Values estimated with soft sensors are used to control plants rapidly. Partial least squares (PLS) [1] is the most popular linear regression method. To handle nonlinear cases, support vector regression [2] and Gaussian process regression (GPR) [3] have been developed. Because GPR generates a probabilistic model, it also provides an uncertainty estimate for the variable of interest. However, the predictive ability of such models degrades as the process changes in chemical plants. To mitigate this degradation, adaptive soft sensors have been developed [4]. We focus on just-in-time (JIT) soft sensors. A JIT model is constructed online and can track process changes well. Locally weighted PLS (LWPLS) [5] is one of the nonlinear JIT methods. However, the hyperparameters of any JIT model have to be set beforehand, and they will not always be optimal for a given query sample, so the predictive ability of a JIT model can decrease.
When constructing a soft sensor model for time series data, process dynamics are often considered to improve estimation performance [6]. In other words, not only the samples at the current time but also past samples are added to the X-variables to construct a model. However, if all these variables are used to construct the soft sensor model, the computational burden increases and the model tends to overfit. Therefore, dimensionality reduction and feature extraction are necessary before constructing the model. Principal component analysis (PCA) [7], a traditional dimensionality reduction method, is an effective approach to transforming observed variables into low-dimensional data. However, PCA is a linear, deterministic method that lacks a probabilistic interpretation. In practice, process data are usually measured in a noisy environment, and disturbances may occur. In addition, there is nonlinearity between X-variables, there are temporal dependences between adjacent latent states, and the data are autocorrelated. As an alternative, Gaussian process dynamical models (GPDM) [8], which are nonlinear probabilistic dynamic latent variable models, have been developed to model the process features. In GPDM, the latent states are assumed to be produced according to a first-order Markov chain, and each observation is generated from its current state by a nonlinear transformation. In this study, we propose to combine JIT and ensemble learning and to predict y-values with multiple JIT models, which are constructed using latent variables from GPDM with different sets of hyperparameters. The proposed method is called ensemble just in time based on Gaussian process dynamical models (EJITGPDM). The weights of the JIT models are determined based on Bayes' theorem, considering their predictive ability. We confirm through two industrial data analyses that the proposed model has higher predictive accuracy than traditional models.
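The idea of adding past samples to the X-variables can be illustrated with a minimal sketch. The function name `add_lagged_variables`, the parameter `n_lags`, and the toy data are hypothetical and are not taken from the study; the abstract does not state how many past samples were used.

```python
def add_lagged_variables(X, n_lags):
    """Append the previous n_lags rows to each row of X.

    X is a list of rows, one row per time step. Row t of the result is
    X[t] + X[t-1] + ... + X[t-n_lags]. The first n_lags time steps are
    dropped because they do not have a full history.
    """
    if n_lags < 0:
        raise ValueError("n_lags must be non-negative")
    augmented = []
    for t in range(n_lags, len(X)):
        row = []
        for lag in range(n_lags + 1):
            row.extend(X[t - lag])
        augmented.append(row)
    return augmented


# Toy data: 4 time steps, 2 X-variables (hypothetical values).
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
X_dyn = add_lagged_variables(X, n_lags=2)
print(len(X_dyn), len(X_dyn[0]))
```

With two X-variables and two lags, each row grows from 2 to 6 columns, which shows how quickly the input dimension increases and why a feature extraction step is applied before model construction.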
Proposed method
In the proposed method, EJITGPDM, y-values are estimated from X-values by weighting the y-values predicted by multiple JIT models constructed using different hyperparameters. When a query sample xq arrives, the Euclidean distance between each historical sample and the query sample is first calculated. Next, the indexes of the N most relevant samples, that is, those with the smallest Euclidean distances to the query sample, are obtained. After latent variables are extracted with GPDM, a GPR model is constructed between the latent variables of the highly similar samples and y. GPDM requires the number of components to be determined, and the JIT model requires the number of data used for model construction. That is, when m different values are considered for the number of GPDM components and n different values for the number of data used for model construction, the number of JIT models is L (= m × n). The y-values for the query sample are predicted with the multiple JIT models and then integrated by ensemble learning. The weight of each predicted y-value is calculated with Bayes' theorem [9], taking into account the predictive ability of each JIT model. When S is the current (unobserved) state in a plant and Mi is the ith model, the probability of Mi given S, P(Mi|S), is required to combine the prediction results of the L JIT models. The predictive accuracy of nonlinear regression models can be evaluated using RMSEmidknn [10], the RMSE for the mid-points between the k-nearest-neighbor data points for the s data. Therefore, by using the inverse of RMSEmidknn as the weight, y-values can be estimated while monitoring the predictive ability of each JIT model and weighting each model appropriately.
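A rough sketch of three steps described above: selecting the nearest historical samples, enumerating the L = m × n hyperparameter combinations, and combining predictions with inverse-RMSE weights. The GPDM and GPR steps are omitted, the RMSE values are given as plain inputs rather than computed as RMSEmidknn, and normalizing the weights to sum to one is an assumption. All function names, grid values, and numbers are hypothetical.

```python
import math


def nearest_indices(X_hist, x_query, n_neighbors):
    """Indexes of the n_neighbors historical samples closest to x_query."""
    distances = [math.dist(x, x_query) for x in X_hist]
    order = sorted(range(len(X_hist)), key=lambda i: distances[i])
    return order[:n_neighbors]


def inverse_rmse_weights(rmse_values):
    """Weights proportional to 1/RMSE, normalized to sum to one (assumption)."""
    if any(r <= 0 for r in rmse_values):
        raise ValueError("RMSE values must be positive")
    inverse = [1.0 / r for r in rmse_values]
    total = sum(inverse)
    return [v / total for v in inverse]


def ensemble_prediction(predictions, rmse_values):
    """Weighted average of the y-values predicted by the individual models."""
    weights = inverse_rmse_weights(rmse_values)
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical hyperparameter grid: m = 3 component counts, n = 2 data counts.
component_counts = [2, 3, 4]
neighbor_counts = [50, 100]
model_grid = [(c, n) for c in component_counts for n in neighbor_counts]

# Toy historical samples and query sample (hypothetical values).
X_hist = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.4]]
x_query = [0.4, 0.4]
neighbors = nearest_indices(X_hist, x_query, n_neighbors=2)

# Two models predict 1.0 and 2.0 with RMSE-type scores 0.1 and 0.3.
y_hat = ensemble_prediction(predictions=[1.0, 2.0], rmse_values=[0.1, 0.3])
print(len(model_grid), neighbors, round(y_hat, 3))
```

In this toy case the model with the smaller error score receives three times the weight of the other, so the combined estimate lies closer to its prediction.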
Results and Discussion
To verify the effectiveness of the proposed method, we analyzed two real industrial process datasets. The dynamic just in time based on Gaussian process regression model (DJIT-GPR) [11], the dynamic LWPLS model (DLWPLS), and the dynamic locally weighted principal component regression model (DLWPCR) [12] were used for comparison.
Real industrial data in a debutanizer column [13]
We applied the proposed method to data obtained from the operation of a debutanizer column. The y-variable is the butane content in the bottom flow, and the X-variables are the top temperature, top pressure, reflux flow, flow to next process, sixth tray temperature, and bottom temperature. In the dataset we used, the measurement interval of the y-variable and the X-variables was 6 min, and the measurement delay of the y-variable was 48 min. Data from 100 to 894 were used as training data, data from 895 to 1394 as validation data, and data from 1395 to 2394 as test data. Thus, we had 795 training data, 500 validation data, and 1000 test data.
The lowest r2 value, 0.401, was produced by the DLWPCR model. With the DJIT-GPR and DLWPLS models, prediction accuracy improved, with r2 values of 0.840 and 0.856, respectively. Among DJIT-GPR, DLWPLS, DLWPCR, and EJITGPDM, the highest r2, 0.898, was produced by EJITGPDM. The proposed model could thus predict y-values more accurately than the traditional models. The y-values predicted with the DLWPLS model show larger variation than those of the proposed EJITGPDM model; that is, the proposed EJITGPDM achieves stable and accurate prediction of y-values.
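The abstract does not define r2. Assuming it is the usual coefficient of determination computed on the test data, a minimal sketch is given below; the function name `r2_score` and the toy values are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy actual and estimated y-values (hypothetical, not from the study).
y_actual = [1.0, 2.0, 3.0, 4.0]
y_estimated = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(y_actual, y_estimated), 3))
```

Under this definition a value of 1 means perfect agreement, and a value near 0 means the model does no better than predicting the mean of the actual y-values.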
Real industrial data in a sulfur recovery unit (SRU) [14]
We applied the proposed method to data obtained from the operation of an SRU. The y-variable is the concentration of H2S in the tail gas of Line 4, and the X-variables are gas flow MEA GAS, air flow AIR MEA, secondary air flow AIR MEA 2, gas flow in SWS zone, and air flow in SWS zone. In industrial process plants, there are cases where the measurement interval of y-variables is larger than that of X-variables. We verified the effectiveness of the proposed method in the case where the measurement interval of the y-variable is six times that of the X-variables. Data from 100 to 4101 were used as training data, data from 4102 to 6099 as validation data, and data from 6070 to 3982 as test data. Thus, we had 667 training data, 333 validation data, and 3982 test data.
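The training and validation counts follow from the stated index ranges if y is available for one sample in every six. The sketch below checks that arithmetic; which sample within each group of six carries the y measurement is an assumption (here, the first), and the function name `labeled_indices` is hypothetical.

```python
def labeled_indices(first, last, interval):
    """Indices in [first, last] at which y is assumed to be measured."""
    return list(range(first, last + 1, interval))


# Index ranges from the text; y is assumed to be measured every 6th sample.
train_labeled = labeled_indices(100, 4101, 6)
valid_labeled = labeled_indices(4102, 6099, 6)
print(len(train_labeled), len(valid_labeled))
```

The 4002 training samples and 1998 validation samples therefore yield 667 and 333 samples with measured y-values, respectively.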
As with the debutanizer column data, EJITGPDM produced the highest r2 value, 0.693. This is more than 10% better than the value of DLWPLS. In time plots of actual y and estimated y for DLWPLS and EJITGPDM, the y-values estimated by EJITGPDM for the unmeasured part have smaller variation than those of the comparison models.
Conclusions
We proposed the modeling method EJITGPDM, which takes into consideration the nonlinearity between X-variables and the temporal dependences between adjacent latent states. In addition, it is always possible to calculate predicted y-values of query samples using optimal hyperparameters. To verify the effectiveness of the proposed method, case studies were conducted using operation data measured at the debutanizer column and at the SRU. The results show the effectiveness of the proposed method. Using the proposed method is expected to enable efficient and stable management and operation of chemical plants.
References
- Joe Qin, Comput. Chem. Eng. 22, 503-514, 1998.
- Yan, H. Shao, X. Wang, Comput. Chem. Eng., 28, 1489-1498, 2004.
- B. Belhouari, A. Bermak, Comput. Stat. Data Anal., 47, 705-712, 2004.
- Kaneko, K. Funatsu, AIChE J., 52, 1322-1334, 2013.
- Kim, M. Kano, H. Nakagawa, S. Hasebe, Int J Pharm. 421, 269-274, 2011.
- H. Kaspar, W. H. Ray, Chem. Eng. Sci, 48, 3447-3461, 1993.
- Hotelling, Journal of educational psychology, 24, 417, 1933.
- Wang, D. Fleet, A. Hertzmann, NIPS, 18, 1441-1448, 2005.
- Khatibisepehr, B. Huang, S. Khare, J. Process. Control., 23, 1575-1596, 2013.
- Kaneko, K. Funatsu, J. Chem. Inf. Model., 53, 2341-2348, 2013.
- L. T. Chan, X. Wu, J. Chen, L. Xie, C. I. Chen, IEEE Trans. Semicond. Manuf., 31, 3, 2018
- F. Yuan, Z. Q. Ge, Z. H. Song, Ind Eng Chem Res, 53, 13736-13749, 2014.
- http://www.springer.com/us/book/9781846284793
- http://www.springer.com/us/book/9781846284793