2015 AIChE Spring Meeting and 11th Global Congress on Process Safety

(204b) Inferring System's Structure in Large Scale Data-Driven Modelling

The analysis of data sets collected from large-scale industrial processes is a challenging task, even for experienced data analysts. Not only do the data naturally inherit the complex features arising from the multiple phenomena taking place (a megavariate correlation structure, multiscale dynamics, delays between stages and non-stationarity), but other features also arise in practice, making almost every case study a new challenge. Features such as multiresolution information [1], multirate acquisition systems, heteroscedastic noise, outliers and missing data are quite recurrent and require tailor-made solutions.

At present, most of the features mentioned above can be addressed individually, and methodologies have also been developed that handle some of them in an integrated way. Because a megavariate structure is a prevalent feature of current industrial data, any such analysis framework must include a modelling stage that describes this structure effectively. Latent variable (LV) models [2-6] have been instrumental in this task when analyzing data collected under normal operating conditions, the so-called observational or happenstance data. This class of models has played a central role in the analysis of industrial data since its introduction approximately 25 years ago, and many extensions have been developed to handle additional features, such as single-scale dynamics [4], multiscale dynamics [7] and batch processes [6], among others.

The underlying backbone of latent variable models has remained essentially unchanged throughout these years of intensive use and incremental evolution. The same non-causal model structure, in which the variables in the original space are described as the result of unobservable latent variables, is still employed in every application of these methods. However, it is now widely recognized that processes have a network structure with characteristic organizational features of clustering, specialization and hierarchy. These features are passed on to the collected data, which, beyond presenting a multi- or megavariate correlation structure, also carry these fundamental structural components of variability, and these must be properly addressed and described.
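To make the standard LV backbone concrete, the following is a minimal pure-Python sketch of a principal component analysis (PCA) model, X ≈ T Pᵀ + E, fitted to mean-centred data with the NIPALS algorithm commonly used in chemometrics. It is an illustration only, not the methodology presented here; the simulated process (two latent variables driving six measured variables), the function names `mean_center` and `nipals_pca`, and the tolerance settings are all hypothetical.

```python
import math
import random


def mean_center(X):
    """Subtract each column mean; returns the centred matrix and the means."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in X], means


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Fit X ~ T P^T + E with NIPALS (X must already be mean-centred).

    Returns score vectors T, unit-norm loading vectors P and residuals E.
    """
    E = [row[:] for row in X]
    n, m = len(E), len(E[0])
    T, P = [], []
    for _ in range(n_components):
        # Start the scores from the column with the largest remaining variance.
        j0 = max(range(m), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            # Loadings: regress each column of E on the current scores.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm_p = math.sqrt(sum(v * v for v in p))
            p = [v / norm_p for v in p]
            # Scores: project each observation onto the normalized loadings.
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            diff = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if diff < tol * sum(v * v for v in t):
                break
        # Deflate: remove the variation explained by this component.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        T.append(t)
        P.append(p)
    return T, P, E


# Hypothetical process: 2 latent variables drive 6 measured variables.
rng = random.Random(0)
true_weights = [(1.0, 0.0), (0.8, 0.3), (0.5, 0.5),
                (0.0, 1.0), (-0.4, 0.9), (0.7, -0.2)]
X = []
for _ in range(200):
    l1, l2 = rng.gauss(0, 1), rng.gauss(0, 1)
    X.append([a * l1 + b * l2 + rng.gauss(0, 0.01) for a, b in true_weights])

Xc, _ = mean_center(X)
T, P, E = nipals_pca(Xc, n_components=2)
total = sum(v * v for row in Xc for v in row)
resid = sum(v * v for row in E for v in row)
explained = 1 - resid / total
print(f"variance explained by 2 components: {explained:.4f}")
```

Because only two latent variables generate the simulated data, two components should capture nearly all of the variance, leaving mostly measurement noise in E. Note that every measured variable is linked to every component through the loadings in P; the model itself encodes no clustering, specialization or hierarchy among the variables.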

In this presentation we review some recently developed methodologies that extract more of the inner correlation structure between variables in real-world data sets and handle it consistently, going beyond what LV approaches provide in their usual formulation. By matching the structure of the underlying system more closely, these methods support the development of improved monitoring and predictive frameworks for modern industrial processes.

References

[1] M.S. Reis, P.M. Saraiva, AIChE J., 52:6 (2006) 2107-2119.

[2] J.V. Kresta, J.F. MacGregor, T.E. Marlin, Can. J. Chem. Eng., 69 (1991) 35-47.

[3] J.F. MacGregor, T. Kourti, Control Eng. Pract., 3:3 (1995) 403-414.

[4] W. Ku, R.H. Storer, C. Georgakis, Chemom. Intell. Lab. Syst., 30 (1995) 179-196.

[5] J.F. MacGregor, C. Jaeckle, C. Kiparissides, M. Koutoudi, AIChE J., 40:5 (1994)

[6] P. Nomikos, J.F. MacGregor, AIChE J., 40:8 (1994) 1361-1375.

[7] B.R. Bakshi, AIChE J., 44:7 (1998) 1596-1610.