2017 Spring Meeting and 13th Global Congress on Process Safety
(127c) Knowledge Discovery and Explanation from Industrial Process Data Using Clustering and Subspace Search
Authors
In this study, density-based spatial clustering of applications with noise (DBSCAN) and k-means clustering are applied to extract process behaviors from historical databases. Both techniques are studied on industrial data from a pyrolysis reactor and on simulation data. Their performance is evaluated with three cluster evaluation metrics (homogeneity, completeness, and the Davies–Bouldin index).
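The comparison described above can be illustrated with a minimal sketch, assuming scikit-learn and synthetic blob data in place of the reactor and simulation data, which are not available here. The blob centers, `eps`, `min_samples`, and `n_clusters` are illustrative assumptions, not values from the study.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    completeness_score,
    davies_bouldin_score,
    homogeneity_score,
)
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three operating modes (assumption: not the study's data).
X, y_true = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)
X = StandardScaler().fit_transform(X)

# Hyperparameters are illustrative guesses, not tuned values from the study.
labels_km = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
labels_db = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)


def evaluate(labels):
    """Return the three metrics; DBSCAN noise points (label -1) are left out."""
    mask = labels != -1
    return {
        "n_clusters": len(set(labels[mask].tolist())),
        "homogeneity": homogeneity_score(y_true[mask], labels[mask]),
        "completeness": completeness_score(y_true[mask], labels[mask]),
        # Lower Davies-Bouldin values indicate better-separated clusters.
        "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),
    }


results = {"k-means": evaluate(labels_km), "DBSCAN": evaluate(labels_db)}
for name, metrics in results.items():
    print(name, metrics)
```

Homogeneity and completeness need ground-truth labels, so they apply only where the process modes are known. The Davies–Bouldin index needs only the data and the cluster assignments.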
Beyond process behavior extraction with clustering, we propose a subspace-search-based approach that explains the disparity between pairs of process clusters in terms of the most contributing attributes. In other words, the most contributing attributes are used to explain how a given process cluster (the comparative group) differs from its reference cluster (the reference group). Each data sample in the comparative group is compared with the reference group by its dimensional normalized k-distance in each subspace (also called "dimensions"). The subspace with the highest dimensional normalized k-distance is treated as the explanation of the disparity. A brute-force search over all subspaces, however, is computationally infeasible because the number of candidate subspaces grows exponentially with the number of attributes. Sample condensation and greedy search are therefore used to keep the computation tractable.
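The abstract does not define the dimensional normalized k-distance, so the sketch below rests on a stated assumption: in a subspace it is taken as the Euclidean distance from a sample to its k-th nearest reference sample, computed on z-scored attributes and divided by the square root of the subspace size. The greedy forward search is one plausible reading of "greedy searching", and sample condensation is not shown. The function names and the parameters `k` and `max_dims` are hypothetical.

```python
import numpy as np


def normalized_k_distance(x, reference, dims, k=3):
    """Assumed definition: k-th nearest-neighbor distance within `dims`,
    divided by sqrt(len(dims)) so subspaces of different size are comparable."""
    dims = list(dims)
    diffs = reference[:, dims] - x[dims]
    dists = np.sqrt((diffs ** 2).sum(axis=1))
    kth = np.sort(dists)[k - 1]
    return kth / np.sqrt(len(dims))


def greedy_subspace(x, reference, k=3, max_dims=3):
    """Forward greedy search: add the attribute that raises the score most,
    and stop once no remaining attribute improves it."""
    n_attrs = reference.shape[1]
    chosen, best = [], -np.inf
    while len(chosen) < max_dims:
        candidates = [d for d in range(n_attrs) if d not in chosen]
        if not candidates:
            break
        scores = {
            d: normalized_k_distance(x, reference, chosen + [d], k)
            for d in candidates
        }
        d_best = max(scores, key=scores.get)
        if scores[d_best] <= best:
            break
        chosen.append(d_best)
        best = scores[d_best]
    return chosen, best


# Synthetic example: the sample deviates from the reference group in attribute 2.
rng = np.random.default_rng(0)
reference_raw = rng.normal(size=(200, 4))
sample_raw = np.array([0.0, 0.0, 8.0, 0.0])

# z-score both with the reference group's statistics (an assumption).
mean, std = reference_raw.mean(axis=0), reference_raw.std(axis=0)
reference = (reference_raw - mean) / std
sample = (sample_raw - mean) / std

chosen, score = greedy_subspace(sample, reference)
print("explanatory subspace:", chosen, "score:", round(float(score), 2))
```

With at most `max_dims` greedy rounds over d attributes, the search scores on the order of `max_dims * d` subspaces instead of all 2^d - 1 non-empty subsets.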
The results show that both DBSCAN and k-means clustering perform well in classifying process behaviors: various process modes and process faults are recognized. Furthermore, the pairwise explanations of disparate process clusters appear reasonable when the variation of the attributes in the explanatory subspace is reviewed. Sample condensation and greedy search reduce the computational cost, which makes the approach suitable for both online fault identification and offline data analysis.