2006 AIChE Annual Meeting
(679a) A Novel Optimization-Based Clustering Approach and Prediction of Optimal Number of Clusters: Global Optimum Search with Enhanced Positioning (EP_GOS_Clust)
In this presentation, we introduce a novel clustering algorithm framework [1]. It is based on a variant of the Generalized Benders Decomposition, denoted as the Global Optimum Search [2, 3], and includes a procedure to determine the optimal number of clusters. As an investigative study, the proposed algorithm is applied to experimental DNA microarray data centered on the Ras signaling pathway in the yeast Saccharomyces cerevisiae. The clustering results are compared with those obtained from existing popular clustering algorithms. The proposed approach outperforms these algorithms in both intra-cluster similarity and inter-cluster dissimilarity, often considered the two key tenets of clustering. The implementation is also structured to expedite the determination of the optimal number of clusters.
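To make the two quality criteria concrete, the following is a minimal sketch of how intra-cluster similarity and inter-cluster dissimilarity might be scored. The abstract does not give the exact formulas, so the definitions here are assumptions: the intra-cluster error sum is taken as the sum of squared Euclidean distances from each point to its cluster centroid, and the inter-cluster error sum as the sum of squared distances between all pairs of centroids. The function names are hypothetical.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_error_sums(data, labels):
    """Return (intra, inter) error sums under the assumed definitions.

    intra: sum of squared distances from each point to its cluster centroid
           (smaller means tighter clusters).
    inter: sum of squared distances between every pair of centroids
           (larger means better separated clusters).
    """
    clusters = {}
    for point, label in zip(data, labels):
        clusters.setdefault(label, []).append(point)
    centers = {label: centroid(pts) for label, pts in clusters.items()}
    intra = sum(sq_dist(p, centers[l]) for p, l in zip(data, labels))
    inter = sum(
        sq_dist(centers[a], centers[b])
        for a, b in combinations(sorted(centers), 2)
    )
    return intra, inter
```

Under these assumed definitions, a better partition of the same data gives a lower intra-cluster sum and a higher inter-cluster sum, so two clusterings can be compared on both counts at once.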
In laying the groundwork for the development of EP_GOS_Clust, we also study the effects of different normalization methods and pre-clustering techniques on clustering quality [4]. The aim of pre-clustering is to use just enough discriminatory characteristics to form rough information profiles, so that data points with similar features can be pre-grouped and outliers deemed insignificant to the clustering process can be removed. With respect to the clustering of DNA microarray data, we compare the merits of normalizing expression data across genes as opposed to over each experiment. We also study the effects of different pre-clustering approaches on clustering quality. Specifically, we look at the pre-clustering of genes based on both actual expression data and symbolic representations such as {+, o, -}. In our assessment, we again use the intra- and inter-cluster error sums. We also use publicly available Gene Ontology resources to determine which pre-clustering method yields clusters with the highest level of biological coherence.
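As an illustration of the two preprocessing choices discussed above, the sketch below normalizes an expression matrix along either axis and pre-groups genes by a symbolic {+, o, -} profile. Everything specific here is an assumption: the matrix is laid out with one row per gene and one column per experiment, the normalization is a z-score, and the symbol threshold of 0.5 is arbitrary. The abstract does not state the actual normalization formulas or symbol cut-offs, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def zscore(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant profile: no variation to scale
    return [(v - mu) / sigma for v in values]


def normalize_rows(matrix):
    """Normalize each row (one gene's profile over all experiments)."""
    return [zscore(row) for row in matrix]


def normalize_columns(matrix):
    """Normalize each column (one experiment over all genes)."""
    cols = [zscore(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]


def symbolize(row, threshold=0.5):
    """Map each value to '+', 'o', or '-' using an assumed cut-off."""
    return "".join(
        "+" if v > threshold else "-" if v < -threshold else "o" for v in row
    )


def pregroup(matrix, gene_ids, threshold=0.5):
    """Group genes whose symbolic profiles are identical."""
    groups = defaultdict(list)
    for gene, row in zip(gene_ids, matrix):
        groups[symbolize(row, threshold)].append(gene)
    return dict(groups)
```

For example, `pregroup([[1.0, 0.0, -1.0], [2.0, 0.2, -3.0]], ["g1", "g2"])` places both genes in the same group because their profiles share the symbolic pattern `+o-`, even though their magnitudes differ.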
[1]-Tan, M. P.; Broach, J. R.; Floudas, C. A.; A Novel Clustering Approach and Prediction of Optimal Number of Clusters: Global Optimum Search with Enhanced Positioning (EP_GOS_Clust); 2006; In Preparation
[2]-Floudas, C. A.; Nonlinear and Mixed-Integer Optimization: Fundamentals and Applications; Oxford University Press; 1995
[3]-Floudas, C. A.; Aggarwal, A.; Ciric, A. R.; Global Optimum Search for Nonconvex NLP and MINLP Problems; Comp. & Chem. Eng.; 13(10); 1989; pp. 1117-1132
[4]-Tan, M. P.; Broach, J. R.; Floudas, C. A.; Evaluation of Normalization and Pre-Clustering Issues on a Novel Mixed-Integer Nonlinear Optimization-Based Clustering Approach; 2006; In Preparation