2022 Annual Meeting
(10h) Global Optimal K-Center Clustering with Millions of Samples
This work provides a practical global optimization algorithm for this task based on a reduced-space spatial branch and bound scheme. This algorithm can guarantee convergence to the global optimum by only branching on the centers of clusters, which is independent of the datasetâs cardinality. In addition, a set of feasibility-based bounds tightening techniques are proposed to determine the assignment of samples, narrow down the domain of centers, and significantly accelerate the convergence. To demonstrate the capacity of this algorithm, we present computational results on 26 UCI datasets [5]. Notably, for the dataset with 4 million samples ((i.e., 1000 times larger than the state-of-the-art method [6] in the literature) and 18 features, the serial implementation of the algorithm can attain the global optimum to an optimality gap of 0.1% within 2 hours.
[1] Aggarwal, C. C., Wolf, J. L., and Yu, P. S.-l. Method for targeted advertising on the web based on accumulated self-learning data, clustering users and semantic node graph techniques, March 30 2004. US Patent 6,714,975.
[2] Kleindessner, M., Awasthi, P., and Morgenstern, J. Fair k-center clustering for data summarization. In International Conference on Machine Learning, pp. 3448â3457. PMLR, 2019.
[3] Hansen, P., Brimberg, J., UroËsevià Lc, D., and Mladenovià Lc, N. Solving large p-median clustering problems by primalâdual variable neighborhood search. Data Mining and Knowledge Discovery, 19(3):351â375, 2009.
[4] Kaufman, L. and Rousseeuw, P. J. Finding groups in data: an introduction to cluster analysis , volume 344. JohnWiley & Sons, 2009.
[5] Dua, D. and Graff, C. UCI machine learning repository, 2017. URL http://archive.ics.uci.edu/ml.
[6] Duong, K.-C., Vrain, C., et al. Constrained clustering by constraint programming. Artificial Intelligence , 244:70â94, 2017.