LG ML · Jul 17, 2019

$t$-$k$-means: A Robust and Stable $k$-means Variant

Yiming Li, Yang Zhang, Qingtao Tang, Weipeng Huang, Yong Jiang, Shu-Tao Xia

arXiv:1907.07442v4

Originality: Incremental advance

AI Analysis

This work addresses unreliable clustering on noisy or outlier-prone data and represents an incremental improvement over standard k-means.

The paper tackles the poor performance and instability of k-means on datasets with heavy-tailed data or outliers. It proposes a robust and stable variant, t-k-means, and reports improved results in extensive experiments.

The $k$-means algorithm is one of the most classical clustering methods and has been widely and successfully used in signal processing. However, $k$-means implicitly assumes a Gaussian distribution, which is thin-tailed. As a result, it performs relatively poorly on datasets containing heavy-tailed data or outliers. In addition, the standard $k$-means algorithm is relatively unstable, $i.e.$, its results have a large variance, which reduces its credibility. In this paper, we propose a robust and stable $k$-means variant, dubbed $t$-$k$-means, together with a fast version, to alleviate these problems. Theoretically, we derive $t$-$k$-means and analyze its robustness and stability from the perspectives of the loss function and the expression of the clustering center, respectively. Extensive experiments verify the effectiveness and efficiency of the proposed method. The code for reproducing the main results is available at \url{https://github.com/THUYimingLi/t-k-means}.
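Some intuition for why a heavy-tailed model helps: under a Student's t distribution with $\nu$ degrees of freedom, the per-point penalty grows like $\log(1 + \delta/\nu)$ in the scaled squared distance $\delta$. It does not grow quadratically as in $k$-means. In EM, this turns the center update into a weighted mean with weights $w = (\nu + d)/(\nu + \delta)$. The following is a minimal, hypothetical NumPy sketch of that idea. It is not the authors' $t$-$k$-means and is not taken from the linked repository. It assumes hard assignments, one shared isotropic scale `sigma2`, and a fixed `nu`. The names `t_weights` and `t_weighted_kmeans` are illustrative only.

```python
import numpy as np


def t_weights(sq_dist, sigma2, nu, dim):
    """Student-t E-step weights (nu + d) / (nu + delta), delta = sq_dist / sigma2.

    Points far from their center (large delta) receive small weights.
    """
    return (nu + dim) / (nu + sq_dist / sigma2)


def t_weighted_kmeans(X, k, nu=1.0, n_iter=50, seed=0):
    """Hypothetical sketch: hard-assignment k-means with Student-t weighted centers.

    Not the authors' t-k-means. Assumes one shared isotropic scale sigma2
    and a fixed degrees-of-freedom parameter nu.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize centers with k distinct data points chosen at random.
    centers = X[rng.choice(n, size=k, replace=False)].astype(float)
    sigma2 = None
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every center: shape (n, k).
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        sq_min = sq[np.arange(n), labels]
        if sigma2 is None:
            # First pass only: plain (non-robust) scale estimate.
            sigma2 = max(sq_min.mean() / d, 1e-12)
        w = t_weights(sq_min, sigma2, nu, d)
        # Weighted scale update, as in EM for an isotropic t model.
        sigma2 = max((w * sq_min).sum() / (n * d), 1e-12)
        new_centers = centers.copy()
        for j in range(k):
            mask = labels == j
            if mask.any():
                wj = w[mask]
                # Center = weighted mean, so distant points barely move it.
                new_centers[j] = (wj[:, None] * X[mask]).sum(axis=0) / wj.sum()
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return centers, labels
```

Take one Gaussian cluster plus a single far outlier. The plain mean is dragged toward the outlier, while the weighted center stays near the bulk of the data. The abstract attributes the proposed method's robustness to the expression of the clustering center, and this sketch illustrates that intuition.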
