ML LG OC | Aug 16, 2021

Robust Trimmed k-means

Olga Dorabiala, J. Nathan Kutz, Aleksandr Aravkin

arXiv:2108.07186v15.019 citations | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the need for robust clustering algorithms in data analytics, particularly for real-world data with noise and outliers, though it appears incremental because it extends k-means.

The authors address robust clustering in the presence of outliers and unclear group separation by proposing Robust Trimmed k-means (RTKM), which simultaneously identifies outliers and clusters points in single- or multi-membership data. On real-world datasets, RTKM performs competitively with other methods on single-membership data with outliers and on multi-membership data without outliers, and it outperforms other methods on multi-membership data with outliers.

Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data, many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single- or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers.
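To make the idea of identifying outliers and clustering points at the same time more concrete, here is a minimal sketch of a classical trimmed k-means heuristic for single-membership data. It is not the RTKM algorithm from the paper, whose formulation is not given in the abstract. The function names, the `n_outliers` parameter, and the convention of labelling outliers with `-1` are all assumptions made for illustration, and the caller is assumed to supply the initial centers.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, init_centers, n_outliers, n_iter=100):
    """Illustrative trimmed k-means (a stand-in, not RTKM itself).

    Returns (centers, labels), where labels[i] is a cluster index,
    or -1 if point i is flagged as an outlier.
    """
    centers = [tuple(c) for c in init_centers]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Distance from each point to its nearest center, plus that center's index.
        nearest = []
        for p in points:
            dists = [squared_distance(p, c) for c in centers]
            j = min(range(len(centers)), key=dists.__getitem__)
            nearest.append((dists[j], j))

        # Flag the n_outliers points that lie farthest from every center.
        order = sorted(range(len(points)), key=lambda i: nearest[i][0], reverse=True)
        trimmed = set(order[:n_outliers])
        new_labels = [-1 if i in trimmed else nearest[i][1] for i in range(len(points))]

        # Stop once the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels

        # Recompute each center from its non-trimmed members only.
        for j in range(len(centers)):
            members = [points[i] for i, lab in enumerate(labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels
```

Because trimmed points are excluded from the center update, a distant observation cannot drag a centroid toward itself the way it would in plain k-means. This sketch assigns each point to exactly one cluster, so it does not cover the multi-membership setting the abstract describes, and its result depends on the initial centers passed in.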
