LG | Dec 23, 2022

Using MM principles to deal with incomplete data in K-means clustering

arXiv:2212.12379v1 | 1.8 | h-index: 6 | Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses a specific issue in clustering for data analysis applications, namely incomplete data, but it appears incremental because it adapts an existing optimization technique (MM) to a known limitation of K-means.

The paper tackles the problem of incomplete data in K-means clustering by applying MM principles to restore data symmetry, enabling K-means to function effectively, with experimental verification on standard datasets.

Among the many clustering algorithms, K-means is widely used because of its simplicity and fast convergence. However, it handles incomplete data poorly, that is, data in which some samples are missing some of their attributes. To solve this problem, we apply MM principles to restore the symmetry of the data so that K-means can work well. We give pseudo-code for the algorithm and verify it experimentally on standard datasets. The source code for the experiments is publicly available at the following link: https://github.com/AliBeikmohammadi/MM-Optimization/blob/main/mini-project/MM%20K-means.ipynb.
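The abstract does not spell out the update rules, so the following is only a minimal sketch of one standard way to apply MM to K-means with missing entries, not necessarily the authors' algorithm. It assumes missing values are encoded as NaN and that the majorization step fills each missing attribute with the corresponding coordinate of the sample's currently assigned center, which restores a complete (symmetric) sum of squares; the minimization step is then the usual centroid update. The function name `mm_kmeans` and its parameters, including the optional `init_centers`, are hypothetical.

```python
import numpy as np

def mm_kmeans(X, k, n_iter=100, init_centers=None, seed=0):
    """Sketch of MM-style K-means for data with NaN-encoded missing entries.

    Assumption (not taken from the paper): the surrogate is built by replacing
    each missing attribute with the matching coordinate of the assigned center.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)

    # Start from a column-mean fill so that initial centers are complete vectors.
    X_filled = np.where(observed, X, np.nanmean(X, axis=0))
    if init_centers is None:
        rng = np.random.default_rng(seed)
        centers = X_filled[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centers = np.asarray(init_centers, dtype=float).copy()

    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment: squared distance over observed coordinates only.
        diff = X_filled[:, None, :] - centers[None, :, :]
        diff = np.where(observed[:, None, :], diff, 0.0)
        new_labels = (diff ** 2).sum(axis=2).argmin(axis=1)

        # Majorization: fill missing entries from the assigned center.
        X_filled = np.where(observed, X, centers[new_labels])

        # Minimization: ordinary centroid update on the completed data.
        for j in range(k):
            members = new_labels == j
            if members.any():
                centers[j] = X_filled[members].mean(axis=0)

        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, centers, X_filled
```

The filled-in term is zero at the current centers and non-negative elsewhere, so the completed objective majorizes the observed-data objective and each iteration cannot increase it. As with ordinary K-means, the result depends on initialization, so several restarts may be needed in practice.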
