LG OCSep 3, 2024

Robust Clustering on High-Dimensional Data with Stochastic Quantization

arXiv:2409.02066v53 citationsh-index: 3

Originality Incremental advance

AI Analysis

This work addresses memory and convergence issues in clustering for large-scale, high-dimensional datasets, though it appears incremental with modifications like adaptive learning rates.

The paper tackles the memory inefficiency and lack of theoretical guarantees in traditional clustering algorithms like K-Means by proposing Stochastic Quantization (SQ) as a scalable alternative for high-dimensional data, demonstrating its computational efficiency and rapid convergence on an image classification task with partially labeled data.

This paper addresses the limitations of conventional vector quantization algorithms, particularly K-Means and its variant K-Means++, and investigates the Stochastic Quantization (SQ) algorithm as a scalable alternative for high-dimensional unsupervised and semi-supervised learning tasks. Traditional clustering algorithms often suffer from inefficient memory utilization during computation, necessitating the loading of all data samples into memory, which becomes impractical for large-scale datasets. While variants such as Mini-Batch K-Means partially mitigate this issue by reducing memory usage, they lack robust theoretical convergence guarantees due to the non-convex nature of clustering problems. In contrast, the Stochastic Quantization algorithm provides strong theoretical convergence guarantees, making it a robust alternative for clustering tasks. We demonstrate the computational efficiency and rapid convergence of the algorithm on an image classification problem with partially labeled data, comparing model accuracy across various ratios of labeled to unlabeled data. To address the challenge of high dimensionality, we employ a Triplet Network to encode images into low-dimensional representations in a latent space, which serve as a basis for comparing the efficiency of both the Stochastic Quantization algorithm and traditional quantization algorithms. Furthermore, we enhance the algorithm's convergence speed by introducing modifications with an adaptive learning rate.

View on arXiv PDF

Similar