LG, Feb 5

How to Achieve the Intended Aim of Deep Clustering Now, without Deep Learning

arXiv:2602.05749v1, h-index: 5

Originality: Synthesis-oriented

AI Analysis

This work asks whether deep learning is needed to overcome the key limitations of $k$-means clustering and shows that it may not be necessary. For researchers and practitioners seeking better clustering performance, this indicates an incremental contribution.

The paper investigates whether deep clustering methods, specifically Deep Embedded Clustering (DEC), overcome the fundamental limitations of $k$-means clustering, namely its inability to discover clusters of arbitrary shapes, varied sizes, and varied densities. It finds that a non-deep learning approach that uses distributional information addresses these limitations effectively.

Deep clustering (DC) is often said to have a key advantage over $k$-means clustering. Yet this advantage is usually demonstrated on image datasets only, and it is unclear whether it addresses the fundamental limitations of $k$-means clustering. Deep Embedded Clustering (DEC) learns a latent representation via an autoencoder and performs clustering with a $k$-means-like procedure, with the optimization conducted end to end. This paper investigates whether the deep-learned representation enables DEC to overcome the known fundamental limitations of $k$-means clustering, i.e., its inability to discover clusters of arbitrary shapes, varied sizes, and varied densities. Our investigations on DEC have wider implications for deep clustering methods in general. Notably, none of these methods exploits the underlying data distribution. We uncover that a non-deep learning approach achieves the intended aim of deep clustering by using distributional information about the clusters in a dataset to effectively address these fundamental limitations.
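To make the arbitrary-shape limitation named in the abstract concrete, the following minimal sketch runs plain $k$-means (Lloyd's algorithm) on two concentric rings. It is an illustration only, not the paper's method or experimental setup: the synthetic dataset, the helper names (`make_rings`, `kmeans`, `two_cluster_accuracy`), and all parameter values are assumptions chosen for this example.

```python
import numpy as np

def make_rings(n_per_ring=300, radii=(1.0, 3.0), noise=0.05, seed=0):
    """Two noisy concentric rings (hypothetical toy data, not from the paper)."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for label, r in enumerate(radii):
        theta = rng.uniform(0.0, 2.0 * np.pi, n_per_ring)
        ring = np.column_stack((r * np.cos(theta), r * np.sin(theta)))
        points.append(ring + rng.normal(0.0, noise, ring.shape))
        labels.append(np.full(n_per_ring, label))
    return np.vstack(points), np.concatenate(labels)

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random data points as initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign, centroids

def two_cluster_accuracy(true, pred):
    """Accuracy for two clusters, taking the better of the two label matchings."""
    agree = np.mean(true == pred)
    return max(agree, 1.0 - agree)

X, y = make_rings()
pred, _ = kmeans(X)
acc = two_cluster_accuracy(y, pred)
print(f"k-means accuracy on concentric rings: {acc:.2f}")
```

Because $k$-means with two centroids separates points by a straight line, it tends to cut both rings in half rather than separate the inner ring from the outer one, so the accuracy stays close to chance on this toy data.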
