ML · LG · Feb 20, 2020

Simple and Scalable Sparse k-means Clustering via Feature Ranking

arXiv:2002.08541v2 · 2 citations
AI Analysis

This work addresses the challenge of clustering in high-dimensional data for fields like bioinformatics, offering a more efficient alternative to existing sparse clustering techniques, though it appears incremental as it builds on prior k-means and sparse clustering frameworks.

The paper tackles the problem of high-dimensional clustering by proposing a simple and scalable sparse k-means method that reduces computational complexity and eliminates the need for tuning shrinkage parameters, achieving competitive performance with state-of-the-art algorithms as demonstrated in simulated and real-world benchmarks.

Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters. This has motivated the development of sparse clustering techniques that typically rely on k-means within outer algorithms of high computational complexity. Current techniques also require careful tuning of shrinkage parameters, further limiting their scalability. In this paper, we propose a novel framework for sparse k-means clustering that is intuitive, simple to implement, and competitive with state-of-the-art algorithms. We show that our algorithm enjoys consistency and convergence guarantees. Our core method readily generalizes to several task-specific algorithms such as clustering on subsets of attributes and in partially observed data settings. We showcase these contributions thoroughly via simulated experiments and real data benchmarks, including a case study on protein expression in trisomic mice.
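
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a feature-ranking step could sit inside Lloyd-style k-means iterations. It assumes standardized features (so a center coordinate of zero means "no different from the global mean") and an assumed ranking score: cluster size times squared center coordinate, summed over clusters. The function name `sparse_kmeans_ranked`, the sparsity level `s`, and the `init` argument are hypothetical names chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def sparse_kmeans_ranked(X, k, s, init=None, n_iter=50, seed=0):
    """Illustrative k-means that keeps only the s top-ranked features each iteration.

    X    : (n, p) data matrix
    k    : number of clusters
    s    : number of features to keep (hypothetical sparsity parameter)
    init : optional row indices used as initial centers (hypothetical argument)
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Standardize each feature; constant features are left at zero.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std

    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=k, replace=False) if init is None else np.asarray(init)
    centers = Z[idx].copy()

    keep = np.arange(p)          # start with every feature selected
    labels = np.full(n, -1)

    for _ in range(n_iter):
        # Assignment step: distances use only the currently selected features.
        diff = Z[:, None, keep] - centers[None, :, keep]
        new_labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Update step: recompute centers on all features (empty clusters keep
        # their previous center).
        counts = np.bincount(labels, minlength=k)
        for j in range(k):
            if counts[j] > 0:
                centers[j] = Z[labels == j].mean(axis=0)

        # Ranking step (assumed score): size-weighted squared center coordinates.
        score = (counts[:, None] * centers ** 2).sum(axis=0)
        keep = np.sort(np.argsort(score)[-s:])

    return labels, keep
```

Calling `labels, keep = sparse_kmeans_ranked(X, k=2, s=3)` returns a cluster label per row and the sorted indices of the retained features. Note that in this sketch the only sparsity control is the integer `s`; there is no continuous shrinkage parameter to tune, which is the kind of simplification the abstract alludes to.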

Code Implementations: 1 repo
