MLLGMar 24, 2019

A Strongly Consistent Sparse $k$-means Clustering with Direct $l_1$ Penalization on Variable Weights

arXiv:1903.10039v15 citations
Originality Incremental advance
AI Analysis

This provides a non-parametric method for feature selection in clustering, addressing the challenge of high-dimensional data where features outnumber observations, though it appears incremental as it builds on existing k-means and lasso techniques.

The authors tackled the problem of sparse clustering in high-dimensional data by proposing the LW-k-means algorithm, which directly applies a lasso penalty to feature weights for feature selection, and they demonstrated its strong consistency and competitive performance in accuracy and computational time against state-of-the-art methods.

We propose the Lasso Weighted $k$-means ($LW$-$k$-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features ($p$) can be much larger compared to the number of observations ($n$). In the $LW$-$k$-means algorithm, we introduce a lasso-based penalty term, directly on the feature weights to incorporate feature selection in the framework of sparse clustering. $LW$-$k$-means does not make any distributional assumption of the given dataset and thus, induces a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in $LW$-$k$-means and establish the strong consistency of our algorithm. $LW$-$k$-means is tested on several real-life and synthetic datasets and through detailed experimental analysis, we find that the performance of the method is highly competitive against some state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also with respect to computational time.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes