LGNov 23, 2014

Balanced k-Means and Min-Cut Clustering

arXiv:1411.6235v126 citations
Originality Incremental advance
AI Analysis

This work addresses the need for balanced clustering in data mining, offering an incremental improvement over existing methods.

The paper tackled the problem of existing clustering algorithms tending to cluster minority data points into subsets, which is undesirable for balanced datasets, by proposing to leverage exclusive lasso on k-means and min-cut to regulate balance, resulting in more accurate clustering validated on large-scale datasets.

Clustering is an effective technique in data mining to generate groups that are the matter of interest. Among various clustering approaches, the family of k-means algorithms and min-cut algorithms gain most popularity due to their simplicity and efficacy. The classical k-means algorithm partitions a number of data points into several subsets by iteratively updating the clustering centers and the associated data points. By contrast, a weighted undirected graph is constructed in min-cut algorithms which partition the vertices of the graph into two sets. However, existing clustering algorithms tend to cluster minority of data points into a subset, which shall be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced dataset, we propose to leverage exclusive lasso on k-means and min-cut to regulate the balance degree of the clustering results. By optimizing our objective functions that build atop the exclusive lasso, we can make the clustering result as much balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms compared to the state-of-the-art clustering algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes