LGOct 26, 2014

Notes on using Determinantal Point Processes for Clustering with Applications to Text Clustering

arXiv:1410.6975v15 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of cluster count selection in text clustering, but it is incremental as it builds on existing methods without major breakthroughs.

The paper compares three initialization schemes for K-means clustering, finding that KMEANSD++ (using determinantal point processes) performs comparably to KMEANS++ and better than random initialization, with the added benefit of automatically approximating the number of clusters k.

In this paper, we compare three initialization schemes for the KMEANS clustering algorithm: 1) random initialization (KMEANSRAND), 2) KMEANS++, and 3) KMEANSD++. Both KMEANSRAND and KMEANS++ have a major that the value of k needs to be set by the user of the algorithms. (Kang 2013) recently proposed a novel use of determinantal point processes for sampling the initial centroids for the KMEANS algorithm (we call it KMEANSD++). They, however, do not provide any evaluation establishing that KMEANSD++ is better than other algorithms. In this paper, we show that the performance of KMEANSD++ is comparable to KMEANS++ (both of which are better than KMEANSRAND) with KMEANSD++ having an additional that it can automatically approximate the value of k.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes