LG, ML · Sep 26, 2013

Determinantal Clustering Processes - A Nonparametric Bayesian Approach to Kernel Based Semi-Supervised Clustering

arXiv:1309.6862v1 · 20 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of clustering high-dimensional data with limited labels and is aimed at machine learning researchers. It offers a novel approach, but the advance is incremental because it builds on existing kernel and Bayesian methods.

The authors tackle semi-supervised clustering with an unknown number of clusters by introducing a nonparametric Bayesian, kernel-based method that uses determinants of kernel submatrices to measure how close together a set of points are. The method achieves competitive performance on synthetic and real-world datasets without requiring a pre-specified cluster count or complex density modeling.
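To see why a determinant can serve as a closeness measure, assume a positive-definite kernel with k(x, x) = 1, such as the Gaussian (RBF) kernel with bandwidth σ (an illustrative choice; the summary does not name a kernel). The determinant of a kernel submatrix equals the squared volume spanned by the points' feature vectors φ(x_i), and for two points it has a simple closed form:

```latex
K_S = \big[\,k(x_i, x_j)\,\big]_{i,j \in S}, \qquad
\det K_S = \operatorname{vol}^2\!\big(\{\phi(x_i)\}_{i \in S}\big), \qquad
k(x, y) = \exp\!\left(-\frac{\lVert x - y \rVert^2}{2\sigma^2}\right)

\det K_{\{1,2\}}
= \det\begin{pmatrix} 1 & k(x_1, x_2) \\ k(x_1, x_2) & 1 \end{pmatrix}
= 1 - k(x_1, x_2)^2
= 1 - \exp\!\left(-\frac{\lVert x_1 - x_2 \rVert^2}{\sigma^2}\right)
```

The determinant tends to 0 as x_1 approaches x_2 and to 1 as the points separate, so a small determinant signals a tightly grouped set.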

Semi-supervised clustering is the task of grouping data points into clusters when only a fraction of the points are labelled. The true number of clusters in the data is often unknown, and most models require this parameter as an input. Dirichlet process mixture models are appealing because they can infer the number of clusters from the data. However, these models do not handle high-dimensional data well and can encounter difficulties in inference. We present a novel nonparametric Bayesian kernel-based method that clusters data points without the need to prespecify the number of clusters or to model the complicated densities from which the data points are assumed to be generated. The key insight is to use determinants of submatrices of a kernel matrix as a measure of how close together a set of points are. We explore some theoretical properties of the model and derive a natural Gibbs-based algorithm with MCMC hyperparameter learning. The model is implemented on a variety of synthetic and real-world data sets.
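A minimal pure-Python sketch of the key insight, assuming a Gaussian (RBF) kernel with bandwidth `sigma` (the abstract does not specify the kernel). `closeness_det` computes det K_S for a set of points. `join_ratio` is a hypothetical helper, not the authors' Gibbs sampler or likelihood, showing how a determinant could score a candidate point x against a cluster C: the ratio det(K_{C∪{x}}) / det(K_C) equals the Schur complement k(x, x) − k_{xC} K_C⁻¹ k_{Cx}, which is small when x lies close to C in feature space. All function names are illustrative.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    The kernel choice is an assumption for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_submatrix(points, sigma=1.0):
    """Kernel (Gram) submatrix K_S for the list of points S."""
    return [[rbf_kernel(p, q, sigma) for q in points] for p in points]


def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    result = 1.0
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # each row swap flips the sign
        result *= a[col][col]
        # Eliminate entries below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def closeness_det(points, sigma=1.0):
    """det(K_S): near 0 for tightly grouped points, near 1 for well-separated ones."""
    return det(kernel_submatrix(points, sigma))


def join_ratio(cluster, x, sigma=1.0):
    """Hypothetical score det(K_{C+x}) / det(K_C); small when x lies close to cluster C."""
    return closeness_det(cluster + [x], sigma) / closeness_det(cluster, sigma)


if __name__ == "__main__":
    tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    spread = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    print(closeness_det(tight))   # close to 0
    print(closeness_det(spread))  # close to 1
    print(join_ratio(tight, (0.05, 0.05)))  # small: the new point sits inside the cluster
    print(join_ratio(tight, (5.0, 5.0)))    # close to 1: the new point is far away
```

Because the RBF kernel is strictly positive definite, det K_S stays positive for distinct points, so the ratio is well defined. For larger clusters, a Cholesky-based log-determinant would be numerically safer than plain elimination.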
