MLSTFeb 1, 2016

A Quasi-Bayesian Perspective to Online Clustering

arXiv:1602.00522v31 citations
Originality Incremental advance
AI Analysis

This addresses the challenge of clustering in streaming data for applications requiring real-time analysis, though it appears incremental as it builds on existing quasi-Bayesian and RJMCMC methods.

The authors tackled the problem of online clustering for high-frequency data streams by introducing a quasi-Bayesian algorithm that dynamically estimates the number of clusters, supported by minimax regret bounds and numerical experiments.

When faced with high frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes