ML STFeb 1, 2016

A Quasi-Bayesian Perspective to Online Clustering

Le Li, Benjamin Guedj, Sébastien Loustau

arXiv:1602.00522v31 citations

Originality Incremental advance

AI Analysis

This addresses the challenge of clustering in streaming data for applications requiring real-time analysis, though it appears incremental as it builds on existing quasi-Bayesian and RJMCMC methods.

The authors tackled the problem of online clustering for high-frequency data streams by introducing a quasi-Bayesian algorithm that dynamically estimates the number of clusters, supported by minimax regret bounds and numerical experiments.

When faced with high frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.

View on arXiv PDF

Similar