Doubly Stochastic Mean-Shift Clustering
This work addresses cluster fragmentation in sparse-data scenarios and represents an incremental improvement over existing methods.
The paper tackles the sensitivity of Mean-Shift algorithms to the bandwidth hyperparameter in data-scarce regimes. The resulting method, DSMS, outperforms baselines by preventing over-segmentation and offers improved stability.
Standard Mean-Shift algorithms are notoriously sensitive to the bandwidth hyperparameter, particularly in data-scarce regimes, where fixed-scale density estimation leads to fragmentation and spurious modes. In this paper, we propose Doubly Stochastic Mean-Shift (DSMS), a novel extension that introduces randomness not only into the trajectory updates but also into the kernel bandwidth itself. By drawing both the data samples and the radius at random at each iteration, with the radius taken from a continuous uniform distribution, DSMS explores the density landscape more effectively. We show that this randomized bandwidth policy acts as an implicit regularization mechanism, and we provide theoretical convergence results. Comparative experiments on synthetic Gaussian mixtures show that DSMS significantly outperforms standard and stochastic Mean-Shift baselines, exhibiting remarkable stability and preventing over-segmentation in sparse clustering scenarios without degrading other aspects of performance.
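The abstract describes the "doubly stochastic" idea only at a high level. The following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. Several choices are assumptions of this sketch: a flat kernel of radius h, a random minibatch of data points as the stochastic sample, a single radius h ~ U(h_min, h_max) shared by all trajectories in an iteration, and merging of final positions closer than h_min into one mode. The function name `dsms` and its parameters are hypothetical. Each iteration moves every trajectory point y to the mean of the minibatch points lying within distance h of y.

```python
import numpy as np


def dsms(X, h_min, h_max, n_iter=200, batch_size=32, seed=0):
    """Hypothetical sketch of doubly stochastic mean-shift (flat kernel).

    Assumptions (not from the paper): every data point seeds one trajectory;
    each iteration draws a random minibatch of data points and one radius
    h ~ U(h_min, h_max) shared by all trajectories.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Y = X.copy()  # current positions of all trajectories
    for _ in range(n_iter):
        # Randomness 1: a minibatch of data points sampled without replacement.
        idx = rng.choice(n, size=min(batch_size, n), replace=False)
        batch = X[idx]
        # Randomness 2: a radius drawn from a continuous uniform distribution.
        h = rng.uniform(h_min, h_max)
        # Distances between every trajectory and every minibatch point.
        d = np.linalg.norm(Y[:, None, :] - batch[None, :, :], axis=2)
        w = (d <= h).astype(float)  # flat-kernel weights (1 inside radius h)
        counts = w.sum(axis=1)
        moved = counts > 0
        # Trajectories with at least one neighbour jump to the neighbours' mean;
        # trajectories with no neighbour in this minibatch stay where they are.
        Y[moved] = (w[moved] @ batch) / counts[moved][:, None]
    # Assumed post-processing: merge final positions closer than h_min.
    modes, labels = [], np.empty(n, dtype=int)
    for i, y in enumerate(Y):
        for k, m in enumerate(modes):
            if np.linalg.norm(y - m) < h_min:
                labels[i] = k
                break
        else:
            modes.append(y)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```

On two well-separated Gaussian blobs, this sketch recovers one mode per blob. Sharing one radius per iteration keeps the update vectorized; drawing a separate radius per trajectory would be an equally plausible variant.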