Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals

arXiv:2601.16091v1

2.31 citations

Originality: Incremental advance

AI Analysis

This paper addresses clustering with delayed assignments in a stochastic setting and offers a theoretical improvement over worst-case models. The advance is incremental, since it builds on existing online clustering frameworks.

The paper tackles online non-centroid clustering with delayed assignments in a stochastic arrival model, where points arrive from an unknown distribution. It presents an algorithm whose competitive ratio, measured on the combined distance and delay costs, is bounded by a constant as the number of points grows.

Clustering is a fundamental problem that aims to partition a set of elements, such as agents or data points, into clusters so that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements that arrive one at a time as points in a finite metric space must be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one that contains, at that moment, only this point. However, instead of following the more common assumption of immediate decisions upon arrival, we allow decisions to be postponed at a delay cost. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. This offers hope beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall cost of the output clustering and that of an optimal offline clustering is bounded by a constant.
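To make the two-part objective concrete, here is a minimal Python sketch of how the cost of a finished clustering might be evaluated. It is an illustration under stated assumptions, not the paper's definition: the abstract does not give the exact cost functions, so the sketch assumes the distance cost is the sum of pairwise distances within each cluster and the delay cost is linear in each point's waiting time. The names `total_cost` and `line_metric` and the toy data are hypothetical.

```python
from itertools import combinations


def total_cost(locations, arrival, assign_time, cluster, dist):
    """Distance cost plus delay cost of a finished clustering.

    Assumptions (not taken from the paper): distance cost is the sum of
    pairwise distances inside each cluster, and delay cost is the waiting
    time between a point's arrival and its irrevocable assignment.
    """
    # Sum distances over all pairs of points that share a cluster label.
    distance_cost = sum(
        dist(locations[i], locations[j])
        for i, j in combinations(range(len(locations)), 2)
        if cluster[i] == cluster[j]
    )
    # Each point pays for the time it waited before being assigned.
    delay_cost = sum(a - t for t, a in zip(arrival, assign_time))
    return distance_cost + delay_cost


def line_metric(a, b):
    """Toy finite metric: locations are integers on a line."""
    return abs(a - b)


locations = [0, 1, 5, 6]
arrival = [0, 1, 2, 3]

# Option A: the first and third points each wait one step, two tight clusters.
delayed = total_cost(locations, arrival, [1, 1, 3, 3], [0, 0, 1, 1], line_metric)

# Option B: assign every point immediately to a single cluster.
immediate = total_cost(locations, arrival, arrival, [0, 0, 0, 0], line_metric)

print(delayed, immediate)
```

In this toy instance, waiting lets the clustering pair nearby points, so a small delay cost buys a much smaller distance cost. That trade-off is what an online algorithm has to manage without knowing future arrivals.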
