Greedy k-Center from Noisy Distance Samples
This work addresses a practical problem in clustering and facility location for scenarios where distance data is noisy or incomplete, offering an incremental algorithmic improvement.
The paper tackles the k-center problem with unknown distances by using noisy or incomplete distance queries, proposing active algorithms based on Multi-Armed Bandit techniques to achieve a 2-approximation ratio with high probability. It demonstrates significant improvements over naive methods on real-world datasets like Tiny ImageNet and UT Zappos50K.
We study a variant of the canonical k-center problem over a set of vertices in a metric space, where the underlying distances are a priori unknown. Instead, we can query an oracle that provides noisy or incomplete estimates of the distance between any pair of vertices. We consider two oracle models: Dimension Sampling, where each query returns the distance between a pair of points along a single dimension; and Noisy Distance Sampling, where the oracle returns the true distance corrupted by noise. We propose active algorithms, based on ideas such as UCB, Thompson Sampling and Track-and-Stop developed for the closely related Multi-Armed Bandit problem, which adaptively decide which queries to send to the oracle and solve the k-center problem within an approximation ratio of two with high probability. We analytically characterize the instance-dependent query complexity of our algorithms and demonstrate significant improvements over naive implementations via numerical evaluations on two real-world datasets (Tiny ImageNet and UT Zappos50K).
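To make the setting concrete, the following is a minimal sketch of a non-adaptive baseline under the Noisy Distance Sampling model. It is not the paper's bandit-based method. It runs the classical greedy farthest-first traversal, which gives a 2-approximation when distances are exact, on distance estimates obtained by averaging a fixed number of oracle queries per pair. The function names, the Gaussian noise model, and the fixed budget `m` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def greedy_k_center(n, k, dist, first=0):
    """Farthest-first traversal over n points; returns k center indices.

    `dist(i, j)` is any (possibly estimated) distance function.
    """
    centers = [first]
    # closest[v] = estimated distance from v to its nearest chosen center
    closest = [dist(v, first) for v in range(n)]
    while len(centers) < k:
        candidates = [v for v in range(n) if v not in centers]
        # Pick the point that currently looks farthest from all centers.
        nxt = max(candidates, key=lambda v: closest[v])
        centers.append(nxt)
        for v in range(n):
            closest[v] = min(closest[v], dist(v, nxt))
    return centers


def make_noisy_oracle(points, sigma, rng):
    """Hypothetical oracle: true Euclidean distance plus Gaussian noise."""
    def oracle(i, j):
        return math.dist(points[i], points[j]) + rng.gauss(0.0, sigma)
    return oracle


def averaged(oracle, m):
    """Naive estimator: spend the same m queries on every pair."""
    def dist(i, j):
        return sum(oracle(i, j) for _ in range(m)) / m
    return dist


if __name__ == "__main__":
    # Three well-separated pairs of points on a line (toy data).
    points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    rng = random.Random(0)
    noisy_dist = averaged(make_noisy_oracle(points, sigma=1.0, rng=rng), m=200)
    print(greedy_k_center(len(points), 3, noisy_dist))
```

This baseline spends the same number of queries on every pair, including pairs whose ordering is obvious after a few samples. The adaptive algorithms described in the abstract are aimed at avoiding exactly that waste.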