LG DS IT ML Jun 11, 2019

Ultra Fast Medoid Identification via Correlated Sequential Halving

arXiv:1906.04356v210.323 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses the computational bottleneck of medoid identification in applications such as clustering and data analysis, and represents a strong incremental improvement over prior adaptive methods.

The paper tackles the problem of efficiently identifying the medoid of a set of points by reducing the number of distance computations, achieving gains of four to five orders of magnitude over exact computation on real data, in both distance computations and wall-clock time.

The medoid of a set of n points is the point in the set that minimizes the sum of distances to other points. It can be determined exactly in O(n^2) time by computing the distances between all pairs of points. Previous works show that one can significantly reduce the number of distance computations needed by adaptively querying distances. The resulting randomized algorithm is obtained by a direct conversion of the computation problem to a multi-armed bandit statistical inference problem. In this work, we show that we can better exploit the structure of the underlying computation problem by modifying the traditional bandit sampling strategy and using it in conjunction with a suitably chosen multi-armed bandit algorithm. Four to five orders of magnitude gains over exact computation are obtained on real data, in terms of both number of distance computations needed and wall clock time. Theoretical results are obtained to quantify such gains in terms of data parameters. Our code is publicly available online at https://github.com/TavorB/Correlated-Sequential-Halving.
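
The abstract's two ideas can be illustrated with a minimal Python sketch: an exact O(n^2) baseline, and a sequential-halving loop in which all surviving candidates are scored against the same randomly drawn reference points in each round (the "correlated" sampling). This is not the authors' implementation; the function names, the Euclidean distance, and the per-round budget split `budget // (survivors * rounds)` are assumptions made for illustration.

```python
import math
import numpy as np

def exact_medoid(points):
    """Exact medoid: compute all n^2 pairwise distances, pick the smallest row sum."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)  # Euclidean distance (assumed metric)
    return int(np.argmin(dists.sum(axis=1)))

def correlated_halving_medoid(points, budget, seed=0):
    """Hypothetical sketch of sequential halving with shared reference points.

    Returns (index of the chosen point, number of distance computations used).
    """
    rng = np.random.default_rng(seed)
    n = len(points)
    rounds = max(1, math.ceil(math.log2(n)))
    survivors = np.arange(n)
    used = 0
    while len(survivors) > 1:
        # Assumed budget split: equal share per round, divided among survivors.
        t = min(n, max(1, budget // (len(survivors) * rounds)))
        # The same references are used for every survivor in this round.
        refs = rng.choice(n, size=t, replace=False)
        d = np.linalg.norm(
            points[survivors][:, None, :] - points[refs][None, :, :], axis=-1
        )
        used += d.size
        est = d.mean(axis=1)  # estimated mean distance of each survivor
        if t == n:
            # Every point was used as a reference, so the estimates are exact.
            return int(survivors[np.argmin(est)]), used
        keep = math.ceil(len(survivors) / 2)
        survivors = survivors[np.argsort(est)[:keep]]  # keep the better half
    return int(survivors[0]), used
```

With a small budget the sketch returns a candidate that is only probably the medoid; when the budget is large enough that a round uses all n points as references, it falls back to an exact comparison among the survivors.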
