MLDSITLGNov 2, 2017

Medoids in almost linear time via multi-armed bandits

arXiv:1711.00817v336 citationsHas Code
Originality Highly original
AI Analysis

This work addresses a common bottleneck in data science for handling large-scale, high-dimensional data, offering a significant speedup for medoid computation.

The paper tackles the problem of efficiently computing the medoid for large, high-dimensional datasets by introducing Med-dit, an algorithm that uses O(n log n) distance evaluations to achieve this with high probability, resulting in a 5-10x performance improvement over state-of-the-art methods on real-world datasets.

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and the single-cell RNA-Seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a 5-10x improvement in performance over the current state of the art. Med-dit is available at https://github.com/bagavi/Meddit

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes