DS CG LGOct 12, 2021

Finding Relevant Points for Nearest-Neighbor Classification

arXiv:2110.06163v13.34 citations

Originality Synthesis-oriented

AI Analysis

This work addresses efficiency in nearest-neighbor classification for machine learning practitioners, but it is incremental as it builds on existing algorithms.

The paper tackles the problem of identifying relevant training points in nearest-neighbor classification, where omission changes classification outcomes, by providing a simple algorithm that improves time bounds over a prior method in constant dimensions d≥3.

In nearest-neighbor classification problems, a set of $d$-dimensional training points are given, each with a known classification, and are used to infer unknown classifications of other points by using the same classification as the nearest training point. A training point is relevant if its omission from the training set would change the outcome of some of these inferences. We provide a simple algorithm for thinning a training set down to its subset of relevant points, using as subroutines algorithms for finding the minimum spanning tree of a set of points and for finding the extreme points (convex hull vertices) of a set of points. The time bounds for our algorithm, in any constant dimension $d\ge 3$, improve on a previous algorithm for the same problem by Clarkson (FOCS 1994).

View on arXiv PDF

Similar