ML · LG · Jul 6, 2024

On high-dimensional modifications of the nearest neighbor classifier

arXiv:2407.05145v3 · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific issue in nonparametric classification for high-dimensional data, offering incremental improvements to existing methods.

The paper tackles the poor performance of the nearest neighbor classifier in high-dimensional, low-sample-size settings, where scale differences between classes dominate location differences, by proposing new modifications and showing empirical improvements over existing methods through simulations and benchmark datasets.

The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier in the literature. However, owing to the concentration of pairwise distances and the violation of the neighborhood structure, it often performs poorly in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to address this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out theoretical investigations and analyze several simulated and benchmark datasets to compare the empirical performance of the proposed methods with that of some existing ones.
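
The failure mode described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the Gaussian model, and all parameter values (dimension 500, 20 training points per class, standard deviations 1 and 2) are assumptions chosen for illustration. Both classes share the same mean and differ only in scale. In high dimension, squared distances concentrate near 2d·s² within a class with standard deviation s and near d·(s1² + s2²) across classes, so the nearest neighbor of almost any point tends to come from the class with the smaller spread.

```python
import random


def sample(n, d, sigma, rng):
    """Draw n points from a d-dimensional N(0, sigma^2 * I) distribution."""
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as lists."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def one_nn_predict(x, train):
    """Return the label of the training point closest to x.

    train is a list of (point, label) pairs.
    """
    return min(train, key=lambda pair: sq_dist(x, pair[0]))[1]


def class_error(test_points, true_label, train):
    """Fraction of test_points that 1-NN assigns to the wrong class."""
    wrong = sum(one_nn_predict(x, train) != true_label for x in test_points)
    return wrong / len(test_points)


def run_demo(d=500, n_train=20, n_test=30,
             sigma_small=1.0, sigma_large=2.0, seed=0):
    """Simulate two classes with equal means and different scales.

    All defaults are illustrative assumptions, not settings from the paper.
    Returns (error on the small-scale class, error on the large-scale class).
    """
    rng = random.Random(seed)
    train = [(p, 0) for p in sample(n_train, d, sigma_small, rng)]
    train += [(p, 1) for p in sample(n_train, d, sigma_large, rng)]
    err_small = class_error(sample(n_test, d, sigma_small, rng), 0, train)
    err_large = class_error(sample(n_test, d, sigma_large, rng), 1, train)
    return err_small, err_large


if __name__ == "__main__":
    err_small, err_large = run_demo()
    print(f"1-NN error, small-scale class: {err_small:.2f}")
    print(f"1-NN error, large-scale class: {err_large:.2f}")
```

Under these assumptions the classifier should label nearly every point as belonging to the small-scale class, giving an overall error close to one half even though the two distributions are clearly different. Lowering d or adding a location shift between the classes is a simple way to see when the effect weakens.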

