ML LG

Jul 6, 2024

On high-dimensional modifications of the nearest neighbor classifier

Annesha Ghosh, Deep Ghoshal, Bilol Banerjee, Anil K. Ghosh

arXiv:2407.05145v3

h-index: 2

Originality: Synthesis-oriented

AI Analysis

This work addresses a specific issue in nonparametric classification for high-dimensional data, offering incremental improvements to existing methods.

The paper targets the poor performance of the nearest neighbor classifier in high-dimension, low-sample-size (HDLSS) settings, particularly when scale differences between the classes dominate their location differences. It reviews existing remedies, proposes new modifications, and compares their empirical performance on simulated and benchmark datasets.

The nearest neighbor classifier is arguably the simplest and most popular nonparametric classifier available in the literature. However, due to the concentration of pairwise distances and the violation of the neighborhood structure, this classifier often suffers in high-dimension, low-sample-size (HDLSS) situations, especially when the scale difference between the competing classes dominates their location difference. Several attempts have been made in the literature to address this problem. In this article, we discuss some of these existing methods and propose some new ones. We carry out some theoretical investigations in this regard and analyze several simulated and benchmark datasets to compare the empirical performance of the proposed methods with that of some existing ones.
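The failure mode described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the two Gaussian classes (same mean, standard deviations 1 and 2), the sample sizes, and the function name `one_nn_class_accuracy` are all assumptions chosen for illustration. With no location difference and a pure scale difference, squared distances divided by the dimension concentrate around 2 (within class 0), 8 (within class 1), and 5 (between classes), so in high dimension a test point from either class tends to find its nearest neighbor in the low-variance class.

```python
import numpy as np


def one_nn_class_accuracy(d, n_train=20, n_test=100, sigma1=2.0, seed=0):
    """Per-class accuracy of plain 1-NN on two zero-mean Gaussian classes.

    Illustrative setup (an assumption, not the paper's experiments):
    class 0 ~ N(0, I_d), class 1 ~ N(0, sigma1^2 I_d), so the classes
    differ only in scale.
    """
    rng = np.random.default_rng(seed)

    def sample(n):
        # n points from each class, stacked: class 0 first, then class 1
        x0 = rng.normal(0.0, 1.0, size=(n, d))
        x1 = rng.normal(0.0, sigma1, size=(n, d))
        return np.vstack([x0, x1]), np.repeat([0, 1], n)

    X_train, y_train = sample(n_train)
    X_test, y_test = sample(n_test)

    # Squared Euclidean distances between every test and training point
    sq_dist = (
        (X_test ** 2).sum(axis=1)[:, None]
        + (X_train ** 2).sum(axis=1)[None, :]
        - 2.0 * X_test @ X_train.T
    )

    # 1-NN prediction: label of the closest training point
    pred = y_train[np.argmin(sq_dist, axis=1)]

    acc0 = float(np.mean(pred[y_test == 0] == 0))
    acc1 = float(np.mean(pred[y_test == 1] == 1))
    return acc0, acc1


if __name__ == "__main__":
    for d in (2, 20, 2000):
        acc0, acc1 = one_nn_class_accuracy(d)
        print(f"d={d:5d}  class-0 accuracy={acc0:.2f}  class-1 accuracy={acc1:.2f}")
```

Under these assumptions, the accuracy on the high-variance class should collapse as `d` grows while the low-variance class stays well classified, giving an overall accuracy near 50%. Adding a mean shift to one class would let one explore when the location difference is large enough to offset the scale difference.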
