LG · Feb 13, 2024

A Novel Approach to Regularising 1NN classifier for Improved Generalization

Aditya Challa, Sravan Danda, Laurent Najman

arXiv:2402.08405v1 · 4.62 citations · h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the poor generalization of nearest neighbour classifiers, a problem of practical relevance to machine learning practitioners, though it appears incremental because it builds on existing regularization approaches.

The paper tackles overfitting in 1NN classifiers by introducing Watershed Classifiers, a regularization method that learns arbitrary boundaries while keeping the VC dimension small, which improves generalization; the authors report that their proposed loss outperforms the NCA baseline.

In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel, greedy way to regularize 1NN classifiers. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to trivially over-fit and have a very large VC dimension; hence they do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset and, at the same time, have a very small VC dimension; hence a watershed classifier leads to good generalization. The traditional approach to regularizing 1NN classifiers is to consider the $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with the $(n-1)$-nearest-neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function that learns representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
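The abstract contrasts 1NN with watershed classifiers but does not spell out the algorithm. The sketch below assumes one common reading of a "watershed" labelling on a graph: a seeded minimum spanning forest built greedily (Kruskal-style) on a k-NN graph, with labelled training points as seeds. The function names `one_nn_predict` and `watershed_labels` and the parameter `k` are hypothetical; this is not the authors' implementation, and neither their exact graph construction nor their proposed loss is reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def one_nn_predict(train_x: Sequence[Point], train_y: Sequence[str], query: Point) -> str:
    """Plain 1NN: return the label of the single closest training point."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]


def watershed_labels(
    train_x: Sequence[Point],
    train_y: Sequence[str],
    test_x: Sequence[Point],
    k: int = 3,
) -> List[Optional[str]]:
    """Hypothetical seeded-watershed labelling via a minimum spanning forest.

    Training points are labelled seeds. Edges of a symmetric k-NN graph over
    all points are processed in increasing length (Kruskal). Two trees are
    merged only if at most one of them already contains a seed, so every tree
    holds at most one seed and unlabelled points inherit that seed's label.
    """
    points = list(train_x) + list(test_x)
    n = len(points)

    # Symmetric k-NN graph: connect each point to its k closest other points.
    edges = set()
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))

    # Union-find; comp_label[root] is the seed label of that tree (or None).
    parent = list(range(n))
    comp_label: List[Optional[str]] = list(train_y) + [None] * len(test_x)

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Greedy pass: shortest edges first, as in Kruskal's algorithm.
    for i, j in sorted(edges, key=lambda e: math.dist(points[e[0]], points[e[1]])):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue
        li, lj = comp_label[ri], comp_label[rj]
        if li is not None and lj is not None:
            continue  # both trees are seeded: this edge lies on a boundary
        parent[ri] = rj
        comp_label[rj] = lj if lj is not None else li

    # Test points reached by no seed (disconnected k-NN component) stay None.
    return [comp_label[find(len(train_x) + t)] for t in range(len(test_x))]


if __name__ == "__main__":
    train_x = [(0.0, 0.0), (10.0, 0.0)]
    train_y = ["a", "b"]
    test_x = [(1.0, 0.0), (2.0, 0.0), (9.0, 0.0), (8.0, 0.0)]
    print(watershed_labels(train_x, train_y, test_x, k=2))
    print(one_nn_predict(train_x, train_y, (3.0, 0.0)))
```

Skipping any edge that would join two seeded trees is what places the decision boundary on those edges; unlike 1NN, a test point's label then depends on the chain of short edges connecting it to a seed rather than only on its single closest training point.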
