LG Feb 13, 2024

A Novel Approach to Regularising 1NN classifier for Improved Generalization

arXiv:2402.08405v1 2 citations h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses the poor generalization of nearest neighbor classifiers. It appears incremental, since it builds on existing regularization approaches.

The paper tackles overfitting in 1NN classifiers by introducing Watershed Classifiers, a regularized class of 1NN classifiers that can learn arbitrary boundaries while having a small VC dimension. It also proposes a loss function for learning representations consistent with watershed classifiers, which is reported to outperform the NCA baseline.

In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel way to regularize 1NN classifiers using a greedy approach. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to trivially over-fit and have very large VC dimension, hence they do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset and, at the same time, have very small VC dimension; hence a watershed classifier leads to good generalization. The traditional approach to regularizing 1NN classifiers is to consider the $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with the ($n-1$) nearest neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function which can learn representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.
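The claim that 1NN classifiers trivially over-fit, and that considering $K$ nearest neighbours is the traditional way to regularize them, can be illustrated with a minimal sketch. This is not the paper's watershed classifier or its loss function. It is a plain NumPy $K$-nearest-neighbour vote on synthetic data, and the function name `knn_predict`, the data, and the 20% label-noise rate are all assumptions chosen for illustration.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=1):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    # Pairwise squared distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k closest training points for each query point.
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest]
    # Majority vote per query; ties go to the smaller label.
    return np.array([np.bincount(row).argmax() for row in votes])


# Hypothetical toy data: two classes split at x0 = 0, with 20% of labels flipped.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y_clean = (X[:, 0] > 0).astype(int)
flip = rng.random(200) < 0.2
y_noisy = np.where(flip, 1 - y_clean, y_clean)

# With k=1 every training point is its own nearest neighbour, so the
# classifier reproduces every training label, including the flipped ones.
train_acc_1nn = (knn_predict(X, y_noisy, X, k=1) == y_noisy).mean()

# With a larger k the vote smooths over individual noisy labels.
train_acc_15nn = (knn_predict(X, y_noisy, X, k=15) == y_noisy).mean()

print("1NN training accuracy:", train_acc_1nn)
print("15NN training accuracy:", train_acc_15nn)
```

In this sketch the perfect training accuracy of 1NN reflects memorization of the noisy labels rather than a good decision boundary, which is the over-fitting behaviour the abstract refers to. Increasing `k` trades that memorization for a smoother boundary.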

