LG · Jun 25, 2023

A Self-Encoder for Learning Nearest Neighbors

Armand Boschin, Thomas Bonald, Marc Jeanmougin

arXiv:2306.14257v1 · 2.0 · 2 citations · h-index: 35

Originality: Incremental advance

AI Analysis

The paper provides a self-supervised method for improving nearest-neighbor classifiers and regressors, particularly on heterogeneous data, but the contribution appears incremental, as it builds on existing representation-learning and nearest-neighbor techniques.

The paper tackles the problem of learning useful data representations for nearest-neighbor methods. It introduces a self-encoder that arranges samples in an embedding space so that they are linearly separable from one another, which makes predictions invariant to feature scaling and removes the need for scaling as a preprocessing step. Experiments demonstrate the method's efficiency on heterogeneous data that mixes numerical and categorical features.
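Regular nearest neighbors are sensitive to the units of each feature, which is the problem the self-encoder is meant to remove. The toy sketch below uses made-up points (nothing here comes from the paper) and a hypothetical helper `nearest_index` to show that rescaling one feature can change which training sample is nearest to a query under Euclidean distance.

```python
import numpy as np


def nearest_index(X_train, x):
    """Index of the training row closest to x in squared Euclidean distance."""
    return int(np.argmin(((X_train - x) ** 2).sum(axis=1)))


# Two made-up training samples and one query point.
X_train = np.array([[0.0, 0.0],
                    [1.0, 10.0]])
query = np.array([0.9, 4.0])

# Unscaled: distances are 16.81 and 36.01, so sample 0 is nearest.
print(nearest_index(X_train, query))

# Multiply the first feature by 100 (e.g. a change of units).
# Distances become 8116 and 136, so sample 1 is now nearest.
scale = np.array([100.0, 1.0])
print(nearest_index(X_train * scale, query * scale))
```

The first call prints 0 and the second prints 1: the same data in different units yields a different neighbor, which is why plain nearest neighbors usually need min-max or similar scaling.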

We present the self-encoder, a neural network trained to guess the identity of each data sample. Despite its simplicity, it learns a very useful representation of data in a self-supervised way. Specifically, the self-encoder learns to distribute the data samples in the embedding space so that they are linearly separable from one another. This induces a geometry where two samples are close in the embedding space when they are hard to differentiate. The self-encoder can then be combined with a nearest-neighbor classifier or regressor for any subsequent supervised task. Unlike regular nearest neighbors, the predictions resulting from this encoding of data are invariant to any scaling of features, making preprocessing such as min-max scaling unnecessary. The experiments show the efficiency of the approach, especially on heterogeneous data mixing numerical features and categorical features.
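The idea of "guessing the identity of each sample" can be sketched as an n-way classification problem in which sample i is its own class i, followed by nearest neighbors in the learned embedding. The sketch below is an illustration under stated assumptions, not the authors' architecture or training procedure: it assumes a single linear encoder `W`, a linear identity head `V`, softmax cross-entropy, and plain full-batch gradient descent, and handles numerical features only (categorical features would need an encoding such as one-hot, which is also an assumption). The names `train_self_encoder` and `knn_predict` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_self_encoder(X, dim=8, epochs=300, lr=0.05, seed=0):
    """Hypothetical sketch: treat sample i as class i and fit a linear encoder W
    plus a linear identity head V by gradient descent on softmax cross-entropy.
    Returns the encoder weights W and the loss at each epoch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, dim))  # encoder: features -> embedding
    V = rng.normal(scale=0.1, size=(dim, n))  # head: embedding -> n identity logits
    Y = np.eye(n)                             # target: row i is one-hot on class i
    losses = []
    for _ in range(epochs):
        Z = X @ W                             # embeddings, shape (n, dim)
        logits = Z @ V                        # identity logits, shape (n, n)
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        log_P = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        P = np.exp(log_P)                     # softmax over sample identities
        losses.append(-np.mean(np.diag(log_P)))  # mean cross-entropy
        G = (P - Y) / n                       # gradient of the loss w.r.t. logits
        grad_V = Z.T @ G                      # both gradients use the current W, V
        grad_W = X.T @ (G @ V.T)
        V -= lr * grad_V
        W -= lr * grad_W
    return W, losses


def knn_predict(Z_train, y_train, Z_query, k=3):
    """Majority vote among the k nearest training embeddings (Euclidean).
    Assumes y_train holds non-negative integer class labels."""
    d2 = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


# Usage on synthetic data (made up for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
y = (X[:, 0] > 0).astype(int)

W, losses = train_self_encoder(X)
Z = X @ W                                     # embed every sample
print(f"identity loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(knn_predict(Z, y, Z[:5], k=3))          # labels for the first five samples
```

This sketch only shows the structure of the approach. In particular, plain gradient descent with a fixed step size is itself sensitive to feature scale, so it should not be taken as reproducing the paper's scale-invariance claim.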

