LG · Mar 11

A Universal Nearest-Neighbor Estimator for Intrinsic Dimensionality

Eng-Jon Ong, Omer Bobrowski, Gesine Reinert, Primoz Skraba

arXiv:2603.10493v16.1h-index: 31

Predicted impact: top 58% in LG · last 90 days
Originality: Highly original

AI Analysis

The paper addresses a fundamental problem in machine learning and computer vision, offering researchers and practitioners a robust tool for understanding data structure.

The paper tackles the problem of estimating the intrinsic dimensionality (ID) of data by introducing a nearest-neighbor distance ratio estimator that achieves state-of-the-art results and comes with a theoretical proof of universality across data distributions.

Estimating the intrinsic dimensionality (ID) of data is a fundamental problem in machine learning and computer vision, providing insight into the true degrees of freedom underlying high-dimensional observations. Existing methods often rely on geometric or distributional assumptions and can fail badly when these assumptions are violated. In this paper, we introduce a novel ID estimator based on nearest-neighbor distance ratios that involves simple calculations and achieves state-of-the-art results. Most importantly, we provide a theoretical analysis proving that our estimator is universal, namely, it converges to the true ID independently of the distribution generating the data. We present experimental results on benchmark manifolds and real-world datasets to demonstrate the performance of our estimator.
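
The abstract does not give the estimator's formula, so the sketch below only illustrates the general idea of estimating ID from nearest-neighbor distance ratios. It uses the earlier, well-known TwoNN maximum-likelihood estimator, which is not the estimator proposed in this paper and which assumes locally uniform density (the kind of assumption the paper aims to remove). The function name `two_nn_id` and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def two_nn_id(points):
    """Illustrative TwoNN-style ID estimate (NOT the paper's estimator).

    For each point, take the ratio mu = r2 / r1 of the distances to its
    second and first nearest neighbors. Under a locally uniform density,
    mu is Pareto-distributed with shape d, so the maximum-likelihood
    estimate of d is N / sum(log mu).
    """
    x = np.asarray(points, dtype=float)
    # Brute-force pairwise distances; fine for a few thousand points.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    # After partitioning at index 1, columns 0 and 1 hold the two
    # smallest distances in ascending order.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0  # drop exact duplicates, whose ratio is undefined
    log_mu = np.log(r2[keep] / r1[keep])
    return keep.sum() / log_mu.sum()


# Hypothetical test data: a 2-D square linearly embedded in 5-D space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(800, 2))
embedding = rng.normal(size=(2, 5))
data = latent @ embedding
print(f"estimated ID: {two_nn_id(data):.2f}")
```

On data like this, the estimate should land near the latent dimension of 2 rather than the ambient dimension of 5. The brute-force distance computation keeps the sketch dependency-free; a k-d tree would be the usual choice for larger samples.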
