MLST Nov 8, 2017

Dimension Estimation Using Random Connection Models

arXiv:1711.02876v1, 7 citations
Originality: Incremental advance
AI Analysis

The paper offers data scientists and researchers in machine learning and statistics a more efficient and accessible tool for dimension estimation. The advance is incremental, since it builds on existing random connection models.

The paper tackles the problem of estimating the intrinsic dimension of a dataset without requiring explicit distance information. Its method relies on binary neighbourhood adjacency matrices and a random connection model. Computationally it scales like n log n, and in simulations it compares favourably with existing methods.

Information about intrinsic dimension is crucial to perform dimensionality reduction, compress information, design efficient algorithms, and do statistical adaptation. In this paper we propose an estimator for the intrinsic dimension of a data set. The estimator is based on binary neighbourhood information about the observations in the form of two adjacency matrices, and does not require any explicit distance information. The underlying graph is modelled according to a subset of a specific random connection model, sometimes referred to as the Poisson blob model. Computationally the estimator scales like n log n, and we specify its asymptotic distribution and rate of convergence. A simulation study on both real and simulated data shows that our approach compares favourably with some competing methods from the literature, including approaches that rely on distance information.
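
To make the idea of reading dimension off binary neighbourhood information concrete, here is a minimal sketch of a simple degree-ratio heuristic. It is not the estimator proposed in the paper, whose exact form is not given in this summary. The sketch assumes the two adjacency matrices come from neighbourhood graphs whose radii differ by a known factor (`radius_ratio`, a hypothetical parameter), so that the number of edges grows roughly like the radius raised to the intrinsic dimension. The helper `neighbourhood_graph` is also hypothetical and uses distances only to fabricate demo input.

```python
import numpy as np


def degree_ratio_dimension(adj_small, adj_large, radius_ratio=2.0):
    """Illustrative heuristic only, NOT the paper's estimator.

    Assumes adj_small and adj_large are binary adjacency matrices of
    neighbourhood graphs whose radii differ by the factor radius_ratio.
    If the edge count scales like radius**d, then
    d is approximately log(edges_large / edges_small) / log(radius_ratio).
    """
    adj_small = np.asarray(adj_small, dtype=bool)
    adj_large = np.asarray(adj_large, dtype=bool)
    edges_small = adj_small.sum()
    edges_large = adj_large.sum()
    if edges_small == 0:
        raise ValueError("the smaller graph has no edges; dimension is undefined")
    return float(np.log(edges_large / edges_small) / np.log(radius_ratio))


def neighbourhood_graph(points, radius):
    """Hypothetical helper: builds a binary adjacency matrix for the demo.

    Uses dense O(n^2) pairwise distances, which is fine for a small example
    but does not reflect the n log n scaling claimed for the paper's method.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    adj = dist <= radius
    np.fill_diagonal(adj, False)  # no self-loops
    return adj


# Demo data: a 2-dimensional square embedded in 5-dimensional ambient space.
rng = np.random.default_rng(0)
latent = rng.uniform(size=(800, 2))
points = np.hstack([latent, np.zeros((800, 3))])

adj_small = neighbourhood_graph(points, radius=0.05)
adj_large = neighbourhood_graph(points, radius=0.10)

# The estimator itself only sees the two binary matrices.
estimate = degree_ratio_dimension(adj_small, adj_large, radius_ratio=2.0)
print(f"estimated intrinsic dimension: {estimate:.2f}")
```

On this toy data the estimate should land near the intrinsic dimension of 2 rather than the ambient dimension of 5, with a slight downward bias from points near the boundary of the square. The known radius ratio is a convenience of the sketch; the paper's setting, which avoids explicit distance information, may not supply it.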

