ML | LG | COMP-PH | Jul 20, 2022

Intrinsic dimension estimation for discrete metrics

arXiv:2207.09688v2 | 11 citations | h-index: 55
Originality: Incremental advance
AI Analysis

This paper addresses the need for reliable intrinsic dimension estimation in discrete data, which is common in fields such as genomics and network analysis. The advance is incremental, since it adapts existing concepts to discrete spaces.

The authors tackled the problem of estimating intrinsic dimension for datasets with discrete features, where existing methods designed for continuous spaces can cause errors. They introduced a new algorithm, demonstrated its accuracy on benchmarks, and applied it to a metagenomic dataset, finding an intrinsic dimension of order 2, suggesting evolutionary pressure acts on a low-dimensional manifold.

Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensionality reduction methods are designed for continuous spaces, and their use on discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
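The abstract does not spell out the algorithm, so the following is only an illustrative sketch of why discrete spaces need their own treatment, not the authors' method. It assumes data on an integer lattice with the L1 (Manhattan) metric, where the "volume" of a ball is a count of lattice points rather than a power of the radius. The function names `l1_ball_size` and `estimate_id`, the two radii, and the toy dataset are all hypothetical choices made for this example.

```python
from itertools import product
from math import comb


def l1_ball_size(t, d):
    """Number of points of the integer lattice Z^d within L1 distance t of a point."""
    return sum(2**k * comb(d, k) * comb(t, k) for k in range(min(d, t) + 1))


def l1_distance(p, q):
    """Manhattan distance between two integer tuples of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def estimate_id(points, center, t1, t2, max_d=20):
    """Return the integer d whose lattice ball-size ratio best matches the data.

    Counts the points within radius t1 and within radius t2 > t1 of `center`
    (the center itself included, so `center` should be one of `points`) and
    compares n1 / n2 with l1_ball_size(t1, d) / l1_ball_size(t2, d).
    """
    n1 = sum(1 for p in points if l1_distance(p, center) <= t1)
    n2 = sum(1 for p in points if l1_distance(p, center) <= t2)
    observed = n1 / n2

    def mismatch(d):
        return abs(l1_ball_size(t1, d) / l1_ball_size(t2, d) - observed)

    return min(range(1, max_d + 1), key=mismatch)


# Toy data: a 2-dimensional lattice patch embedded in a 5-dimensional discrete space.
points = [(x, y, 0, 0, 0) for x, y in product(range(-10, 11), repeat=2)]
center = (0, 0, 0, 0, 0)
print(estimate_id(points, center, t1=2, t2=4))  # prints 2
```

In this toy case the ball around the center holds 13 points at radius 2 and 41 at radius 4, which matches the lattice counts for d = 2 exactly. A continuous-space estimator would instead assume the ratio (t1/t2)^d = 0.25 for d = 2, which is the kind of mismatch that produces the errors and biases mentioned above. A single center and a grid search over integer d keep the sketch short; a practical estimator would pool many centers and handle noise statistically.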
