ML LGOct 31, 2018

Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds

Kry Yik Chau Lui, Gavin Weiguang Ding, Ruitong Huang, Robert J. McCann

arXiv:1811.00115v11.91 citations

Originality Highly original

AI Analysis

This provides fundamental limitations for dimensionality reduction in information retrieval, showing inherent trade-offs rather than incremental improvements.

The paper proves that dimensionality reduction maps cannot achieve perfect precision and recall simultaneously, establishing an upper bound on precision for Lipschitz continuous maps and proposing a Wasserstein distance-based measure with similar theoretical guarantees.

In this paper, we investigate Dimensionality reduction (DR) maps in an information retrieval setting from a quantitative topology point of view. In particular, we show that no DR maps can achieve perfect precision and perfect recall simultaneously. Thus a continuous DR map must have imperfect precision. We further prove an upper bound on the precision of Lipschitz continuous DR maps. While precision is a natural measure in an information retrieval setting, it does not measure `how' wrong the retrieved data is. We therefore propose a new measure based on Wasserstein distance that comes with similar theoretical guarantee. A key technical step in our proofs is a particular optimization problem of the $L_2$-Wasserstein distance over a constrained set of distributions. We provide a complete solution to this optimization problem, which can be of independent interest on the technical side.

View on arXiv PDF

Similar