MLITSTNov 3, 2016

Adaptive Geometric Multiscale Approximations for Intrinsically Low-dimensional Data

arXiv:1611.01179v220 citations
Originality Incremental advance
AI Analysis

This work addresses the challenge of encoding and approximating high-dimensional, low-dimensional data for applications in machine learning and data analysis, offering an incremental improvement over existing geometric methods.

The paper tackles the problem of efficiently approximating high-dimensional data that is nearly supported on low-dimensional sets, such as manifolds, by introducing adaptive geometric multiscale approximations. It shows that these data-driven approximations perform well with a threshold chosen as a universal function of sample size, yielding a fast dictionary learning algorithm with complexity C n log n and providing theoretical guarantees and experimental validation.

We consider the problem of efficiently approximating and encoding high-dimensional data sampled from a probability distribution $ρ$ in $\mathbb{R}^D$, that is nearly supported on a $d$-dimensional set $\mathcal{M}$ - for example supported on a $d$-dimensional Riemannian manifold. Geometric Multi-Resolution Analysis (GMRA) provides a robust and computationally efficient procedure to construct low-dimensional geometric approximations of $\mathcal{M}$ at varying resolutions. We introduce a thresholding algorithm on the geometric wavelet coefficients, leading to what we call adaptive GMRA approximations. We show that these data-driven, empirical approximations perform well, when the threshold is chosen as a suitable universal function of the number of samples $n$, on a wide variety of measures $ρ$, that are allowed to exhibit different regularity at different scales and locations, thereby efficiently encoding data from more complex measures than those supported on manifolds. These approximations yield a data-driven dictionary, together with a fast transform mapping data to coefficients, and an inverse of such a map. The algorithms for both the dictionary construction and the transforms have complexity $C n \log n$ with the constant linear in $D$ and exponential in $d$. Our work therefore establishes adaptive GMRA as a fast dictionary learning algorithm with approximation guarantees. We include several numerical experiments on both synthetic and real data, confirming our theoretical results and demonstrating the effectiveness of adaptive GMRA.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes