LG, DS, ML | Nov 9, 2018

Density estimation for shift-invariant multidimensional distributions

arXiv:1811.03744v1 | 5 citations
Originality Highly original
AI Analysis

This paper addresses a fundamental problem in non-parametric statistics: it offers efficient learning algorithms for a broader class of distributions than smoothness-based methods cover, with potential applications in machine learning and data analysis. Its contribution is incremental in the sense that it extends existing density-estimation theory to the shift-invariant case.

The paper tackles density estimation for shift-invariant multidimensional distributions, a class that relaxes smoothness assumptions to allow jump discontinuities. It gives efficient algorithms with explicit sample and time bounds: for example, d-dimensional shift-invariant distributions with exponential tails can be learned from Õ_d(1/ε^{d+2}) samples in Õ_d(1/ε^{2d+2}) time. The results extend to a noise-tolerant model, and an Ω(1/ε^d) sample-complexity lower bound shows they are close to best possible.

We study density estimation for classes of shift-invariant distributions over $\mathbb{R}^d$. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of it in any direction. Shift-invariance relaxes smoothness assumptions commonly used in non-parametric density estimation to allow jump discontinuities. The different classes of distributions that we consider correspond to different rates of tail decay. For each such class we give an efficient algorithm that learns any distribution in the class from independent samples with respect to total variation distance. As a special case of our general result, we show that $d$-dimensional shift-invariant distributions which satisfy an exponential tail bound can be learned to total variation distance error $ε$ using $\tilde{O}_d(1/ ε^{d+2})$ examples and $\tilde{O}_d(1/ ε^{2d+2})$ time. This implies that, for constant $d$, multivariate log-concave distributions can be learned in $\tilde{O}_d(1/ε^{2d+2})$ time using $\tilde{O}_d(1/ε^{d+2})$ samples, answering a question of [Diakonikolas, Kane and Stewart, 2016]. All of our results extend to a model of noise-tolerant density estimation using Huber's contamination model, in which the target distribution to be learned is a $(1-ε,ε)$ mixture of some unknown distribution in the class with some other arbitrary and unknown distribution, and the learning algorithm must output a hypothesis distribution with total variation distance error $O(ε)$ from the target distribution. We show that our general results are close to best possible by proving a simple $Ω\left(1/ε^d\right)$ information-theoretic lower bound on sample complexity even for learning bounded distributions that are shift-invariant.
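The rough notion of shift-invariance in the abstract can be checked numerically in one dimension. The sketch below is an illustration only, not the paper's formal definition or algorithm: it approximates the total variation distance between a density and a shifted copy of itself, for a uniform density (which has jump discontinuities) and a standard Gaussian (which is smooth). The function names, the integration grid, and the choice of example densities are all assumptions made here for illustration.

```python
import math

def tv_distance_under_shift(pdf, shift, lo, hi, n=200_000):
    """Approximate TV distance between a 1-D density and its shifted copy.

    TV(p, q) = 0.5 * integral of |p(x) - q(x)| dx, with q(x) = p(x - shift).
    Uses a midpoint Riemann sum on [lo, hi]; the interval is assumed to
    cover (essentially all of) the mass of both densities.
    """
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += abs(pdf(x) - pdf(x - shift))
    return 0.5 * total * h

def uniform_pdf(x):
    # Uniform density on [0, 1): discontinuous at both endpoints.
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def gaussian_pdf(x):
    # Standard normal density: smooth everywhere.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

if __name__ == "__main__":
    for kappa in (0.2, 0.1, 0.05):
        tv_u = tv_distance_under_shift(uniform_pdf, kappa, -1.0, 3.0)
        tv_g = tv_distance_under_shift(gaussian_pdf, kappa, -10.0, 10.0)
        print(f"shift={kappa:.2f}  uniform TV={tv_u:.4f}  gaussian TV={tv_g:.4f}")
```

For the uniform density, the exact distance to its copy shifted by κ ≤ 1 is κ, so the distance shrinks linearly with the shift even though the density is not continuous. This is the sense in which closeness under small shifts is a weaker requirement than smoothness.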

