LG DS STNov 1, 2014

Near-Optimal Density Estimation in Near-Linear Time Using Variable-Width Histograms

Siu-On Chan, Ilias Diakonikolas, Rocco A. Servedio, Xiaorui Sun

arXiv:1411.0169v160 citations

Originality Highly original

AI Analysis

This work provides a near-optimal and efficient solution for density estimation, a fundamental problem in statistics and machine learning, with implications for data analysis and learning theory.

The paper tackles the problem of density estimation for arbitrary probability distributions over [0,1) by developing an algorithm that uses variable-width histograms to output a hypothesis distribution close to the true distribution in total variation distance, with sample size and running time nearly linear in k/ε² and optimal up to logarithmic factors, achieving an approximation factor C proven to be at least 2.

Let $p$ be an unknown and arbitrary probability distribution over $[0,1)$. We consider the problem of {\em density estimation}, in which a learning algorithm is given i.i.d. draws from $p$ and must (with high probability) output a hypothesis distribution that is close to $p$. The main contribution of this paper is a highly efficient density estimation algorithm for learning using a variable-width histogram, i.e., a hypothesis distribution with a piecewise constant probability density function. In more detail, for any $k$ and $ε$, we give an algorithm that makes $\tilde{O}(k/ε^2)$ draws from $p$, runs in $\tilde{O}(k/ε^2)$ time, and outputs a hypothesis distribution $h$ that is piecewise constant with $O(k \log^2(1/ε))$ pieces. With high probability the hypothesis $h$ satisfies $d_{\mathrm{TV}}(p,h) \leq C \cdot \mathrm{opt}_k(p) + ε$, where $d_{\mathrm{TV}}$ denotes the total variation distance (statistical distance), $C$ is a universal constant, and $\mathrm{opt}_k(p)$ is the smallest total variation distance between $p$ and any $k$-piecewise constant distribution. The sample size and running time of our algorithm are optimal up to logarithmic factors. The "approximation factor" $C$ in our result is inherent in the problem, as we prove that no algorithm with sample size bounded in terms of $k$ and $ε$ can achieve $C<2$ regardless of what kind of hypothesis distribution it uses.

View on arXiv PDF

Similar