DSITLGSTMar 6, 2017

Near-Optimal Closeness Testing of Discrete Histogram Distributions

arXiv:1703.01913v131 citations
Originality Highly original
AI Analysis

This addresses a fundamental problem in statistical testing for computer science and statistics, offering near-optimal sample efficiency for histogram closeness testing.

The paper tackles the problem of testing equivalence between two discrete histogram distributions, presenting a new algorithm with sample complexity that nearly matches an information-theoretic lower bound, improving on previous work by polynomial factors.

We investigate the problem of testing the equivalence between two discrete histograms. A {\em $k$-histogram} over $[n]$ is a probability distribution that is piecewise constant over some set of $k$ intervals over $[n]$. Histograms have been extensively studied in computer science and statistics. Given a set of samples from two $k$-histogram distributions $p, q$ over $[n]$, we want to distinguish (with high probability) between the cases that $p = q$ and $\|p-q\|_1 \geq ε$. The main contribution of this paper is a new algorithm for this testing problem and a nearly matching information-theoretic lower bound. Specifically, the sample complexity of our algorithm matches our lower bound up to a logarithmic factor, improving on previous work by polynomial factors in the relevant parameters. Our algorithmic approach applies in a more general setting and yields improved sample upper bounds for testing closeness of other structured distributions as well.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes