Faster Algorithms for Testing under Conditional Sampling
This provides faster algorithms for statisticians and data scientists dealing with large domains, though it is incremental as it builds on prior work in sublinear algorithms.
The paper tackles the problem of distribution testing under conditional sampling, reducing the time and sample complexity for identity testing from Ω(ε^{-4}) to Ω(ε^{-2}), matching the information-theoretic lower bound, and for closeness testing from Ω(ε^{-4} log^5 k) to Ω(ε^{-5} log log k).
There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size $k$. We study two of the most important tests under the conditional-sampling model where each query specifies a subset $S$ of the domain, and the response is a sample drawn from $S$ according to the underlying distribution. For identity testing, which asks whether the underlying distribution equals a specific given distribution or $ε$-differs from it, we reduce the known time and sample complexities from $\tilde{\mathcal{O}}(ε^{-4})$ to $\tilde{\mathcal{O}}(ε^{-2})$, thereby matching the information theoretic lower bound. For closeness testing, which asks whether two distributions underlying observed data sets are equal or different, we reduce existing complexity from $\tilde{\mathcal{O}}(ε^{-4} \log^5 k)$ to an even sub-logarithmic $\tilde{\mathcal{O}}(ε^{-5} \log \log k)$ thus providing a better bound to an open problem in Bertinoro Workshop on Sublinear Algorithms [Fisher, 2004].