Testing Properties of Multiple Distributions with Few Samples
This work addresses a challenge in statistical hypothesis testing: scenarios with limited data per source, as arise in distributed or high-dimensional settings.
The paper tackles the problem of testing properties such as uniformity, identity, and closeness for multiple distributions with few samples per distribution, and obtains sample-optimal testers under an additional condition on the source distributions.
We propose a new setting for testing properties of distributions in which samples come from several distributions, but only a few samples are available per distribution. Given samples from $s$ distributions, $p_1, p_2, \ldots, p_s$, we design testers for the following problems: (1) Uniformity Testing: testing whether all the $p_i$'s are uniform or $\varepsilon$-far from being uniform in $\ell_1$-distance; (2) Identity Testing: testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\varepsilon$-far from $q$ in $\ell_1$-distance; and (3) Closeness Testing: testing whether all the $p_i$'s are equal to a distribution $q$ to which we have sample access, or $\varepsilon$-far from $q$ in $\ell_1$-distance. By assuming an additional natural condition about the source distributions, we provide sample-optimal testers for all of these problems.
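To make the distance notion concrete, the following minimal sketch computes the $\ell_1$-distance $\sum_x |p(x) - q(x)|$ between each source distribution and a reference $q$. It is an illustration of the quantity being tested, not one of the testers from this work: it assumes the probability vectors are known exactly, whereas a tester only sees samples. The function names `l1_distance`, `uniform`, and `distances_to_reference` are hypothetical and introduced here for illustration only.

```python
from typing import List, Sequence


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Return the l1 distance sum_x |p(x) - q(x)| between two distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must be defined on the same domain")
    return sum(abs(a - b) for a, b in zip(p, q))


def uniform(n: int) -> List[float]:
    """Return the uniform distribution on a domain of size n."""
    if n <= 0:
        raise ValueError("domain size must be positive")
    return [1.0 / n] * n


def distances_to_reference(
    sources: Sequence[Sequence[float]], q: Sequence[float]
) -> List[float]:
    """Return the l1 distance of every source distribution p_i to q."""
    return [l1_distance(p, q) for p in sources]


# Two illustrative sources on a domain of size 4 (made-up values):
# the first is exactly uniform, the second puts all mass on two elements.
sources = [
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0],
]
distances = distances_to_reference(sources, uniform(4))
```

For uniformity testing the reference is `uniform(n)`; for identity testing it is the explicitly given $q$. In closeness testing $q$ itself is only accessible through samples, so this exact computation is not available to the tester and serves only to define the two cases it must distinguish.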