Testing properties of distributions in the streaming model
This work addresses memory-efficient distribution testing for streaming data; the contribution is incremental, as it builds on existing models by adding new constraints.
The paper tackles the problem of testing distribution properties under memory constraints in streaming and conditional access models, achieving a trade-off between sample and space complexity for identity testing and an almost optimal memory-efficient algorithm for learning monotone distributions.
We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples arrive in an online fashion, and the goal is to test properties of the distribution using an optimal number of samples, subject to a constraint on how many samples can be stored at any given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then give an almost optimal algorithm that efficiently learns a succinct representation of a monotone distribution under a memory constraint on the number of stored samples. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.
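The abstract does not spell out the learning procedure, so the following is only a rough illustration of what a "succinct representation of a monotone distribution" built from a stream can look like. It uses Birgé's oblivious decomposition, a standard tool for monotone distributions, and is not claimed to be the authors' algorithm. It assumes a non-increasing distribution over {0, ..., n-1}. The names `birge_intervals`, `StreamingMonotoneLearner`, and `total_variation` are hypothetical. The sketch keeps O(log(εn)/ε) counters instead of storing samples, which only approximates the paper's memory model (a bound on stored samples).

```python
import bisect
import random

# Illustrative sketch (assumption): Birge-style flattening of a monotone
# distribution learned from a stream. Not the paper's algorithm.


def birge_intervals(n, eps):
    """Oblivious Birge-style partition of {0, ..., n-1}.

    Interval j has length max(1, floor((1 + eps) ** j)), so only
    O(log(eps * n) / eps) intervals are needed. Returns exclusive right endpoints.
    """
    ends, start, j = [], 0, 0
    while start < n:
        length = max(1, int((1 + eps) ** j))
        start = min(n, start + length)
        ends.append(start)
        j += 1
    return ends


class StreamingMonotoneLearner:
    """Keeps one counter per interval; individual samples are never stored."""

    def __init__(self, n, eps):
        self.ends = birge_intervals(n, eps)
        self.counts = [0] * len(self.ends)
        self.m = 0  # number of samples seen so far

    def update(self, x):
        # Index of the interval containing x: first right endpoint > x.
        self.counts[bisect.bisect_right(self.ends, x)] += 1
        self.m += 1

    def prob(self, x):
        """Flattened estimate: interval mass spread uniformly over the interval."""
        j = bisect.bisect_right(self.ends, x)
        start = self.ends[j - 1] if j > 0 else 0
        return self.counts[j] / (self.m * (self.ends[j] - start))


def total_variation(p, q_fn, n):
    """Total variation distance between a pmf list p and a pmf function q_fn."""
    return 0.5 * sum(abs(p[i] - q_fn(i)) for i in range(n))


if __name__ == "__main__":
    n, eps, m = 1000, 0.1, 20000
    weights = [1.0 / (i + 1) for i in range(n)]  # non-increasing, i.e. monotone
    total = sum(weights)
    p = [w / total for w in weights]

    rng = random.Random(0)
    learner = StreamingMonotoneLearner(n, eps)
    for x in rng.choices(range(n), weights=p, k=m):  # samples arrive one at a time
        learner.update(x)

    print("counters kept:", len(learner.counts), "domain size:", n)
    print("TV distance to truth:", round(total_variation(p, learner.prob, n), 3))
```

The design choice is that the partition is fixed in advance (oblivious), so each arriving sample only increments one counter and can be discarded immediately. For a monotone distribution, flattening over these geometrically growing intervals loses only O(ε) in total variation, which is why so few counters suffice.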