Estimation of Entropy in Constant Space with Improved Sample Complexity
This incremental improvement addresses memory-efficient streaming algorithms for entropy estimation, relevant for data analysis and machine learning.
The paper tackles the problem of estimating entropy of a distribution with constant memory, reducing sample complexity from (k/ε^3)·polylog(1/ε) to (k/ε^2)·polylog(1/ε).
Recent work of Acharya et al. (NeurIPS 2019) showed how to estimate the entropy of a distribution $\mathcal D$ over an alphabet of size $k$ up to $\pmε$ additive error by streaming over $(k/ε^3) \cdot \text{polylog}(1/ε)$ i.i.d. samples and using only $O(1)$ words of memory. In this work, we give a new constant memory scheme that reduces the sample complexity to $(k/ε^2)\cdot \text{polylog}(1/ε)$. We conjecture that this is optimal up to $\text{polylog}(1/ε)$ factors.