Estimating Entropy of Distributions in Constant Space
This addresses the challenge of entropy estimation in memory-constrained streaming settings, which is incremental as it builds on prior work but introduces a constant-space method.
The paper tackles the problem of estimating the entropy of k-ary distributions from samples in the streaming model with limited space, achieving an algorithm that uses constant space and outputs a ±ε estimate with O(k log(1/ε)²/ε³) samples.
We consider the task of estimating the entropy of $k$-ary distributions from samples in the streaming model, where space is limited. Our main contribution is an algorithm that requires $O\left(\frac{k \log (1/\varepsilon)^2}{\varepsilon^3}\right)$ samples and a constant $O(1)$ memory words of space and outputs a $\pm\varepsilon$ estimate of $H(p)$. Without space limitations, the sample complexity has been established as $S(k,\varepsilon)=Θ\left(\frac k{\varepsilon\log k}+\frac{\log^2 k}{\varepsilon^2}\right)$, which is sub-linear in the domain size $k$, and the current algorithms that achieve optimal sample complexity also require nearly-linear space in $k$. Our algorithm partitions $[0,1]$ into intervals and estimates the entropy contribution of probability values in each interval. The intervals are designed to trade off the bias and variance of these estimates.