Partitioning the Sample Space for a More Precise Shannon Entropy Estimation
The paper addresses a critical issue for applications that rely on entropy estimation from limited data, though its contribution appears incremental relative to existing state-of-the-art approaches.
The paper tackles the problem of estimating Shannon entropy from small datasets, where the number of examples may be smaller than the number of possible outcomes. It introduces a discrete entropy estimator that combines the decomposability property of entropy with estimates of the missing mass and of the number of unseen outcomes, in order to compensate for the negative bias that unobserved outcomes induce. Experimental results show that it outperforms some classical estimators in undersampled regimes and performs comparably with some well-established state-of-the-art methods.
Reliable data-driven estimation of Shannon entropy from small data sets, where the number of examples is potentially smaller than the number of possible outcomes, is a critical matter in several applications. In this paper, we introduce a discrete entropy estimator, where we use the decomposability property in combination with estimations of the missing mass and the number of unseen outcomes to compensate for the negative bias induced by them. Experimental results show that the proposed method outperforms some classical estimators in undersampled regimes, and performs comparably with some well-established state-of-the-art estimators.
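To make the ingredients named in the abstract concrete, here is a minimal Python sketch. It is not the authors' estimator: the summary does not give their formulas, so the sketch assumes a Good-Turing estimate of the missing mass, a bias-corrected Chao1 estimate of the number of unseen outcomes, and a uniform distribution over the unseen outcomes. All function names are hypothetical. It relies on the decomposability (grouping) property of Shannon entropy: splitting the sample space into seen and unseen outcomes, with total unseen probability m0, gives H = h(m0) + (1 - m0) * H_seen + m0 * H_unseen, where h is the binary entropy.

```python
import math
from collections import Counter


def binary_entropy(p):
    """Entropy (in nats) of a two-way split with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def plugin_entropy(counts):
    """Plug-in (maximum-likelihood) entropy in nats from a Counter of outcomes."""
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def missing_mass_good_turing(counts):
    """Good-Turing missing-mass estimate: share of the sample made up of singletons."""
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)
    return f1 / n


def unseen_count_chao1(counts):
    """Bias-corrected Chao1 estimate of the number of unseen outcomes."""
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return f1 * (f1 - 1) / (2.0 * (f2 + 1))


def decomposed_entropy_sketch(samples):
    """Illustrative estimate built from the seen/unseen partition.

    Assumptions (not taken from the paper): the seen part uses the plug-in
    entropy, the unseen part is treated as uniform over the Chao1 count
    (floored at 1 so its log is non-negative), and samples is non-empty.
    """
    counts = Counter(samples)
    m0 = missing_mass_good_turing(counts)
    k0 = max(unseen_count_chao1(counts), 1.0)
    h_seen = plugin_entropy(counts)
    return binary_entropy(m0) + (1.0 - m0) * h_seen + m0 * math.log(k0)


if __name__ == "__main__":
    sample = list("aabbbcd")
    print("plug-in:   ", plugin_entropy(Counter(sample)))
    print("decomposed:", decomposed_entropy_sketch(sample))
```

When the sample contains no singletons, the estimated missing mass is zero and the sketch reduces to the plug-in estimate. The correction only takes effect in undersampled regimes, which is where the plug-in estimator's negative bias is largest.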