IT · LG · ST · Feb 7, 2020

On the Estimation of Information Measures of Continuous Distributions

arXiv:2002.02851v3 · 14 citations
AI Analysis

This work addresses a fundamental statistical challenge for researchers and practitioners in machine learning, offering incremental theoretical insights into information measure estimation.

The paper tackles the problem of estimating differential entropy from finite samples for continuous distributions, showing that estimation is infeasible without additional assumptions and providing confidence bounds for histogram-based estimation under Lipschitz continuity and bounded support conditions.

The estimation of information measures of continuous distributions based on samples is a fundamental problem in statistics and machine learning. In this paper, we analyze estimates of differential entropy in $K$-dimensional Euclidean space, computed from a finite number of samples, when the probability density function belongs to a predetermined convex family $\mathcal{P}$. First, estimating differential entropy to any accuracy is shown to be infeasible if the differential entropy of densities in $\mathcal{P}$ is unbounded, clearly showing the necessity of additional assumptions. Subsequently, we investigate sufficient conditions that enable confidence bounds for the estimation of differential entropy. In particular, we provide confidence bounds for simple histogram-based estimation of differential entropy from a fixed number of samples, assuming that the probability density function is Lipschitz continuous with a known Lipschitz constant and has known, bounded support. Our focus is on differential entropy, but we provide examples that show that similar results hold for mutual information and relative entropy as well.
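To make the histogram-based estimator mentioned in the abstract concrete, the following is a minimal sketch of a plug-in histogram estimate of differential entropy. It is an illustration only, not the paper's exact construction: the function name `histogram_entropy`, the cubic support $[\text{low}, \text{high}]^K$, and the equal-width binning are assumptions made here, and the sketch does not compute the confidence bounds derived in the paper.

```python
import numpy as np


def histogram_entropy(samples, bins_per_dim, low=0.0, high=1.0):
    """Plug-in histogram estimate of differential entropy, in nats.

    Assumes (hypothetically) that the support is the known cube
    [low, high]^K and uses equal-width bins along every axis.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 1:
        samples = samples[:, None]  # treat a 1-D array as K = 1
    k = samples.shape[1]

    # Same bin edges along each of the K axes.
    edges = [np.linspace(low, high, bins_per_dim + 1)] * k
    counts, _ = np.histogramdd(samples, bins=edges)

    # Empirical probability of each non-empty cell.
    p = counts[counts > 0] / counts.sum()

    # The histogram density on a cell is p / cell_volume, so the
    # plug-in entropy is -sum_i p_i * log(p_i / cell_volume).
    cell_volume = ((high - low) / bins_per_dim) ** k
    return float(-np.sum(p * np.log(p / cell_volume)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=(20000, 2))
    # The uniform density on [0, 1]^2 has differential entropy 0,
    # so the estimate should be close to 0.
    print(histogram_entropy(x, bins_per_dim=5))
```

The bin count trades discretization error against sampling error: finer bins track the density more closely but leave fewer samples per cell. A known Lipschitz constant is the kind of assumption that allows the discretization part to be controlled.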

