Multivariate f-Divergence Estimation With Confidence
This work addresses the need in machine learning and statistics for reliable divergence estimators with known convergence properties, though the contribution is incremental because it builds on an existing estimator.
The paper tackles the problem of estimating the f-divergence between two distributions from finite samples. It establishes the asymptotic normality of an ensemble estimator, which enables divergence-based inference tasks such as testing distribution equality and empirically bounding classification error.
The problem of f-divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose mean squared error (MSE) convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of the f-divergence between two distributions from a finite number of samples. This estimator has an MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions. This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples. We experimentally validate our theoretical results and, as an illustration, use them to empirically bound the best achievable classification error.
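To illustrate how asymptotic normality supports inference, here is a minimal Python sketch that turns a divergence estimate into a confidence interval and a two-sided z-test. It is not the paper's estimator or procedure. The function names, the `null_value` parameter, and the assumption that a standard error is already available (for example from a bootstrap) are all hypothetical, and the sketch assumes the normal approximation holds at the hypothesized value.

```python
from statistics import NormalDist

# Hypothetical helpers: they assume the divergence estimate is approximately
# normal around the true divergence, with a standard error supplied by the
# caller (how it is obtained is outside this sketch).

def normal_confidence_interval(estimate, std_error, level=0.95):
    """Two-sided confidence interval from a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error

def z_test_p_value(estimate, std_error, null_value=0.0):
    """Two-sided p-value for H0: true divergence == null_value."""
    z = (estimate - null_value) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Usage with made-up numbers (not results from the paper)
low, high = normal_confidence_interval(0.5, 0.1)
p = z_test_p_value(0.5, 0.1, null_value=0.0)
```

A small p-value here would suggest that the divergence differs from the hypothesized value. Whether the normal approximation is adequate for a given sample size is exactly the kind of question the paper's theory addresses.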