Non-Asymptotic Performance Guarantees for Neural Estimation of $\mathsf{f}$-Divergences
This work addresses a theoretical gap for practitioners who use neural divergence estimators, though the contribution is incremental in that it builds on existing variational methods.
The paper studies performance guarantees for neural network estimators of f-divergences. It derives non-asymptotic error bounds that quantify the tradeoff between approximation and estimation errors, and it validates these bounds numerically.
Statistical distances (SDs), which quantify the dissimilarity between probability distributions, are central to machine learning and statistics. A modern method for estimating such distances from data relies on parametrizing a variational form by a neural network (NN) and optimizing it. These estimators are widely used in practice, but the corresponding performance guarantees are partial and call for further exploration. In particular, there seems to be a fundamental tradeoff between the two sources of error involved: approximation and estimation. While the former requires the NN class to be rich and expressive, the latter relies on controlling its complexity. This paper explores this tradeoff by means of non-asymptotic error bounds, focusing on three popular choices of SDs: Kullback-Leibler divergence, chi-squared divergence, and squared Hellinger distance. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory. Numerical results validating the theory are also provided.
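To make the idea of optimizing a variational form concrete, the following minimal sketch evaluates the Donsker-Varadhan objective for the Kullback-Leibler divergence on samples. It is an illustration under stated assumptions, not the estimator analyzed in the paper. The Donsker-Varadhan form is one standard variational representation of KL, and the paper may use a different one. The Gaussian pair, the sample size, the name `dv_objective`, and the two fixed critic functions (which stand in for a trained NN) are all hypothetical choices made for this example.

```python
import math
import random


def dv_objective(f, xs_p, xs_q):
    """Empirical Donsker-Varadhan objective: mean_P[f] - log(mean_Q[exp(f)])."""
    mean_p = sum(f(x) for x in xs_p) / len(xs_p)
    mean_exp_q = sum(math.exp(f(x)) for x in xs_q) / len(xs_q)
    return mean_p - math.log(mean_exp_q)


# Assumed toy setup: P = N(1, 1) and Q = N(0, 1), chosen so KL(P || Q) is known.
rng = random.Random(0)
n = 100_000
xs_p = [rng.gauss(1.0, 1.0) for _ in range(n)]
xs_q = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Closed form for unit-variance Gaussians: (mu_P - mu_Q)^2 / 2.
true_kl = 0.5

# The supremum is attained at the log density ratio, log(p/q)(x) = x - 1/2.
# Only sampling (estimation) error remains for this critic.
optimal_value = dv_objective(lambda x: x - 0.5, xs_p, xs_q)

# A mis-specified critic stands in for a function class that is too small.
# The gap to the true KL is then dominated by approximation error.
restricted_value = dv_objective(lambda x: 0.5 * x, xs_p, xs_q)

print(f"true KL               = {true_kl:.4f}")
print(f"DV, optimal critic    = {optimal_value:.4f}")
print(f"DV, restricted critic = {restricted_value:.4f}")
```

In an actual neural estimator the critic would be a network trained to maximize this objective. A richer network class narrows the gap seen with the restricted critic, but it makes the empirical objective harder to control from a finite sample, which is the tradeoff the abstract describes.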