IT AI MLFeb 17, 2017

Direct Estimation of Information Divergence Using Nearest Neighbor Ratios

Morteza Noshad, Kevin R. Moon, Salimeh Yasaei Sekeh, Alfred O. Hero

arXiv:1702.05222v248 citations

Originality Incremental advance

AI Analysis

This provides a computationally efficient estimator for information divergences, which are fundamental in statistics and machine learning, though it appears incremental relative to existing nearest-neighbor-based methods.

The authors tackled the problem of estimating information divergences (Rényi and f-divergence) between two probability distributions from finite samples, proposing a method based on k-nearest neighbor graphs that achieves an MSE rate of O(N^{-2γ/(γ+d)}) for smooth functions and a parametric rate of O(1/N) with an ensemble technique.

We propose a direct estimation method for Rényi and f-divergence measures based on a new graph theoretical interpretation. Suppose that we are given two sample sets $X$ and $Y$, respectively with $N$ and $M$ samples, where $η:=M/N$ is a constant value. Considering the $k$-nearest neighbor ($k$-NN) graph of $Y$ in the joint data set $(X,Y)$, we show that the average powered ratio of the number of $X$ points to the number of $Y$ points among all $k$-NN points is proportional to Rényi divergence of $X$ and $Y$ densities. A similar method can also be used to estimate f-divergence measures. We derive bias and variance rates, and show that for the class of $γ$-Hölder smooth functions, the estimator achieves the MSE rate of $O(N^{-2γ/(γ+d)})$. Furthermore, by using a weighted ensemble estimation technique, for density functions with continuous and bounded derivatives of up to the order $d$, and some extra conditions at the support set boundary, we derive an ensemble estimator that achieves the parametric MSE rate of $O(1/N)$. Our estimators are more computationally tractable than other competing estimators, which makes them appealing in many practical applications.

View on arXiv PDF

Similar