LG, IT, ML | May 21, 2019

Geometric Estimation of Multivariate Dependency

arXiv:1905.08594v1 | 11 citations
Originality: Incremental advance
AI Analysis

The paper provides a scalable method for estimating dependency in multivariate data, addressing a bottleneck in large-scale applications. The advance is incremental, since it builds on existing geometric and divergence concepts.

The paper estimates dependency between a pair of multivariate samples without density estimation. It proposes a geometric estimator based on a minimal spanning tree (MST) that converges to the geometric mutual information (GMI) and scales to large datasets. The authors establish asymptotic convergence, give bias and variance rates for smooth (Hölder-class) densities, and report advantages in experiments.

This paper proposes a geometric estimator of dependency between a pair of multivariate samples. The proposed estimator of dependency is based on a randomly permuted geometric graph (the minimal spanning tree, MST) over the two multivariate samples. This estimator converges to a quantity that we call the geometric mutual information (GMI), which is equivalent to the Henze-Penrose divergence [1] between the joint distribution of the multivariate samples and the product of the marginals. The GMI has many of the same properties as standard mutual information (MI) but can be estimated from empirical data without density estimation, making it scalable to large datasets. The proposed empirical estimator of GMI is simple to implement, involving the construction of an MST spanning over both the original data and a randomly permuted version of this data. We establish asymptotic convergence of the estimator and convergence rates of the bias and variance for smooth multivariate density functions belonging to a Hölder class. We demonstrate the advantages of our proposed geometric dependency estimator in a series of experiments.
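The construction described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The assumptions are: the sample is split into two halves, the pairing is kept in the first half and the y values are randomly permuted in the second, and the MST edges joining the two groups are counted and plugged into a Friedman-Rafsky style formula, 1 - R(m + k) / (2mk), clipped at zero. The even split, the clipping, and the names `mst_edges` and `gmi_estimate` are choices made here for illustration. The MST uses a simple O(n^2) Prim's algorithm, which is adequate only for small samples.

```python
import math
import random


def mst_edges(points):
    """Return the n - 1 edges (i, j) of a Euclidean minimum spanning tree.

    Uses Prim's algorithm on the complete graph, O(n^2) time.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known distance from the tree to each node
    parent = [-1] * n       # tree node that achieves that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax distances of the remaining nodes through u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def gmi_estimate(xs, ys, seed=0):
    """Sketch of an MST-based dependency estimate for paired samples.

    xs, ys: equal-length sequences of coordinate tuples.
    Assumption (not from the source): an even split into two halves.
    """
    rng = random.Random(seed)
    n = len(xs)
    m = n // 2   # size of the group that keeps the original pairing
    k = n - m    # size of the group whose y values are permuted

    # Group 1: original pairs, i.e. draws from the joint distribution.
    joint = [tuple(xs[i]) + tuple(ys[i]) for i in range(m)]

    # Group 2: x paired with a shuffled y, mimicking the product of marginals.
    idx = list(range(m, n))
    shuffled = idx[:]
    rng.shuffle(shuffled)
    product = [tuple(xs[i]) + tuple(ys[j]) for i, j in zip(idx, shuffled)]

    # Count MST edges that join a group-1 point to a group-2 point.
    pooled = joint + product
    cross = sum((i < m) != (j < m) for i, j in mst_edges(pooled))

    # Few cross edges means the two groups are well separated (dependency).
    return max(0.0, 1.0 - cross * (m + k) / (2.0 * m * k))


if __name__ == "__main__":
    rng = random.Random(1)
    x = [(rng.random(),) for _ in range(300)]
    y_dep = [(v[0],) for v in x]                   # y is a copy of x
    y_ind = [(rng.random(),) for _ in range(300)]  # y drawn independently
    print("dependent:  ", gmi_estimate(x, y_dep))
    print("independent:", gmi_estimate(x, y_ind))
```

In this sketch, strongly dependent pairs should yield a value well above that of independent pairs, because permuting y moves the second group away from the first and the MST then contains few edges between the groups.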
