ST · LG · NA · ML · Dec 13, 2020

$k$-Variance: A Clustered Notion of Variance

Justin Solomon, Kristjan Greenewald, Haikady N. Nagaraja

arXiv:2012.06958v1

Originality: Incremental advance

AI Analysis

This work gives researchers and practitioners a new statistical tool for analyzing the local structure and shape of data distributions, generalizing the traditional variance.

This paper introduces $k$-variance, a new measure of distributional shape based on random bipartite matchings. It is the expected cost of optimally matching two independent sets of $k$ samples from a measure, and it captures increasingly local information about that measure as $k$ grows.
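As a minimal sketch of the quantity just described, assume squared Euclidean cost and uniform weights on each set of samples. The symbol $C_k$ is our own label, and the paper's normalizing constant and cost exponent may differ:

$$
C_k(\mu) = \mathbb{E}\left[\min_{\sigma \in S_k} \frac{1}{k} \sum_{i=1}^{k} \lVert X_i - Y_{\sigma(i)} \rVert^2\right], \qquad X_1, \dots, X_k, Y_1, \dots, Y_k \overset{\text{iid}}{\sim} \mu,
$$

where $S_k$ is the set of permutations of $\{1, \dots, k\}$. Both empirical measures put mass $1/k$ on each point, so the optimal-transport linear program between them has an optimal solution at a permutation matrix (Birkhoff and von Neumann's theorem). The minimum over $\sigma$ is therefore exactly that linear program's value. For $k = 1$ the minimum is trivial, and $\tfrac12 C_1(\mu) = \tfrac12 \mathbb{E}\lVert X_1 - Y_1 \rVert^2 = \operatorname{tr}\operatorname{Cov}(\mu)$, the ordinary variance.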

We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. The $k$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide an in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb R^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
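The abstract says $k$-variance is easy to approximate stochastically by sampling and linear programming. The Python sketch below follows that recipe under several stated assumptions. It uses squared Euclidean cost and a factor of 1/2, chosen so that $k = 1$ gives ordinary variance. In place of an LP solver it uses exhaustive search over permutations. This search is exact, because with equal uniform weights an optimal matching is a permutation, but it is only feasible for small $k$. It also includes the standard one-dimensional shortcut of matching sorted samples. All function names, including the two-cluster test measure, are hypothetical and not taken from the paper.

```python
import itertools
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of the same dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def min_matching_cost(xs: List[Point], ys: List[Point]) -> float:
    """Average squared-distance cost of an optimal perfect matching of xs to ys.

    Exhaustive search over all k! permutations: exact, but only practical for
    small k. For larger k one would solve the assignment linear program
    (e.g. with the Hungarian algorithm) instead.
    """
    k = len(xs)
    if len(ys) != k:
        raise ValueError("both sample sets must contain k points")
    best = math.inf
    for perm in itertools.permutations(range(k)):
        best = min(best, sum(sq_dist(xs[i], ys[perm[i]]) for i in range(k)))
    return best / k


def min_matching_cost_1d(xs: List[float], ys: List[float]) -> float:
    """1-D shortcut: with squared cost, pairing the sorted samples is optimal."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def k_variance_mc(sample: Callable[[random.Random], Point], k: int,
                  trials: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of (1/2) * E[optimal average matching cost].

    The factor 1/2 is this sketch's normalization: with k = 1 and squared
    Euclidean cost it reduces to ordinary variance (trace of the covariance).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sample(rng) for _ in range(k)]  # first set of k samples
        ys = [sample(rng) for _ in range(k)]  # independent second set
        total += min_matching_cost(xs, ys)
    return 0.5 * total / trials


def two_clusters(rng: random.Random) -> Point:
    """Hypothetical test measure: equal mixture of N(-5, 1) and N(5, 1)."""
    return (rng.gauss(rng.choice([-5.0, 5.0]), 1.0),)


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, round(k_variance_mc(two_clusters, k, trials=300), 3))
```

For the two-cluster measure, the $k = 1$ estimate should sit near the mixture's ordinary variance, $1 + 25 = 26$. Comparing it with the values for larger $k$ gives a concrete view of how the matching cost responds to cluster structure.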

