$k$-Variance: A Clustered Notion of Variance
This work provides a novel statistical tool for researchers and practitioners interested in analyzing the local structure and shape of data distributions, offering an alternative to traditional variance.
We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. The $k$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other. As $k$ increases, it captures local rather than global information about a measure, and it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide an in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb R^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
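The sampling-plus-matching estimator described above can be sketched as a short Monte Carlo loop. This is a minimal illustration under stated assumptions, not the paper's implementation. It assumes squared Euclidean cost and uses an empirical sample, drawn with replacement, as a stand-in for the measure. It also assumes a normalization of one half times the mean matched cost, chosen so that $k = 1$ recovers the ordinary variance; the paper's exact constant may differ. Each matching is an assignment linear program, which is solved here with SciPy's exact assignment solver `linear_sum_assignment` rather than a general-purpose LP solver. The function names `matching_cost` and `k_variance` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def matching_cost(x, y):
    """Mean squared-Euclidean cost of an optimal bipartite matching of x to y.

    x, y: arrays of shape (k, d). The assignment problem is a linear program;
    linear_sum_assignment solves it exactly.
    """
    # Pairwise squared distances, shape (k, k).
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


def k_variance(sample, k, trials=2000, rng=None):
    """Monte Carlo estimate of k-variance (hypothetical helper).

    Assumptions: the empirical distribution of `sample` stands in for the
    measure, and the normalization is 0.5 * E[mean matched squared cost],
    so that k = 1 gives 0.5 * E|X - Y|^2 = Var(X).
    """
    rng = np.random.default_rng(rng)
    sample = np.asarray(sample, dtype=float)
    if sample.ndim == 1:  # treat 1-D input as n points in R^1
        sample = sample[:, None]
    total = 0.0
    for _ in range(trials):
        # Two independent sets of k samples, drawn with replacement.
        x = sample[rng.integers(len(sample), size=k)]
        y = sample[rng.integers(len(sample), size=k)]
        total += matching_cost(x, y)
    return 0.5 * total / trials
```

For example, with two well-separated tight clusters, `k_variance(clusters, 1)` is dominated by cross-cluster pairs. For larger $k$, most optimally matched pairs fall within the same cluster, so the estimate drops sharply. This illustrates the "local rather than global" behavior described above.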