People Mover's Distance: Class level geometry using fast pairwise data adaptive transportation costs
This work addresses similarity analysis for large-scale, high-dimensional survey data, such as comparing U.S. counties. Its contribution is incremental, since it builds on existing earth mover's distance methods.
The paper tackles the problem of defining a network graph on a large collection of classes whose data points are sampled in a non-i.i.d. way. It develops an approximate earth mover's distance algorithm with a data-adaptive transportation cost and applies it to a U.S. survey to measure similarity between counties.
We address the problem of defining a network graph on a large collection of classes. Each class consists of a collection of data points, sampled in a non-i.i.d. way from some unknown underlying distribution. The application we consider in this paper is a large-scale, high-dimensional survey of people living in the U.S., and the question of how similar or different the counties in which these people live are. We use a co-clustering diffusion metric to learn the underlying distribution of people, and build an approximate earth mover's distance algorithm that uses this data-adaptive metric as its transportation cost.
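To make the idea of a class-level distance with a learned ground cost concrete, the following is a minimal sketch, not the algorithm of the paper. It assumes each class (county) has already been summarized as a histogram over a small number of groups of people, and that a matrix `cost` of pairwise distances between those groups is available. Here `cost` is a toy stand-in for the co-clustering diffusion metric, and the earth mover's distance is solved exactly as a linear program with `scipy.optimize.linprog` rather than approximated. The names `emd`, `pairwise_emd`, `counties`, and `cost` are illustrative.

```python
import numpy as np
from scipy.optimize import linprog


def emd(p, q, cost):
    """Exact earth mover's distance between histograms p and q.

    p, q  : nonnegative vectors with equal total mass.
    cost  : (len(p), len(q)) matrix of ground transportation costs.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    cost = np.asarray(cost, dtype=float)
    n, m = cost.shape
    # The flow matrix f is flattened row-major, so f[i, j] sits at index i * m + j.
    # Row constraints: total mass leaving bin i equals p[i].
    a_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column constraints: total mass arriving at bin j equals q[j].
    a_cols = np.kron(np.ones((1, n)), np.eye(m))
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([a_rows, a_cols]),
        b_eq=np.concatenate([p, q]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise ValueError(result.message)
    return result.fun


def pairwise_emd(histograms, cost):
    """Symmetric matrix of EMD values between every pair of classes."""
    k = len(histograms)
    dist = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            dist[a, b] = dist[b, a] = emd(histograms[a], histograms[b], cost)
    return dist


# Toy ground cost (assumption): three groups of people placed on a line,
# standing in for distances learned from the data.
cost = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)

# Hypothetical counties, each a distribution over the three groups.
counties = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0],
])

D = pairwise_emd(counties, cost)
print(np.round(D, 3))
```

The matrix `D` can then be thresholded or passed through a kernel to define edge weights for a graph on the classes. The exact linear program scales poorly with the number of bins and the number of class pairs, which is the kind of cost that motivates an approximate algorithm.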