Magnitude Distance: A Geometric Measure of Dataset Similarity
This work addresses a fundamental problem in machine learning, robust dataset comparison, which matters to both researchers and practitioners; however, it appears incremental because it builds on existing geometric concepts.
The authors tackled the problem of quantifying dataset similarity by proposing magnitude distance, a novel metric based on the magnitude of a metric space with a tunable scaling parameter. They demonstrated its discriminative power in high-dimensional settings and its utility as a training objective for generative models, with performance comparable to established methods.
Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose \textit{magnitude distance}, a novel distance metric defined on finite datasets using the notion of the \emph{magnitude} of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.
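The abstract relies on the magnitude of a finite metric space without restating it. As background, the sketch below computes the standard magnitude of a scaled finite point set $tX$, not the authors' magnitude distance, whose exact formula is not given here. It builds the similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$, solves $Zw = \mathbf{1}$ for the weighting $w$, and returns $\sum_i w_i$. The function name `magnitude` and the use of Euclidean distances are illustrative assumptions. It also shows how $t$ acts as a scale: for large $t$ the off-diagonal entries of $Z$ vanish and the magnitude approaches the number of points, while for small $t$ all points look close together and the magnitude approaches 1.

```python
import numpy as np


def magnitude(points: np.ndarray, t: float) -> float:
    """Magnitude of the scaled finite metric space tX (illustrative sketch).

    Assumes Euclidean distances between rows of `points` and that the
    similarity matrix Z is invertible (true for distinct points in
    Euclidean space and t > 0).
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]  # treat a 1-D array as n points on a line
    # Pairwise Euclidean distance matrix, shape (n, n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Similarity matrix of the scaled space tX.
    z = np.exp(-t * dists)
    # Weighting w solves Z w = 1; solving is more stable than inverting Z.
    weights = np.linalg.solve(z, np.ones(len(points)))
    # Magnitude is the sum of the weights.
    return float(weights.sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 2))
    # Small t emphasizes global structure; large t resolves individual points.
    for t in (0.5, 5.0, 50.0):
        print(f"t={t}: magnitude={magnitude(x, t):.3f}")
```

For two points at distance $d$, the formula reduces to $2/(1 + e^{-td})$, which makes the role of $t$ easy to check by hand. Very small $t$ makes $Z$ nearly the all-ones matrix and therefore ill-conditioned, so numerical results in that regime should be treated with care.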