CO CG LGApr 15, 2025

Efficient and Stable Multi-Dimensional Kolmogorov-Smirnov Distance

Peter Matthew Jacobs, Foad Namjoo, Jeff M. Phillips

arXiv:2504.11299v11.2h-index: 2

Originality Incremental advance

AI Analysis

This provides a stable and efficient multidimensional statistical test for researchers in statistics and machine learning, though it builds incrementally on existing Kolmogorov-Smirnov concepts.

The authors tackled the problem of extending the Kolmogorov-Smirnov distance to multiple dimensions by proposing a formulation that maximizes differences over orthogonal dominating rectangular ranges, proving it converges to 0 with sample size growth and bounding this rate, and showing it can be computed efficiently in up to 4 dimensions with near-linear runtime for a given error, enabling a delta-precision two-sample hypothesis test.

We revisit extending the Kolmogorov-Smirnov distance between probability distributions to the multidimensional setting and make new arguments about the proper way to approach this generalization. Our proposed formulation maximizes the difference over orthogonal dominating rectangular ranges (d-sided rectangles in R^d), and is an integral probability metric. We also prove that the distance between a distribution and a sample from the distribution converges to 0 as the sample size grows, and bound this rate. Moreover, we show that one can, up to this same approximation error, compute the distance efficiently in 4 or fewer dimensions; specifically the runtime is near-linear in the size of the sample needed for that error. With this, we derive a delta-precision two-sample hypothesis test using this distance. Finally, we show these metric and approximation properties do not hold for other popular variants.

View on arXiv PDF

Similar