Mar 4, 2024

Geometry and Stability of Supervised Learning Problems

arXiv:2403.01660v2
h-index: 10
Originality: Incremental advance
AI Analysis

This paper provides a theoretical framework for analyzing stability in supervised learning. The advance is incremental but useful for researchers studying robustness and generalization.

The authors introduce the Risk distance, a distance between supervised learning problems inspired by optimal transport. It supports stability analysis: bounding how far issues such as sampling bias, noise, or limited data move a problem under the Risk distance quantifies how much they change that problem. The authors also explore the geometry of the resulting space, giving explicit geodesics, showing that classification problems are dense in a larger class of problems, and providing two variants of the distance.

We introduce a notion of distance between supervised learning problems, which we call the Risk distance. This distance, inspired by optimal transport, facilitates stability results; one can quantify how seriously issues like sampling bias, noise, limited data, and approximations might change a given problem by bounding how much these modifications can move the problem under the Risk distance. With the distance established, we explore the geometry of the resulting space of supervised learning problems, providing explicit geodesics and proving that the set of classification problems is dense in a larger class of problems. We also provide two variants of the Risk distance: one that incorporates specified weights on a problem's predictors, and one that is more sensitive to the contours of a problem's risk landscape.
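
The stability idea in the abstract can be loosely illustrated with ordinary optimal transport. The sketch below is not the Risk distance, whose definition is given in the paper and not reproduced here. It only computes the 1-Wasserstein distance between two equal-size one-dimensional samples, and the function name `wasserstein_1d` and the "sampling bias" modelled as a constant shift are assumptions made for illustration.

```python
import random


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1D empirical samples.

    Illustration only: this is plain optimal transport on the real line,
    not the Risk distance defined in the paper.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    # In one dimension the optimal coupling pairs the sorted samples in order.
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Hypothetical "sampling bias": every sample is shifted by a fixed offset.
for shift in (0.0, 0.1, 0.5):
    biased = [x + shift for x in clean]
    print(shift, wasserstein_1d(clean, biased))
```

Because a constant shift preserves the sorted order, the printed distance matches the shift up to floating-point rounding: a small modification moves the data a correspondingly small distance, which is the kind of statement a stability bound formalizes.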

