LG | ST | ME | ML | May 12, 2025

Generalization Bounds and Stopping Rules for Learning with Self-Selected Data

arXiv:2505.07367v1 | 5 citations | h-index: 3
Originality: Incremental advance
AI Analysis

This work provides theoretical generalization guarantees for learning methods that self-select their training data. The advance is incremental because it builds on the existing unified framework of reciprocal learning.

The paper addresses generalization in learning paradigms that self-select training data, such as active learning and semi-supervised learning. It proves universal generalization bounds using covering numbers and Wasserstein ambiguity sets. The results cover both convergent and finite-iteration solutions; the finite-iteration bounds are anytime valid, which yields stopping rules for practitioners.

Many learning paradigms self-select training data in light of previously learned parameters. Examples include active learning, semi-supervised learning, bandits, and boosting. Rodemann et al. (2024) unify them under the framework of "reciprocal learning". In this article, we address the question of how well these methods can generalize from their self-selected samples. In particular, we prove universal generalization bounds for reciprocal learning using covering numbers and Wasserstein ambiguity sets. Our results require no assumptions on the distribution of self-selected data, only verifiable conditions on the algorithms. We prove results for both convergent and finite-iteration solutions. The latter are anytime valid, thereby giving rise to stopping rules for a practitioner seeking to guarantee the out-of-sample performance of their reciprocal learning algorithm. Finally, we illustrate our bounds and stopping rules for reciprocal learning's special case of semi-supervised learning.
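
As a rough illustration of what "self-selected data" means in the semi-supervised case, the sketch below runs a simple self-training loop in which the next training point is chosen using the current parameters. Everything in it is an assumption made for illustration: the function names `fit_logistic` and `self_training`, the synthetic two-Gaussian data, the tolerance `tol`, and in particular the parameter-change check, which is only a stand-in and not the anytime-valid stopping rule derived in the paper (the listing does not reproduce that rule).

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, steps=200, l2=0.1):
    """Gradient descent on an L2-regularised logistic loss (no intercept)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)
    return w

def self_training(X_lab, y_lab, X_unl, tol=1e-3, max_iter=50):
    """Each round adds the most confidently predicted unlabeled point,
    with its pseudo-label, to the training set and refits."""
    X_train, y_train = X_lab.copy(), y_lab.astype(float).copy()
    pool = X_unl.copy()
    w = fit_logistic(X_train, y_train)
    history = [w.copy()]
    for _ in range(max_iter):
        if len(pool) == 0:
            break
        p = 1.0 / (1.0 + np.exp(-pool @ w))
        i = int(np.argmax(np.abs(p - 0.5)))  # selection depends on current w
        X_train = np.vstack([X_train, pool[i]])
        y_train = np.append(y_train, float(p[i] > 0.5))
        pool = np.delete(pool, i, axis=0)
        w_new = fit_logistic(X_train, y_train)
        history.append(w_new.copy())
        change = np.linalg.norm(w_new - w)
        w = w_new
        if change < tol:  # hypothetical placeholder, NOT the paper's rule
            break
    return w, history

# Synthetic data (assumed): two Gaussian classes centred at (-2,-2) and (2,2).
rng = np.random.default_rng(0)

def sample(n):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 2.0, -2.0)
    return X, y

X_lab, y_lab = sample(20)
X_unl, _ = sample(200)      # labels of the pool are discarded
X_test, y_test = sample(500)

w, history = self_training(X_lab, y_lab, X_unl)
accuracy = np.mean((X_test @ w > 0).astype(int) == y_test)
print(f"rounds: {len(history) - 1}, test accuracy: {accuracy:.3f}")
```

Because the training set at each round depends on the previously fitted `w`, the added points are not an i.i.d. sample, which is the situation the paper's bounds are meant to cover.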

