ML, LG · May 15, 2025

One-Stage Top-$k$ Learning-to-Defer: Score-Based Surrogates with Theoretical Guarantees

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

arXiv:2505.10160v2 · 20.08 citations · h-index: 12

Originality: Highly original

AI Analysis

This work addresses a limitation of existing one-stage deferral methods, which can defer to only a single expert. It offers a more flexible and efficient approach for machine learning systems that rely on external entities, though the contribution is incremental in that it extends prior score-based methods.

The paper tackles learning to defer to multiple experts or entities in a single stage. It introduces a framework that jointly optimizes prediction and deferral across $k$ entities using a cost-sensitive loss and a convex surrogate. On CIFAR-10 and SVHN, the method outperforms Top-1 deferral, and the adaptive Top-$k(x)$ variant achieves better accuracy-cost trade-offs by adapting the number of consulted entities to input complexity.

We introduce the first one-stage Top-$k$ Learning-to-Defer framework, which unifies prediction and deferral by learning a shared score-based model that selects the $k$ most cost-effective entities (labels or experts) per input. While existing one-stage L2D methods are limited to deferring to a single expert, our approach jointly optimizes prediction and deferral across multiple entities through a single end-to-end objective. We define a cost-sensitive loss and derive a novel convex surrogate that is independent of the cardinality parameter $k$, enabling generalization across Top-$k$ regimes without retraining. Our formulation recovers the Top-1 deferral policy of prior score-based methods as a special case, and we prove that our surrogate is both Bayes-consistent and $\mathcal{H}$-consistent under mild assumptions. We further introduce an adaptive variant, Top-$k(x)$, which dynamically selects the number of consulted entities per input to balance predictive accuracy and consultation cost. Experiments on CIFAR-10 and SVHN confirm that our one-stage Top-$k$ method strictly outperforms Top-1 deferral, while Top-$k(x)$ achieves superior accuracy-cost trade-offs by tailoring allocations to input complexity.
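As a rough illustration of the selection step described in the abstract, the sketch below ranks a shared score vector over labels and experts and keeps the $k$ best entries, so the same scores serve any $k$ and $k = 1$ reduces to a plain argmax. The function names, the entity layout, and in particular the `adaptive_k` rule (smallest $k$ whose softmax mass reaches a threshold) are assumptions made for this example; the paper's actual Top-$k(x)$ rule is not given in the abstract and may well differ.

```python
import numpy as np

def top_k_entities(scores, k):
    """Return indices of the k highest-scoring entities, best first."""
    scores = np.asarray(scores, dtype=float)
    if not 1 <= k <= scores.size:
        raise ValueError("k must be between 1 and the number of entities")
    # Stable sort on negated scores: descending order, ties go to the lower index.
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

def adaptive_k(scores, mass=0.9):
    """Hypothetical rule (not from the paper): smallest k whose
    softmax probability mass reaches `mass`."""
    scores = np.asarray(scores, dtype=float)
    probs = np.exp(scores - scores.max())  # shift for numerical stability
    probs /= probs.sum()
    cumulative = np.cumsum(np.sort(probs)[::-1])
    k = int(np.searchsorted(cumulative, mass) + 1)
    return min(k, scores.size)  # guard against rounding at mass close to 1

# Illustrative layout: entities 0-2 are labels, entities 3-4 are experts.
scores = [2.0, 0.5, -1.0, 1.5, 0.1]
top1 = top_k_entities(scores, 1)  # Top-1 special case
top3 = top_k_entities(scores, 3)  # same scores, larger k, no retraining
```

Because the ranking is computed once from the scores, changing $k$ only changes how many entries are kept, which mirrors the abstract's claim that the surrogate does not depend on $k$.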
