ML LG Mar 15

Learning-to-Defer with Expert-Conditioned Advice

Yannis Montreuil, Leina Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

arXiv:2603.14324 87.24 citations h-index: 12

AI Analysis

This work addresses a limitation of learning-to-defer in modern AI systems that dynamically provide information to experts. It offers a more flexible deferral framework, though the contribution is incremental in that it extends existing learning-to-defer methods.

The paper tackles learning-to-defer with expert-conditioned advice, where the system can choose what additional information an expert receives after routing. It shows that natural separated surrogates are inconsistent, while an augmented surrogate recovers the Bayes-optimal policy. Experiments on tabular, LLM, and multi-modal tasks show improvements over standard learning-to-defer, with advice acquisition adapting to the cost regime.

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, are inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, LLM, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime.
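
To make the composite expert-advice action space concrete, the following is a minimal sketch and not the paper's surrogate or its inconsistency proof. It assumes a hypothetical 2x2 table of expected costs for a single input, and it assumes a decoupled rule that picks the expert and the advice independently from their average costs. All names and numbers (`COST`, `composite_policy`, `separated_policy`) are illustrative. The sketch shows only that a decision over joint pairs can differ from two independent decisions when the best advice depends on the expert.

```python
# Hypothetical expected costs for a single input x (illustrative numbers only).
# COST[j][a] = expected cost of routing to expert j and giving it advice a.
COST = [
    [0.10, 0.60],  # expert 0: best with advice 0, poor with advice 1
    [0.50, 0.15],  # expert 1: poor with advice 0, good with advice 1
]


def composite_policy(cost):
    """Minimize expected cost over the joint (expert, advice) action space."""
    pairs = [(j, a) for j in range(len(cost)) for a in range(len(cost[0]))]
    return min(pairs, key=lambda p: cost[p[0]][p[1]])


def separated_policy(cost):
    """Assumed decoupled rule: pick expert and advice from average costs."""
    n_experts, n_advice = len(cost), len(cost[0])
    # Average each expert's cost over advice options, and vice versa.
    expert = min(range(n_experts), key=lambda j: sum(cost[j]) / n_advice)
    advice = min(
        range(n_advice),
        key=lambda a: sum(row[a] for row in cost) / n_experts,
    )
    return expert, advice


if __name__ == "__main__":
    for name, policy in [("composite", composite_policy),
                         ("separated", separated_policy)]:
        j, a = policy(COST)
        print(f"{name}: expert={j}, advice={a}, cost={COST[j][a]:.2f}")
```

With these assumed numbers, the joint minimization selects expert 0 with advice 0 (cost 0.10), while the decoupled rule selects expert 1 with advice 0 (cost 0.50), because neither average reflects the interaction between expert and advice.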
