ML LG · Oct 30, 2022

Learning to Defer to Multiple Experts: Consistent Surrogate Losses, Confidence Calibration, and Conformal Ensembles

Rajeev Verma, Daniel Barrejón, Eric Nalisnick

arXiv:2210.16955v2 · 27.367 citations · h-index: 10 · Has Code

Originality: Incremental advance

AI Analysis

This work makes an incremental advance on decision-making systems that rely on multiple human or AI experts, with potential applications in domains such as medical diagnosis and content moderation.

The paper tackles the problem of learning to defer to multiple experts by deriving consistent surrogate losses, analyzing confidence calibration, and proposing a conformal inference method for expert selection. It validates these methods empirically on galaxy, skin lesion, and hate speech classification tasks.

We study the statistical properties of learning to defer (L2D) to multiple experts. In particular, we address the open problems of deriving a consistent surrogate loss, confidence calibration, and principled ensembling of experts. Firstly, we derive two consistent surrogates -- one based on a softmax parameterization, the other on a one-vs-all (OvA) parameterization -- that are analogous to the single-expert losses proposed by Mozannar and Sontag (2020) and Verma and Nalisnick (2022), respectively. We then study the frameworks' ability to estimate P(m_j = y | x), the probability that the jth expert will correctly predict the label for x. Theory shows the softmax-based loss causes mis-calibration to propagate between the estimates while the OvA-based loss does not (though in practice, we find there are trade-offs). Lastly, we propose a conformal inference technique that chooses a subset of experts to query when the system defers. We perform empirical validation on tasks for galaxy, skin lesion, and hate speech classification.
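
Below is a minimal pure-Python sketch of the two kinds of surrogates described in the abstract. It is not the authors' code, and every function name is hypothetical. It assumes a score vector g(x) with K class entries followed by J expert entries, uses the logistic loss as the OvA binary surrogate, and applies the usual L2D argmax rule, where an expert index means "defer to that expert".

- **Softmax loss.** This adds a cross-entropy term -log tau_{K+j} for every expert j whose prediction m_j equals y.
- **OvA loss.** This trains one binary logistic classifier per class and one per expert.
- **Correctness estimates.** Minimizing the expected softmax loss puts the class outputs proportional to P(y | x) and tau_{K+j} proportional to P(m_j = y | x). The softmax estimate of P(m_j = y | x) is therefore tau_{K+j} / (1 - sum_j' tau_{K+j'}). Every output enters this ratio, and it is unbounded away from the optimum. The OvA estimate is a single sigmoid.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax of a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def logistic(z):
    """Binary logistic loss phi(z) = log(1 + exp(-z)), computed stably."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def softmax_l2d_loss(scores, y, expert_preds, num_classes):
    """Softmax surrogate. scores holds K class entries, then J expert entries.

    Cross-entropy on the true class y, plus -log tau_{K+j} for every
    expert j whose prediction expert_preds[j] equals y.
    """
    logp = log_softmax(scores)
    loss = -logp[y]
    for j, m_j in enumerate(expert_preds):
        if m_j == y:
            loss -= logp[num_classes + j]
    return loss


def ova_l2d_loss(scores, y, expert_preds, num_classes):
    """OvA surrogate: one logistic loss per class and one per expert."""
    loss = 0.0
    for k in range(num_classes):
        # Positive target for the true class, negative for the others.
        loss += logistic(scores[k]) if k == y else logistic(-scores[k])
    for j, m_j in enumerate(expert_preds):
        g = scores[num_classes + j]
        # Positive target exactly when expert j is correct on this example.
        loss += logistic(g) if m_j == y else logistic(-g)
    return loss


def expert_correctness_softmax(scores, num_classes):
    """tau_{K+j} / (1 - sum_j' tau_{K+j'}); can exceed 1 away from the optimum."""
    probs = [math.exp(v) for v in log_softmax(scores)]
    class_mass = sum(probs[:num_classes])  # equals 1 - total expert mass
    return [p / class_mass for p in probs[num_classes:]]


def expert_correctness_ova(scores, num_classes):
    """sigmoid(g_{K+j}), written as exp(-phi(g)) to avoid overflow."""
    return [math.exp(-logistic(g)) for g in scores[num_classes:]]


def decide(scores, num_classes):
    """Assumed argmax rule: an expert index means defer to that expert."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if best < num_classes:
        return ("predict", best)
    return ("defer", best - num_classes)
```

As a usage example, take K = 2 classes and J = 2 experts with scores [0.2, -1.0, 1.4, 0.1]. Here decide returns ("defer", 0). With scores [0, 0, 3, 0], expert_correctness_softmax reports about 10.0 for expert 0, which is not a valid probability. The OvA estimate for the same score is a sigmoid and always lies in (0, 1).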

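The abstract's step of choosing a subset of experts to query can be illustrated with a generic split-conformal sketch. It makes four assumptions, and these choices and all function names are hypothetical illustrations, not necessarily the paper's exact procedure:

- The nonconformity score is 1 - p_j(x), where p_j(x) estimates P(m_j = y | x) under either parameterization.
- Each calibration point is scored by its most confident correct expert.
- The queried experts' predictions are combined by majority vote.
- If no expert passes the threshold, the most confident expert is queried.

This is the standard split-conformal argument. With exchangeable data, the test point's score falls at or below the threshold with probability at least 1 - alpha. Whenever it does and is finite, the chosen set includes a correct expert.

```python
import math


def conformal_threshold(cal_probs, cal_correct, alpha):
    """Split-conformal threshold over experts (a sketch, not the paper's code).

    cal_probs[i][j]: estimated P(m_j = y | x_i) on calibration point i.
    cal_correct[i][j]: True if expert j predicted the label of point i.
    """
    scores = []
    for probs, correct in zip(cal_probs, cal_correct):
        # Assumed score: 1 - (highest estimate among experts that were correct);
        # infinite when no expert was correct on this point.
        good = [p for p, ok in zip(probs, correct) if ok]
        scores.append(1.0 - max(good) if good else math.inf)
    n = len(scores)
    rank = max(1, math.ceil((n + 1) * (1 - alpha)))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points: every expert is kept
    return sorted(scores)[rank - 1]


def select_experts(test_probs, q_hat):
    """Indices j whose score 1 - p_j(x) does not exceed the threshold."""
    return [j for j, p in enumerate(test_probs) if 1.0 - p <= q_hat]


def majority_vote(labels):
    """Most frequent label; ties go to the smallest label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return min(counts, key=lambda lab: (-counts[lab], lab))


def conformal_ensemble_predict(test_probs, expert_preds, q_hat):
    """Query the conformal subset of experts and combine by majority vote."""
    chosen = select_experts(test_probs, q_hat)
    if not chosen:
        # Hypothetical fallback: query the single most confident expert.
        chosen = [max(range(len(test_probs)), key=lambda j: test_probs[j])]
    return majority_vote([expert_preds[j] for j in chosen])
```

Lowering alpha raises the threshold, so more experts are queried per deferral. This trades extra querying cost for a higher chance that the chosen set contains a correct expert.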
