A Framework for Measuring Appropriate Reliance on Set-Valued AI Advice
For researchers studying human-AI collaboration, this framework addresses the gap in evaluating reliance on set-valued advice, which is increasingly used to communicate uncertainty.
This paper develops the first formal framework for measuring appropriate reliance on set-valued AI advice (e.g., discrete sets or continuous intervals) in human-AI collaboration, spanning both classification and regression tasks. The framework introduces novel metrics (correct reliance rate on AI and correct reliance rate on self for classification, and quantity and quality of AI reliance for regression) that capture nuances overlooked by existing measures.
Appropriate reliance on AI advice has become a central research theme in human-AI collaboration. Existing frameworks have focused exclusively on point predictions as AI advice. However, set-valued AI advice (e.g., discrete sets or continuous intervals) is increasingly being used to communicate uncertainty and improve human decision making. In this paper, we develop the first formal framework for measuring appropriate reliance on set-valued AI advice within the sequential judge-advisor paradigm, spanning both classification and regression tasks. For classification, we first introduce the dimensions that are necessary for evaluating set-valued AI advice. We then define two metrics: correct reliance rate on AI and correct reliance rate on self, which jointly characterize appropriate reliance in this setting. For regression, we introduce quantity of AI reliance and quality of AI reliance, which respectively measure whether a decision maker utilized the AI advice and whether their reliance helped them get closer to the ground truth relative to their initial estimate. Through the application of our framework, we demonstrate how these metrics capture important nuances in human-AI collaboration that existing measures overlook.
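To make the regression setting concrete, the following is a minimal illustrative sketch of how a quantity-style and a quality-style measure could be computed from sequential judge-advisor trials with interval advice. It is not the formal definition developed in the paper: the `Trial` record, the rule that "utilizing the advice" means moving closer to the advised interval, and the rule that "quality" means a smaller absolute error than the initial estimate are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One judge-advisor trial (hypothetical record layout)."""
    initial: float  # decision maker's estimate before seeing the advice
    lower: float    # lower bound of the AI's interval advice
    upper: float    # upper bound of the AI's interval advice
    final: float    # decision maker's estimate after seeing the advice
    truth: float    # ground-truth value


def distance_to_interval(x: float, lower: float, upper: float) -> float:
    """Distance from x to the interval [lower, upper]; 0 if x lies inside."""
    if x < lower:
        return lower - x
    if x > upper:
        return x - upper
    return 0.0


def moved_toward_advice(t: Trial) -> bool:
    """Assumed proxy for 'utilized the AI advice': the final estimate is
    strictly closer to the advised interval than the initial estimate."""
    before = distance_to_interval(t.initial, t.lower, t.upper)
    after = distance_to_interval(t.final, t.lower, t.upper)
    return after < before


def got_closer_to_truth(t: Trial) -> bool:
    """Assumed proxy for 'reliance helped': the final estimate has a strictly
    smaller absolute error than the initial estimate."""
    return abs(t.final - t.truth) < abs(t.initial - t.truth)


def summarize(trials: list[Trial]) -> tuple[float, float]:
    """Return (quantity, quality) under the assumptions above.

    quantity: share of all trials in which the advice was utilized.
    quality:  share of those utilized trials that ended closer to the truth.
    """
    relied = [t for t in trials if moved_toward_advice(t)]
    quantity = len(relied) / len(trials) if trials else 0.0
    quality = (
        sum(got_closer_to_truth(t) for t in relied) / len(relied)
        if relied
        else 0.0
    )
    return quantity, quality


# Made-up example data, for illustration only.
example_trials = [
    Trial(initial=10, lower=20, upper=30, final=18, truth=25),  # relied, helped
    Trial(initial=50, lower=20, upper=30, final=35, truth=60),  # relied, hurt
    Trial(initial=22, lower=20, upper=30, final=22, truth=24),  # no change
    Trial(initial=5, lower=20, upper=30, final=5, truth=25),    # ignored advice
]

quantity, quality = summarize(example_trials)
```

In this made-up example, two of the four trials move toward the interval and one of those two ends closer to the ground truth, so both values are 0.5. Separating the two numbers shows why a single adjustment score can mislead: a decision maker may follow the advice often while benefiting from it only some of the time.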