LG · AI · Mar 25

No Single Metric Tells the Whole Story: A Multi-Dimensional Evaluation Framework for Uncertainty Attributions

Emily Schiller, Teodor Chiaburu, Marco Zullich, Luca Longo

arXiv:2603.24524 · 28.6 · h-index: 4

Predicted impact: top 75% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses the lack of comparability in how uncertainty attribution methods are evaluated, a concern for researchers and practitioners in explainable AI. The advance is incremental because it builds on the existing Co-12 framework.

The authors tackle the inconsistent evaluation of uncertainty attribution methods in explainable AI by proposing a multi-dimensional framework based on the Co-12 properties. They introduce a new property, conveyance, and demonstrate the framework with eight metrics on tabular and image data. In their experiments, gradient-based methods outperform perturbation-based ones in consistency and conveyance.

Research on explainable AI (XAI) has frequently focused on explaining model predictions. More recently, methods have been proposed to explain prediction uncertainty by attributing it to input features (uncertainty attributions). However, the evaluation of these methods remains inconsistent as studies rely on heterogeneous proxy tasks and metrics, hindering comparability. We address this by aligning uncertainty attributions with the well-established Co-12 framework for XAI evaluation. We propose concrete implementations for the correctness, consistency, continuity, and compactness properties. Additionally, we introduce conveyance, a property tailored to uncertainty attributions that evaluates whether controlled increases in epistemic uncertainty reliably propagate to feature-level attributions. We demonstrate our evaluation framework with eight metrics across combinations of uncertainty quantification and feature attribution methods on tabular and image data. Our experiments show that gradient-based methods consistently outperform perturbation-based approaches in consistency and conveyance, while Monte-Carlo dropconnect outperforms Monte-Carlo dropout in most metrics. Although most metrics rank the methods consistently across samples, inter-method agreement remains low. This suggests no single metric sufficiently evaluates uncertainty attribution quality. The proposed evaluation framework contributes to the body of knowledge by establishing a foundation for systematic comparison and development of uncertainty attribution methods.
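The abstract's point about agreement can be made concrete with a rank correlation. One common way to quantify how strongly two evaluation metrics agree on the ordering of a set of methods is Spearman's rank correlation. The sketch below is a minimal illustration, not the authors' procedure: the metric names (`metric_a`, `metric_b`, `metric_c`) and all scores are hypothetical toy values, not results from the paper.

```python
from itertools import combinations


def ranks(scores):
    """Return 1-based ranks (1 = highest score), averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant (no rank variance).
    """
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical scores: one entry per attribution method, higher is better.
# These numbers are made up for illustration only.
toy_scores = {
    "metric_a": [0.9, 0.7, 0.5, 0.3],
    "metric_b": [0.8, 0.6, 0.4, 0.2],
    "metric_c": [0.1, 0.4, 0.2, 0.3],
}

# Pairwise agreement between every pair of metrics.
agreement = {
    (m1, m2): spearman(toy_scores[m1], toy_scores[m2])
    for m1, m2 in combinations(toy_scores, 2)
}
```

In this toy setup, `metric_a` and `metric_b` order the four methods identically, while `metric_c` orders them quite differently. A value near 1 means two metrics rank the methods alike; values near 0 or below mean they disagree, which is the situation in which relying on one metric alone could be misleading.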

