AI CL CV · Jul 2, 2024

Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Khyathi Raghavi Chandu, Linjie Li, Anas Awadalla, Ximing Lu, Jae Sung Park, Jack Hessel, Lijuan Wang, Yejin Choi

AI2, UW

arXiv:2407.01942v1 · 11.68 citations · h-index: 33

Originality: Synthesis-oriented

AI Analysis

This work addresses the need for more reliable and truthful AI systems by providing a domain-specific benchmark and metric for multimodal uncertainty, though it is incremental as it builds on existing VQA frameworks.

The paper tackles the problem of uncertainty awareness in vision-language AI systems by introducing a taxonomy distinguishing epistemic and aleatoric uncertainty, and presents a benchmark dataset with 178K VQA samples and a new metric for evaluation.

The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, that is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics.
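The abstract names two existing quantities that the proposed metric is said to track: accuracy and calibration error. The sketch below computes both for a small set of predictions. It is not the paper's confidence-weighted accuracy, whose definition is not given here. It assumes calibration error means the standard binned expected calibration error (ECE), and the function names, the choice of 10 bins, and the toy data are all illustrative.

```python
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of predictions that are correct."""
    return sum(correct) / len(correct)


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins over [0, 1]
) -> float:
    """Binned ECE: weighted mean of |avg confidence - accuracy| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        bin_acc = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / n) * abs(avg_conf - bin_acc)
    return ece


# Toy data (made up for illustration): model confidence and correctness.
confidences = [0.9, 0.9, 0.6, 0.6]
correct = [True, True, True, False]

acc = accuracy(correct)
ece = expected_calibration_error(confidences, correct)
print(f"accuracy={acc:.2f}, ECE={ece:.2f}")
```

A model can score well on one of these and poorly on the other, for example high accuracy with badly overconfident answers, which is the kind of gap a single combined metric is meant to expose.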
