LG · AI · CV · Oct 31, 2024

Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales

arXiv:2411.00132v23 citations · h-index: 5 · Has Code · NIPS
Originality: Highly original
AI Analysis

This work addresses the need for safe deployment of foundation models in high-stakes applications by improving both prediction correctness and rationale correctness, a novel approach to evaluation and optimization.

The paper tackles the problem of ensuring that foundation models make correct predictions backed by correct rationales, rather than optimizing for accuracy alone. It reports improvements of up to 10.1% in prediction accuracy, along with gains in rationale correctness of 7.5% in localization and 36.5% in disentanglement.

Large pretrained foundation models demonstrate exceptional performance and, in some high-stakes applications, even surpass human experts. However, most of these models are currently evaluated primarily on prediction accuracy, overlooking the validity of the rationales behind their accurate predictions. For the safe deployment of foundation models, there is a pressing need to ensure double-correct predictions, i.e., correct prediction backed by correct rationales. To achieve this, we propose a two-phase scheme: First, we curate a new dataset that offers structured rationales for visual recognition tasks. Second, we propose a rationale-informed optimization method to guide the model in disentangling and localizing visual evidence for each rationale, without requiring manual annotations. Extensive experiments and ablation studies demonstrate that our model outperforms state-of-the-art models by up to 10.1% in prediction accuracy across a wide range of tasks. Furthermore, our method significantly improves the model's rationale correctness, improving localization by 7.5% and disentanglement by 36.5%. Our dataset, source code, and pretrained weights: https://github.com/deep-real/DCP
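
The notion of a double-correct prediction described above can be illustrated with a small scoring sketch. This is not the paper's evaluation code: the `Sample` record, the `rationale_ok` flag, and both function names are hypothetical, and how a rationale is judged correct is left outside the example. It only shows why plain accuracy can overstate how often a model is right for the right reasons.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One evaluated example (hypothetical structure, for illustration only)."""
    label: str          # ground-truth class
    predicted: str      # class predicted by the model
    rationale_ok: bool  # assumed flag: was the model's rationale judged correct?


def accuracy(samples):
    """Fraction of samples whose predicted class matches the label."""
    if not samples:
        raise ValueError("samples must not be empty")
    return sum(s.predicted == s.label for s in samples) / len(samples)


def double_correct_rate(samples):
    """Fraction of samples with a correct prediction AND a correct rationale."""
    if not samples:
        raise ValueError("samples must not be empty")
    hits = sum(s.predicted == s.label and s.rationale_ok for s in samples)
    return hits / len(samples)


# Toy data, invented for this example.
samples = [
    Sample("ambulance", "ambulance", True),   # right answer, right reason
    Sample("ambulance", "ambulance", False),  # right answer, wrong reason
    Sample("red wolf", "coyote", True),       # wrong answer
    Sample("red wolf", "red wolf", True),     # right answer, right reason
]

print(accuracy(samples))             # 0.75
print(double_correct_rate(samples))  # 0.5
```

In this toy data, one of the three correct predictions rests on an incorrect rationale, so the double-correct rate falls below the accuracy.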

Code Implementations: 1 repo