CL · Mar 3, 2025

Your Model is Overconfident, and Other Lies We Tell Ourselves

Timothee Mickus, Aman Sinha, Raúl Vázquez

arXiv:2503.01235v1 · 4.91 citations · h-index: 8 · ACL

Originality: Incremental advance

AI Analysis

This work addresses overconfidence and misaligned evaluation metrics in NLP models. Its contribution is incremental: it refines how researchers and practitioners understand data complexity.

The study investigates how different metrics for assessing intrinsic difficulty in neural NLP models, such as annotator dissensus and model confidence, relate to one another. Across 29 models on three datasets, it finds that these metrics are correlated, but their relationships are neither linear nor monotonic.
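The difference between "correlated" and "linearly or monotonically related" can be checked with two standard coefficients: Pearson's r measures how well a linear relationship fits, while Spearman's ρ (Pearson's r computed on ranks) measures monotonic association. The sketch below is a minimal, dependency-free illustration on toy numbers; it is not the paper's data or analysis pipeline, and the variable names are hypothetical. It shows that a monotonic but curved relation gets a perfect Spearman score and an imperfect Pearson score, while a U-shaped (non-monotonic) relation can score zero on both even though one variable fully determines the other.

```python
import math

def pearson(x, y):
    """Pearson correlation: how well a *linear* relationship fits x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if den == 0:
        raise ValueError("correlation is undefined for constant input")
    return num / den

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson on ranks, i.e. *monotonic* association."""
    return pearson(ranks(x), ranks(y))

# Toy scores (hypothetical, not from the paper).
x = [0, 1, 2, 3, 4, 5, 6]
cubic = [v ** 3 for v in x]           # monotonic but non-linear in x
u_shape = [(v - 3) ** 2 for v in x]   # non-monotonic in x

print(round(pearson(x, cubic), 3), round(spearman(x, cubic), 3))      # 0.907 1.0
print(round(pearson(x, u_shape), 3), round(spearman(x, u_shape), 3))  # 0.0 0.0
```

For real analyses, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients; the hand-rolled versions here only keep the example self-contained.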

The difficulty intrinsic to a given example, rooted in its inherent ambiguity, is a key yet often overlooked factor in evaluating neural NLP models. We investigate the interplay and divergence among various metrics for assessing intrinsic difficulty, including annotator dissensus, training dynamics, and model confidence. Through a comprehensive analysis using 29 models on three datasets, we reveal that while correlations exist among these metrics, their relationships are neither linear nor monotonic. By disentangling these dimensions of uncertainty, we aim to refine our understanding of data complexity and its implications for evaluating and improving NLP models.
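The three families of difficulty signals named in the abstract can be illustrated per example. The sketch below assumes common definitions that the abstract does not specify: normalized entropy of the annotator label distribution for dissensus, maximum softmax probability for model confidence, and the mean and standard deviation of the gold-label probability across training epochs for training dynamics (in the style of Dataset Cartography, Swayamdipta et al., 2020). These are illustrative choices, not necessarily the metrics used in the paper, and every function name and input value is hypothetical.

```python
import math
from collections import Counter

def annotator_entropy(labels, num_classes):
    """Assumed dissensus measure: label entropy normalized to [0, 1].

    0 means all annotators agree; 1 means labels are spread uniformly over
    num_classes. Requires num_classes >= 2.
    """
    n = len(labels)
    probs = [c / n for c in Counter(labels).values()]
    h = sum(-p * math.log(p) for p in probs)
    return h / math.log(num_classes)

def softmax(logits):
    """Numerically stable softmax (subtracts the max logit before exp)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def max_prob_confidence(logits):
    """Assumed confidence measure: probability of the top predicted class."""
    return max(softmax(logits))

def training_dynamics(gold_probs_per_epoch):
    """Assumed training-dynamics measure: mean and population std of the
    gold-label probability recorded at the end of each epoch."""
    n = len(gold_probs_per_epoch)
    mean = sum(gold_probs_per_epoch) / n
    var = sum((p - mean) ** 2 for p in gold_probs_per_epoch) / n
    return mean, math.sqrt(var)

# Hypothetical inputs for a single example.
labels = ["entailment", "entailment", "neutral", "entailment", "neutral"]
print(round(annotator_entropy(labels, num_classes=3), 3))              # 0.613
print(round(max_prob_confidence([2.0, 0.0, 0.0]), 3))                  # 0.787
print(tuple(round(v, 3) for v in training_dynamics([0.2, 0.4, 0.6])))  # (0.4, 0.163)
```

Each signal is computed from a different source (human labels, a single forward pass, or a sequence of checkpoints), so nothing forces them to agree on which examples are hard; that possible divergence is what the abstract sets out to measure.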

