CL, AI, LG | Oct 20, 2022

Exploring Predictive Uncertainty and Calibration in NLP: A Study on the Impact of Method & Data Scarcity

arXiv:2210.15452v1303 citations | h-index: 23 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses uncertainty calibration for NLP in low-resource settings. It is incremental: it builds on existing methods rather than introducing a new paradigm.

The study examines predictive uncertainty estimation in neural classifiers for low-resource languages. It finds that approaches based on pre-trained models and ensembles perform best overall, that the quality of uncertainty estimates can degrade as more data becomes available, and that data uncertainty, rather than model uncertainty, appears to account for most of a model's total uncertainty.

We investigate the problem of determining the predictive confidence (or, conversely, uncertainty) of a neural classifier through the lens of low-resource languages. By training models on sub-sampled datasets in three different languages, we assess the quality of estimates from a wide array of approaches and their dependence on the amount of available data. We find that while approaches based on pre-trained models and ensembles achieve the best results overall, the quality of uncertainty estimates can surprisingly suffer with more data. We also perform a qualitative analysis of uncertainties on sequences, discovering that a model's total uncertainty seems to be influenced to a large degree by its data uncertainty, not model uncertainty. All model implementations are open-sourced in a software package.
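The distinction between total, data, and model uncertainty can be illustrated with a common entropy-based decomposition for ensembles. The sketch below is an assumption for illustration only: the abstract does not say which uncertainty metrics the paper uses, and the function names `entropy` and `decompose_uncertainty` are hypothetical and not taken from the paper's software package. Total uncertainty is taken as the entropy of the averaged prediction, data uncertainty as the average entropy of the individual members, and model uncertainty as the difference between the two.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(member_probs):
    """Split ensemble uncertainty into total, data, and model parts.

    member_probs: list of per-member class-probability lists, all of the
    same length. This is a hypothetical helper, not the paper's API.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])

    # Average the members' predictions class by class.
    mean_probs = [
        sum(member[c] for member in member_probs) / n_members
        for c in range(n_classes)
    ]

    total = entropy(mean_probs)  # entropy of the averaged prediction
    data = sum(entropy(m) for m in member_probs) / n_members  # mean member entropy
    model = total - data  # disagreement between members (mutual information)
    return total, data, model


if __name__ == "__main__":
    # Members agree that the input is ambiguous: mostly data uncertainty.
    print(decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]]))
    # Members are individually confident but disagree: mostly model uncertainty.
    print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

In the first call all of the total uncertainty is attributed to the data term, and in the second call all of it is attributed to the model term, even though the total is the same in both cases.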

Code Implementations: 1 repo
