CL · LG · ML · Oct 10, 2022

Uncertainty Quantification with Pre-trained Language Models: A Large-Scale Empirical Analysis

arXiv:2210.04714v2 · 346 citations · h-index: 119
Originality: Incremental advance
AI Analysis

This work addresses the need for reliable uncertainty quantification in PLMs for safety-critical NLP applications and provides practical recommendations backed by extensive empirical evidence, though it is an incremental advance that builds on prior, narrower studies.

The paper tackles the problem of minimizing calibration error in pre-trained language model (PLM) prediction pipelines for safety-critical NLP applications. Through a large-scale empirical analysis across three classification tasks, both in-domain and under domain shift, it recommends choices of PLM, uncertainty quantifier, and fine-tuning loss: use ELECTRA, prefer larger models, use Temp Scaling, and fine-tune with Focal Loss.
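Calibration error is commonly measured as expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between each bin's accuracy and its mean confidence is averaged, weighted by bin size. The following is a minimal sketch, assuming equal-width bins and a default of 10 bins; the paper's exact ECE variant and bin count are not given here, and `expected_calibration_error` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count, not taken from the paper
) -> float:
    """Expected calibration error with equal-width confidence bins.

    confidences: top-class predicted probability for each example, in [0, 1].
    correct: whether the top-class prediction matched the gold label.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Half-open bins [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's |accuracy - confidence| gap by its share of examples.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy predictions: overconfident at 0.95, underconfident at 0.65.
print(expected_calibration_error([0.95, 0.95, 0.65, 0.65], [True, False, True, True]))
```

Here the two 0.95-confidence predictions are only 50% accurate and the two 0.65-confidence predictions are 100% accurate, so the ECE is about 0.5 × 0.45 + 0.5 × 0.35 = 0.4. A perfectly calibrated model would score 0.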

Pre-trained language models (PLMs) have gained increasing popularity due to their compelling prediction performance in diverse natural language processing (NLP) tasks. When formulating a PLM-based prediction pipeline for NLP tasks, it is also crucial for the pipeline to minimize the calibration error, especially in safety-critical applications. That is, the pipeline should reliably indicate when we can trust its predictions. Composing such a pipeline involves several considerations: (1) the choice and (2) the size of the PLM, (3) the choice of uncertainty quantifier, (4) the choice of fine-tuning loss, and many more. Although prior work has looked into some of these considerations, it usually draws conclusions from empirical studies of limited scope. A holistic analysis of how to compose a well-calibrated PLM-based prediction pipeline is still lacking. To fill this void, we compare a wide range of popular options for each consideration on three prevalent NLP classification tasks and under domain shift. Based on the results, we recommend the following: (1) use ELECTRA for PLM encoding, (2) use larger PLMs if possible, (3) use Temp Scaling as the uncertainty quantifier, and (4) use Focal Loss for fine-tuning.
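Recommendations (3) and (4) can be sketched in plain Python. Temperature scaling divides a classifier's logits by a single scalar T fitted on held-out data; focal loss, -(1 - p_t)^gamma · log(p_t), down-weights examples the model already predicts confidently. This is a stand-alone sketch, not the paper's implementation: the helper names, the golden-section search over inverse temperature (used here in place of a gradient-based optimizer), the search range, and the default gamma = 2.0 are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Log-probabilities of softmax(logits / temperature), computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum for z in scaled]


def mean_nll(logit_rows: Sequence[Sequence[float]], labels: Sequence[int], temperature: float) -> float:
    """Mean negative log-likelihood of the gold labels at a given temperature."""
    total = sum(-log_softmax(row, temperature)[y] for row, y in zip(logit_rows, labels))
    return total / len(labels)


def fit_temperature(
    logit_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    t_min: float = 0.05,  # assumed search range for T
    t_max: float = 20.0,
    iters: int = 100,
) -> float:
    """Hypothetical temperature-scaling fit: pick T > 0 minimizing held-out NLL.

    The NLL is convex in the inverse temperature beta = 1 / T, so a
    golden-section search over beta in [1 / t_max, 1 / t_min] finds its
    minimum within that range.
    """
    a, b = 1.0 / t_max, 1.0 / t_min
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        # Keep the sub-interval that must contain the minimum.
        if mean_nll(logit_rows, labels, 1.0 / c) < mean_nll(logit_rows, labels, 1.0 / d):
            b = d
        else:
            a = c
    return 2.0 / (a + b)  # T = 1 / beta at the midpoint of the final interval


def focal_loss(logits: Sequence[float], label: int, gamma: float = 2.0) -> float:
    """Per-example focal loss; gamma = 0 recovers ordinary cross-entropy."""
    log_p_t = log_softmax(logits)[label]
    p_t = math.exp(log_p_t)
    return -((1.0 - p_t) ** gamma) * log_p_t


# Toy validation set: logit 3.0 always favors class 0, but only 3 of 4 labels are 0.
val_logits = [[3.0, 0.0]] * 4
val_labels = [0, 0, 0, 1]
T = fit_temperature(val_logits, val_labels)
calibrated = [math.exp(lp) for lp in log_softmax(val_logits[0], T)]
print(T, calibrated)
```

At T = 1 the toy model assigns about 0.95 to class 0 while being right only 75% of the time; the fitted T comes out near 3 / ln 3 ≈ 2.73, which brings that probability down to about 0.75. Because dividing all logits by the same T > 0 preserves their ordering, temperature scaling changes confidence but never the predicted class.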

Code Implementations: 1 repo
