Confidence Regularized Masked Language Modeling using Text Length
This work addresses overconfidence in masked language models for NLP tasks. The contribution is incremental, since it builds on existing masked language modeling methods.
The paper tackles the problem that masked language models ignore plausible alternative predictions, especially for short texts. It proposes a confidence regularizer whose strength depends on input length, and reports improved accuracy and expected calibration error on the GLUE and SQuAD benchmarks.
Masked language modeling is a widely used method for learning language representations, where the model predicts a randomly masked word in each input. However, this approach typically considers only a single correct answer during training, ignoring the variety of plausible alternatives that humans might choose. This issue becomes more pronounced when the input text is short, as the possible word distribution tends to have higher entropy, potentially causing the model to become overconfident in its predictions. To mitigate this, we propose a novel confidence regularizer that adaptively adjusts the regularization strength based on the input length. Experiments on the GLUE and SQuAD benchmarks show that our method improves both accuracy and expected calibration error.
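
To make the idea of a length-dependent regularizer concrete, the following is a minimal sketch, not the paper's actual formulation, which the abstract does not specify. It assumes the regularizer takes the form of label smoothing toward a uniform distribution, with a smoothing strength that decays exponentially in the input length. The function names, the exponential schedule, and the parameters `eps_max` and `scale` are all hypothetical choices made for illustration.

```python
import math


def smoothing_strength(length, eps_max=0.2, scale=32.0):
    """Hypothetical schedule (an assumption, not from the paper):
    shorter inputs receive stronger smoothing, decaying with length."""
    return eps_max * math.exp(-length / scale)


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def length_regularized_loss(logits, target, length, eps_max=0.2, scale=32.0):
    """Cross-entropy for one masked position, smoothed toward uniform
    with a strength that depends on the input length (assumed form)."""
    eps = smoothing_strength(length, eps_max, scale)
    log_p = log_softmax(logits)
    nll = -log_p[target]                    # standard masked-LM loss term
    uniform_ce = -sum(log_p) / len(log_p)   # cross-entropy against uniform
    return (1.0 - eps) * nll + eps * uniform_ce


# A confident prediction for token 0 over a toy 4-word vocabulary.
logits = [4.0, 1.0, 0.5, 0.0]
short_loss = length_regularized_loss(logits, target=0, length=5)
long_loss = length_regularized_loss(logits, target=0, length=200)
print(short_loss, long_loss)
```

Under these assumptions, the same confident prediction is penalized more heavily when it comes from a short input than from a long one, which matches the motivation that short contexts admit more plausible alternatives. Setting `eps_max=0.0` recovers the ordinary cross-entropy loss.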