CL · Oct 23, 2022

Lexical Generalization Improves with Larger Models and Longer Training

IBM
arXiv:2210.12673v2 · 294 citations · h-index: 45
Originality: Incremental advance
AI Analysis

This work addresses superficial feature reliance in language models, a concern for both researchers and practitioners. The advance is incremental because it builds on existing knowledge about model behavior.

The study examines how fine-tuned language models rely on lexical overlap heuristics, which can cause failures on challenging inputs. It finds that larger models and longer training both reduce this reliance, and it presents evidence that the disparity between model sizes originates in the pre-trained model.

While fine-tuned language models perform well on many tasks, they have also been shown to rely on superficial surface features such as lexical overlap. Excessive use of such heuristics can lead to failure on challenging inputs. We analyze the use of lexical overlap heuristics in natural language inference, paraphrase detection, and reading comprehension (using a novel contrastive dataset), and find that larger models are much less susceptible to adopting lexical overlap heuristics. We also find that longer training leads models to abandon lexical overlap heuristics. Finally, we provide evidence that the disparity between model sizes has its source in the pre-trained model.
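To make the notion of a lexical overlap heuristic concrete, the following is a minimal sketch for the natural language inference case. It is not the paper's method or code: the function names `lexical_overlap` and `overlap_heuristic`, the whitespace tokenization, the 0.9 threshold, and the two example sentence pairs are all assumptions chosen for illustration. It shows how a predictor that only looks at word overlap labels a pair as entailment even when word order reverses the meaning.

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise.

    Assumption: lowercased whitespace tokenization, for illustration only.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    shared = sum(token in premise_tokens for token in hypothesis_tokens)
    return shared / len(hypothesis_tokens)


def overlap_heuristic(premise: str, hypothesis: str, threshold: float = 0.9) -> str:
    """Hypothetical shortcut predictor: high overlap is read as entailment."""
    if lexical_overlap(premise, hypothesis) >= threshold:
        return "entailment"
    return "non-entailment"


# Illustrative pairs (not taken from the paper's datasets).
examples = [
    # Every hypothesis word appears in the premise, yet the roles are swapped.
    ("the doctor was paid by the actor", "the doctor paid the actor", "non-entailment"),
    # High overlap and the hypothesis really does follow.
    ("the actor paid the doctor yesterday", "the actor paid the doctor", "entailment"),
]

for premise, hypothesis, gold in examples:
    predicted = overlap_heuristic(premise, hypothesis)
    status = "correct" if predicted == gold else "wrong"
    print(f"{hypothesis!r}: predicted={predicted}, gold={gold} ({status})")
```

Under these assumptions both pairs have full overlap, so the shortcut predicts entailment for both and gets the first one wrong. Inputs of that first kind are the challenging cases on which a model relying on lexical overlap is expected to fail.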

Code Implementations: 1 repo
