CL | Mar 25, 2024

The Role of $n$-gram Smoothing in the Age of Neural Networks

Luca Malagutti, Andrius Buinovskij, Anej Svete, Clara Meister, Afra Amini, Ryan Cotterell

AI2, ETH Zurich

arXiv:2403.17240v2 | 4.88 citations | h-index: 25 | NAACL

Originality: Incremental advance

AI Analysis

The paper addresses the problem of regularizing neural language models. Its advance is incremental: it adapts classical $n$-gram smoothing techniques into regularizers for neural models.

This paper re-examines classical n-gram smoothing techniques in the context of neural language models, showing that they can be converted into regularizers that are comparable to or sometimes outperform label smoothing on tasks like language modeling and machine translation.

For nearly three decades, language models derived from the $n$-gram assumption held the state of the art on the task. The key to their success lay in the application of various smoothing techniques that served to combat overfitting. However, when neural language models toppled $n$-gram models as the best performers, $n$-gram smoothing techniques became less relevant. Indeed, it would hardly be an understatement to suggest that the line of inquiry into $n$-gram smoothing techniques became dormant. This paper re-opens the role classical $n$-gram smoothing techniques may play in the age of neural language models. First, we draw a formal equivalence between label smoothing, a popular regularization technique for neural language models, and add-$λ$ smoothing. Second, we derive a generalized framework for converting any $n$-gram smoothing technique into a regularizer compatible with neural language models. Our empirical results find that our novel regularizers are comparable to and, indeed, sometimes outperform label smoothing on language modeling and machine translation.
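The first claim, the link between label smoothing and add-$λ$ smoothing, can be checked numerically in its simplest form. The sketch below is an illustration under stated assumptions, not the paper's formal construction: it treats one observed token as a one-hot count vector over a vocabulary of size $V$, and the function names (`label_smoothing_target`, `add_lambda_estimate`, `matching_gamma`) and the symbol `gamma` for the label-smoothing weight are hypothetical. Under these assumptions, add-$λ$ smoothing of the counts gives the same distribution as label smoothing with weight $λV / (1 + λV)$.

```python
def label_smoothing_target(gold, vocab_size, gamma):
    """Mix the one-hot target for `gold` with the uniform distribution."""
    return [
        (1.0 - gamma) * (1.0 if y == gold else 0.0) + gamma / vocab_size
        for y in range(vocab_size)
    ]


def add_lambda_estimate(counts, lam):
    """Add-lambda smoothing: add `lam` to every count, then renormalize."""
    total = sum(counts) + lam * len(counts)
    return [(c + lam) / total for c in counts]


def matching_gamma(lam, vocab_size, n_obs=1):
    """Label-smoothing weight that reproduces add-lambda on `n_obs` counts."""
    return lam * vocab_size / (n_obs + lam * vocab_size)


# Illustrative values (assumptions, not taken from the paper).
vocab_size, gold, lam = 5, 2, 0.1

# A single observed token, written as a one-hot count vector.
counts = [1 if y == gold else 0 for y in range(vocab_size)]

gamma = matching_gamma(lam, vocab_size)
ls = label_smoothing_target(gold, vocab_size, gamma)
al = add_lambda_estimate(counts, lam)

# The two smoothed distributions agree entry by entry.
assert all(abs(a - b) < 1e-12 for a, b in zip(ls, al))
```

The match follows from $1 - γ = 1 / (1 + λV)$ and $γ / V = λ / (1 + λV)$, so both expressions reduce to $(\mathbb{1}[y = y^*] + λ) / (1 + λV)$, where $y^*$ is the observed token.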
