LG · CL · ML · Jun 10, 2019

Improving Neural Language Modeling via Adversarial Training

arXiv:1906.03805v2 · 126 citations
Originality: Incremental advance
AI Analysis

This paper addresses overfitting in neural language models, offering researchers and practitioners a simple and effective regularization method. The advance is incremental in nature.

The paper tackles overfitting in large-scale neural language models by introducing an adversarial training mechanism that adds noise to the output embedding layer. It reports single-model state-of-the-art test perplexities of 46.01 on Penn Treebank and 38.07 on Wikitext-2, and improves BLEU scores over transformer-based baselines on the WMT14 English-German and IWSLT14 German-English translation tasks.

Recently, substantial progress has been made in language modeling by using deep neural networks. However, in practice, large-scale neural language models have been shown to be prone to overfitting. In this paper, we present a simple yet highly effective adversarial training mechanism for regularizing neural language models. The idea is to introduce adversarial noise to the output embedding layer while training the models. We show that the optimal adversarial noise yields a simple closed-form solution, thus allowing us to develop a simple and time-efficient algorithm. Theoretically, we show that our adversarial mechanism effectively encourages the diversity of the embedding vectors, helping to increase the robustness of models. Empirically, we show that our method improves on the single-model state-of-the-art results for language modeling on Penn Treebank (PTB) and Wikitext-2, achieving test perplexity scores of 46.01 and 38.07, respectively. When applied to machine translation, our method improves over various transformer-based translation baselines in BLEU scores on the WMT14 English-German and IWSLT14 German-English tasks.
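
The abstract does not spell out the closed-form noise, so the following is only a minimal sketch of how such a solution can arise, not the authors' implementation. It assumes (this is not stated in the abstract) that only the target word's output embedding is perturbed and that the perturbation is bounded in L2 norm by a hypothetical radius `eps`. Under that assumption the softmax loss decreases as the target logit grows, so the worst-case noise points directly against the hidden state. The function name and arguments are illustrative.

```python
import math


def adversarial_nll(hidden, out_emb, target, eps):
    """Softmax negative log-likelihood of `target` under worst-case noise.

    Assumption (not taken from the source): only row `target` of `out_emb`
    is perturbed, with ||delta||_2 <= eps. The loss is decreasing in the
    target logit, so the maximizing perturbation is
        delta = -eps * hidden / ||hidden||,
    which lowers the target logit by eps * ||hidden||.
    """
    h_norm = math.sqrt(sum(x * x for x in hidden))
    # Logit of each word: dot product of its output embedding with the hidden state.
    logits = [sum(w_i * h_i for w_i, h_i in zip(w, hidden)) for w in out_emb]
    # Apply the effect of the closed-form perturbation without building delta.
    logits[target] -= eps * h_norm
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]


# Toy example: 2-dimensional hidden state, 3-word vocabulary.
hidden = [1.0, 2.0]
out_emb = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]
clean_loss = adversarial_nll(hidden, out_emb, target=2, eps=0.0)
adv_loss = adversarial_nll(hidden, out_emb, target=2, eps=0.1)
```

With `eps=0.0` the function reduces to the ordinary cross-entropy loss; with a positive `eps` the loss is strictly larger. In this sketch the regularizer costs one extra norm computation per step, which is the sense in which a closed-form perturbation avoids an inner optimization loop.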

Code Implementations: 1 repo