CL · LG · NE · Aug 7, 2017

Regularizing and Optimizing LSTM Language Models

arXiv:1708.02182v1 · 1156 citations
Originality: Incremental advance
AI Analysis

This work addresses word-level language modeling, offering incremental improvements to LSTM-based models through regularization and optimization techniques.

The paper improves LSTM language models by proposing the weight-dropped LSTM for regularization and NT-ASGD for optimization, achieving state-of-the-art word-level perplexities of 57.3 on Penn Treebank and 65.8 on WikiText-2. Adding a neural cache lowers these further to 52.8 and 52.0, respectively.
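
For readers unfamiliar with the metric behind these numbers, perplexity is the exponential of the average negative log-likelihood per token, so lower is better. The snippet below is a minimal sketch of that definition; the function name `perplexity` and the toy inputs are illustrative assumptions, not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    `token_log_probs` holds log p(token | context) for each token in the
    evaluation text. The name and signature are hypothetical.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy check: a model that assigns probability 1/4 to every token
# is as uncertain as a uniform choice among 4 words.
toy_ppl = perplexity([math.log(0.25)] * 8)
```

Read this way, a perplexity of 57.3 means the model is, on average, about as uncertain as a uniform choice among roughly 57 words at each position.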

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
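
The two techniques named in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `drop_connect` and `should_start_averaging` are hypothetical. Rescaling the surviving weights by 1/(1 - p) follows the usual inverted-dropout convention and is an assumption here. The trigger follows one reading of the non-monotonic condition: start averaging once the newest validation perplexity is worse than the best value recorded more than `n` checks ago.

```python
import random


def drop_connect(weights, p, rng):
    """DropConnect on a weight matrix (list of rows).

    Each individual weight is zeroed with probability p. Survivors are
    rescaled by 1 / (1 - p) (assumed inverted-dropout convention).
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [
        [w / keep if rng.random() >= p else 0.0 for w in row]
        for row in weights
    ]


def should_start_averaging(val_log, new_val, n=5):
    """Non-monotonic trigger sketch for switching SGD to averaged SGD.

    `val_log` holds earlier validation perplexities (oldest first) and
    `new_val` is the latest one. Averaging starts when more than n checks
    have been logged and the latest value is worse than the best value
    seen before the most recent n checks.
    """
    t = len(val_log)
    return t > n and new_val > min(val_log[: t - n])


# Hypothetical 3x3 hidden-to-hidden matrix with all weights equal to 1.
rng = random.Random(0)
hidden_to_hidden = [[1.0] * 3 for _ in range(3)]
masked = drop_connect(hidden_to_hidden, 0.5, rng)

# Validation perplexity stopped improving after reaching 70.
trigger = should_start_averaging([100, 90, 80, 70, 72, 71], 75, n=2)
```

In a weight-dropped LSTM the masked matrix would be sampled once per forward pass and reused at every timestep, which is what distinguishes it from dropout applied to the hidden state. In the second example, averaging is triggered because 75 exceeds the earlier best of 70.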

Code Implementations · 45 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
