CL, AI, NE · Mar 22, 2018

An Analysis of Neural Language Modeling at Multiple Scales

Stephen Merity, Nitish Shirish Keskar, Richard Socher

arXiv:1803.08240v1 · 12.9 · 176 citations · Has Code

Originality: Synthesis-oriented

AI Analysis

This work provides efficient, high-performance language models for NLP researchers and practitioners, but it is incremental as it builds on established architectures.

The paper extends existing LSTM and QRNN language models to larger vocabularies and character-level granularity, achieving state-of-the-art results on character-level Penn Treebank and enwik8 and on word-level WikiText-103, with training times of 12 hours (WikiText-103) to 2 days (enwik8) on a single GPU.

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
