CL AI NE Mar 22, 2018

An Analysis of Neural Language Modeling at Multiple Scales

arXiv:1803.08240v1, 176 citations
Originality: Synthesis-oriented
AI Analysis

This work provides efficient, high-performance language models for NLP researchers and practitioners, but it is incremental as it builds on established architectures.

The paper extends existing LSTM and QRNN language models to larger vocabularies and character-level granularity, achieving state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, with training times ranging from 12 hours (WikiText-103) to 2 days (enwik8) on a single GPU.

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level language models based on LSTMs and QRNNs and extend them to both larger vocabularies as well as character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.

Code Implementations: 12 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
