CL · LG · Nov 10, 2017

Breaking the Softmax Bottleneck: A High-Rank RNN Language Model

arXiv:1711.03953v4 · 420 citations
Originality: Highly original
AI Analysis

The paper addresses a core modeling limitation of Softmax-based neural language models.

It identifies the Softmax bottleneck as a fundamental limit on the expressiveness of these models and proposes a method that, at the time of publication, set state-of-the-art perplexities of 47.69 on Penn Treebank and 40.68 on WikiText-2 and outperformed the baseline by over 5.6 perplexity points on the 1B Word dataset.

We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
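The rank argument behind the bottleneck can be checked numerically. The sketch below is illustrative only: the sizes `N`, `M`, `d`, `K`, the random embeddings, and the use of a mixture of softmaxes as the higher-rank alternative are all assumptions made for this example, since the abstract does not spell out the method. With a single Softmax, the log-probability matrix over contexts and words is the logit matrix (rank at most `d`) minus a per-row normalizer, so its rank is at most `d + 1`; a mixture of several Softmaxes is not subject to that bound.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (assumptions): contexts, vocabulary, embedding dim, mixture components.
N, M, d, K = 50, 40, 5, 3


def log_softmax(logits):
    """Row-wise log-softmax of a 2-D array, computed stably."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


# Single Softmax: logits factor as H @ W.T, a matrix of rank at most d.
H = rng.standard_normal((N, d))   # hypothetical context vectors
W = rng.standard_normal((M, d))   # hypothetical word embeddings
log_p_softmax = log_softmax(H @ W.T)

# Mixture of K Softmaxes sharing the word embeddings W (assumed setup).
H_k = rng.standard_normal((K, N, d))                 # one context vector per component
log_prior = log_softmax(rng.standard_normal((N, K))) # per-context mixture weights
component_log_p = np.stack(
    [log_softmax(H_k[k] @ W.T) for k in range(K)], axis=1
)  # shape (N, K, M)
log_p_mixture = np.log(
    np.exp(component_log_p + log_prior[:, :, None]).sum(axis=1)
)  # shape (N, M)

rank_softmax = int(np.linalg.matrix_rank(log_p_softmax))
rank_mixture = int(np.linalg.matrix_rank(log_p_mixture))
print("single Softmax log-prob rank:", rank_softmax, "(bound: d + 1 =", d + 1, ")")
print("mixture log-prob rank:", rank_mixture)
```

The comparison uses the numerical rank from `np.linalg.matrix_rank` with its default tolerance, so it demonstrates the bound on toy data rather than reproducing any experiment from the paper.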

Code Implementations: 9 repos

Data from Papers with Code (CC-BY-SA-4.0)

