CL · LG · Dec 15, 2015

Strategies for Training Large Vocabulary Neural Language Models

arXiv:1512.04906v1 · 143 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the scalability of neural language models for applications such as speech recognition and machine translation. Its contribution is incremental: it focuses on comparing and extending existing methods.

The paper tackles the computational cost of training neural language models with large vocabularies by systematically comparing strategies such as softmax variants and noise contrastive estimation. It evaluates them on three benchmarks, examining performance on rare words and the speed/accuracy trade-off.

Training neural network language models over large vocabularies is still computationally very costly compared to count-based models such as Kneser-Ney. At the same time, neural language models are gaining popularity for many applications such as speech recognition and machine translation whose success depends on scalability. We present a systematic comparison of strategies to represent and train large vocabularies, including softmax, hierarchical softmax, target sampling, noise contrastive estimation and self normalization. We further extend self normalization to be a proper estimator of likelihood and introduce an efficient variant of softmax. We evaluate each method on three popular benchmarks, examining performance on rare words, the speed/accuracy trade-off and complementarity to Kneser-Ney.
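To make the cost issue in the abstract concrete, the sketch below contrasts a full-softmax loss with a self-normalized loss on a toy vocabulary. It assumes the commonly used form of self normalization, in which a squared log-normalizer penalty is added to the softmax loss; it is not the extended estimator the paper introduces. The function names, the scores, and the value of `alpha` are all hypothetical.

```python
import math

def log_partition(scores):
    """Numerically stable log Z = log sum_j exp(s_j) over the whole vocabulary."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))

def softmax_nll(scores, target):
    """Full-softmax negative log-likelihood; computing log Z touches every word."""
    return log_partition(scores) - scores[target]

def self_normalized_loss(scores, target, alpha=0.1):
    """Softmax NLL plus alpha * (log Z)^2, which pushes log Z toward 0 in training.
    alpha is an illustrative value, not one taken from the paper."""
    log_z = log_partition(scores)
    return (log_z - scores[target]) + alpha * log_z ** 2

# Toy 4-word vocabulary with hypothetical output-layer scores.
scores = [2.0, 0.5, -1.0, 0.0]
target = 0
nll = softmax_nll(scores, target)
penalized = self_normalized_loss(scores, target)
```

If training drives log Z close to zero, the raw score of a word can be read as an approximate log-probability at test time, which avoids the sum over the vocabulary that dominates the cost of the full softmax.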

Code Implementations: 2 repos