CL · Sep 22, 2017

Improving Language Modelling with Noise-contrastive estimation

arXiv:1709.07758v1
Originality: Incremental advance
AI Analysis

This paper addresses the computational bottleneck that large vocabularies create in neural language modelling, which matters to both researchers and practitioners. The advance is incremental because it focuses on hyperparameter tuning rather than on a new method.

The paper tackles the problem of scaling neural language models to large vocabularies. It shows that noise-contrastive estimation (NCE) can succeed when the hyperparameters are tuned properly, and reports that the tuned model outperforms state-of-the-art single-model methods on a popular benchmark.

Neural language models do not scale well when the vocabulary is large. Noise-contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, it was considered to be an unsuccessful approach for language modelling. A sufficient investigation of the hyperparameters in the NCE-based neural language models was also missing. In this paper, we showed that NCE can be a successful approach in neural language modelling when the hyperparameters of a neural network are tuned appropriately. We introduced the 'search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. We showed that appropriate tuning of NCE-based neural language models outperforms the state-of-the-art single-model methods on a popular benchmark.
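The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption. The loss is the standard binary-classification form of NCE, with one observed word and k noise words, and the schedule is the classic search-then-converge form eta_t = eta_0 / (1 + t / tau). The paper's exact variant and its heuristic may differ. All function and parameter names are hypothetical.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def nce_loss(target_score, target_noise_prob, noise_scores, noise_probs):
    """NCE loss for one observed word and k sampled noise words.

    Scores are unnormalised model log-scores; *_prob values are the
    probabilities of the same words under the noise distribution.
    Assumed standard form, not taken from the paper.
    """
    k = len(noise_scores)
    # The observed word should be classified as "data".
    loss = -log_sigmoid(target_score - math.log(k * target_noise_prob))
    # Each sampled word should be classified as "noise";
    # log(1 - sigmoid(x)) equals log_sigmoid(-x).
    for score, prob in zip(noise_scores, noise_probs):
        loss -= log_sigmoid(-(score - math.log(k * prob)))
    return loss


def search_then_converge(step, base_lr, tau):
    # Classic form: roughly constant while step << tau ("search"),
    # then decaying like 1 / step ("converge").
    return base_lr / (1.0 + step / tau)
```

The cost of `nce_loss` grows with the number of noise samples k rather than with the vocabulary size, which is the property that makes sampling-based training attractive for large vocabularies.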

