CL AI · Sep 26, 2016

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

arXiv:1609.07843v1 56.44345 citations

Originality: Highly original

AI Analysis

The paper addresses poor prediction of rare words in language modeling, and it presents a novel method rather than an incremental improvement.

The paper tackles the difficulty neural sequence models have with rare or unseen words in language modeling. It introduces a pointer sentinel mixture architecture that can either reproduce a word from the recent context or produce one from a standard softmax classifier. The resulting pointer sentinel-LSTM reaches state-of-the-art performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM.

Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then they struggle to predict rare or unseen words even if the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state of the art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora we also introduce the freely available WikiText corpus.
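The mixture described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the gate is the attention mass assigned to an extra sentinel slot in the pointer softmax, and every name here (pointer_sentinel_mixture, pointer_scores, sentinel_score) is hypothetical. In a real model the scores would come from an LSTM's hidden states; here they are hand-picked numbers.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pointer_sentinel_mixture(vocab_logits, context_ids, pointer_scores, sentinel_score):
    """Mix a vocabulary softmax with a pointer over recent context words.

    Assumption (illustrative): the sentinel competes with the context
    positions in a single softmax, and its share of the mass is the gate.
    """
    p_vocab = softmax(vocab_logits)
    # Joint softmax over the context positions plus one sentinel slot.
    attn = softmax(list(pointer_scores) + [sentinel_score])
    gate = attn[-1]  # weight given to the standard softmax classifier
    mixed = [gate * p for p in p_vocab]
    # The remaining attention mass goes to the word found at each position.
    for word_id, a in zip(context_ids, attn[:-1]):
        mixed[word_id] += a
    return mixed, gate

# Toy setup: 4-word vocabulary, uniform softmax, word 2 appears twice in context.
vocab_logits = [0.0, 0.0, 0.0, 0.0]
context_ids = [2, 3, 2]
pointer_scores = [2.0, 0.5, 2.0]
probs, gate = pointer_sentinel_mixture(
    vocab_logits, context_ids, pointer_scores, sentinel_score=0.0
)
```

In this toy setup the vocabulary softmax is uniform, so any preference for word 2 comes entirely from the pointer component. That mirrors the case the abstract highlights, where the recent context makes a rare word predictable even though the softmax classifier alone does not favor it.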
