CL
Jul 18, 2017

On the State of the Art of Evaluation in Neural Language Models

arXiv:1707.05589v2 28.8556 citations

Originality: Incremental advance

AI Analysis

This work addresses evaluation inconsistencies for researchers in natural language processing, providing more reliable benchmarks.

The paper tackles inconsistent evaluation in neural language models by running a large-scale hyperparameter tuning study. It finds that standard LSTM architectures, when properly regularized, outperform more recent models, and it establishes new state-of-the-art results on the Penn Treebank and Wikitext-2 corpora.

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.
