CL, Jul 21, 2017

Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks

arXiv:1707.06799v2, 307 citations
Originality: Synthesis-oriented
AI Analysis

This paper gives NLP researchers and practitioners guidance for improving performance on sequence labeling tasks, though its contribution is incremental because it focuses on optimizing existing methods.

The paper addresses the problem of selecting hyperparameters for deep LSTM networks in sequence labeling tasks. Based on an evaluation of over 50,000 setups across five linguistic tasks, it finds that some choices, such as the pre-trained word embeddings, significantly affect performance, while others, such as the number of LSTM layers, are less important.

Selecting optimal parameters for a neural network architecture can often make the difference between mediocre and state-of-the-art performance. However, little has been published on which parameters and design choices should be evaluated or selected, often making correct hyperparameter optimization a "black art that requires expert experiences" (Snoek et al., 2012). In this paper, we evaluate the importance of different network design choices and hyperparameters for five common linguistic sequence tagging tasks (POS, Chunking, NER, Entity Recognition, and Event Detection). We evaluated over 50,000 different setups and found that some parameters, like the pre-trained word embeddings or the last layer of the network, have a large impact on performance, while other parameters, for example the number of LSTM layers or the number of recurrent units, are of minor importance. We give a recommendation on a configuration that performs well across different tasks.
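The idea of evaluating many setups and then asking which parameters matter can be illustrated with a minimal sketch. It is not the authors' code: the search space, the option names, and `toy_score` (a stand-in for training and evaluating a tagger) are all hypothetical, and the importance measure used here, the gap between the best and worst option by mean score, is only one simple choice among many.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical search space; names and options are illustrative only.
SEARCH_SPACE = {
    "word_embeddings": ["pretrained_a", "pretrained_b", "random_init"],
    "classifier": ["softmax", "crf"],
    "lstm_layers": [1, 2, 3],
    "recurrent_units": [50, 100, 200],
}


def sample_config(rng):
    """Draw one random setup by picking an option for every parameter."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def toy_score(config, rng):
    """Placeholder for training and evaluating a tagger.

    Returns a made-up score in which embeddings and the classifier matter
    a lot and the number of LSTM layers matters little. These effect sizes
    are assumptions for the demo, not results from the paper.
    """
    embedding_gain = {"pretrained_a": 0.06, "pretrained_b": 0.04, "random_init": 0.0}
    score = 0.80
    score += embedding_gain[config["word_embeddings"]]
    score += 0.03 if config["classifier"] == "crf" else 0.0
    score += 0.002 * config["lstm_layers"]
    score += rng.gauss(0.0, 0.005)  # run-to-run noise
    return score


def run_search(n_setups=500, seed=0):
    """Evaluate n_setups randomly sampled configurations."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_setups):
        config = sample_config(rng)
        results.append((config, toy_score(config, rng)))
    return results


def parameter_spread(results):
    """For each parameter, the gap between its best and worst option by mean score."""
    spread = {}
    for name in SEARCH_SPACE:
        by_option = defaultdict(list)
        for config, score in results:
            by_option[config[name]].append(score)
        means = [statistics.mean(scores) for scores in by_option.values()]
        spread[name] = max(means) - min(means)
    return spread


if __name__ == "__main__":
    ranked = sorted(parameter_spread(run_search()).items(), key=lambda kv: -kv[1])
    for name, gap in ranked:
        print(f"{name}: {gap:.3f}")
```

In a real study, `toy_score` would be replaced by an actual training run per task, and each setup would typically be repeated with several random seeds before comparing options.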

Code Implementations: 6 repos
