Layer Flexible Adaptive Computational Time
This work addresses adaptive network structure for sequence modeling. The contribution is incremental in that it builds on existing adaptive computation time methods.
The authors address the problem of choosing the number of layers in deep recurrent neural networks for sequence data by proposing a layer flexible model with adaptive computation time, reporting performance improvements of 7% to 12% on a financial data set and on Wikipedia language modeling.
Deep recurrent neural networks perform well on sequence data and are the model of choice. However, it is a daunting task to decide the structure of the networks, i.e., the number of layers, especially considering the different computational needs of a sequence. We propose a layer flexible recurrent neural network with adaptive computation time, and expand it to a sequence-to-sequence model. Unlike the adaptive computation time model, our model has a dynamic number of transmission states, which varies by step and sequence. We evaluate the model on a financial data set and on Wikipedia language modeling. Experimental results show a performance improvement of 7% to 12% and indicate the model's ability to dynamically change the number of layers along with the computational steps.
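
For readers unfamiliar with the adaptive computation time (ACT) baseline that the abstract contrasts against, the following is a minimal sketch of the standard ACT halting rule only, not of the layer flexible model proposed here. It assumes that each computational step emits a scalar halting score, that computation stops once the accumulated halting probability reaches 1 - epsilon, and that the final step receives the remainder so the weights sum to one. The function names, the `epsilon` default, and the `max_steps` cap are hypothetical choices made for illustration.

```python
import math


def act_halting_weights(halting_scores, epsilon=0.01, max_steps=10):
    """Return per-step weights under the standard ACT halting rule.

    halting_scores: raw (pre-sigmoid) halting scores, one per candidate step.
    Stops at the first step where the accumulated halting probability
    reaches 1 - epsilon, or at the last available step, whichever is first.
    """
    weights = []
    accumulated = 0.0
    last_step = min(max_steps, len(halting_scores))
    for n, score in enumerate(halting_scores[:max_steps], start=1):
        h = 1.0 / (1.0 + math.exp(-score))  # sigmoid halting probability
        if accumulated + h >= 1.0 - epsilon or n == last_step:
            # Final step: assign the remainder so the weights sum to 1.
            weights.append(1.0 - accumulated)
            break
        weights.append(h)
        accumulated += h
    return weights


def weighted_state(states, weights):
    """Combine intermediate states (lists of floats) using the halting weights."""
    dim = len(states[0])
    return [sum(w * s[i] for w, s in zip(weights, states)) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical halting scores for one input step of a sequence.
    w = act_halting_weights([0.0, 0.0, 0.0, 0.0])
    print(len(w), w)
    print(weighted_state([[1.0, 0.0], [0.0, 1.0]], w))
```

In this sketch the number of computational steps varies per input, but a single combined state is passed on. The abstract's model differs in that the number of transmission states itself varies by step and sequence; that mechanism is not reproduced here.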