Short-Term Memory Optimization in Recurrent Neural Networks by Autoencoder-based Initialization
The paper addresses the vanishing gradient problem in RNNs for sequence learning tasks, though the contribution appears incremental because it builds on existing autoencoder techniques.
The paper tackles the problem of training RNNs to learn long-term dependencies. It proposes an autoencoder-based initialization method that maximizes short-term memory and has a closed-form solution. On sequential and permuted MNIST, the method achieves lower reconstruction error for long sequences and better gradient propagation during finetuning.
Training RNNs to learn long-term dependencies is difficult due to vanishing gradients. We explore an alternative solution based on explicit memorization using linear autoencoders for sequences, which maximizes short-term memory and admits a closed-form solution without backpropagation. We introduce an initialization scheme that pretrains the weights of a recurrent neural network to approximate the linear autoencoder of the input sequences, and we show how such pretraining can better support solving hard classification tasks with long sequences. We test our approach on sequential and permuted MNIST. We show that the proposed approach achieves a much lower reconstruction error for long sequences and better gradient propagation during the finetuning phase.
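To make the "closed-form solution without backpropagation" concrete, the following is a minimal sketch of one common SVD-based construction of a linear autoencoder for a single sequence. The abstract does not spell out the exact procedure, so this is an assumption and not necessarily the authors' method: the function names (`linear_autoencoder`, `encode`, `decode`), the single-sequence setting, and the rank tolerance are all hypothetical choices made for illustration. The encoder is the linear recurrence y_t = A x_t + B y_{t-1}, and the decoder recovers x_t = Aᵀ y_t and y_{t-1} = Bᵀ y_t.

```python
import numpy as np

def linear_autoencoder(X, p=None):
    """Closed-form linear autoencoder for one sequence X of shape (n, a).

    Returns A (p, a) and B (p, p) for the recurrence y_t = A x_t + B y_{t-1}.
    Illustrative sketch only; not taken from the paper.
    """
    n, a = X.shape
    # Row t holds the reversed prefix [x_t, x_{t-1}, ..., x_1, 0, ..., 0].
    Xi = np.zeros((n, n * a))
    for t in range(n):
        Xi[t, : (t + 1) * a] = X[t::-1].reshape(-1)
    # Right singular vectors span the space of reversed prefixes.
    _, s, Vt = np.linalg.svd(Xi, full_matrices=False)
    if p is None:
        # Assumed tolerance for the numerical rank.
        p = int(np.sum(s > 1e-10 * s[0]))
    U = Vt[:p].T                 # (n * a, p), orthonormal columns
    A = U[:a].T                  # U^T P, where P selects the first a coordinates
    RU = np.zeros_like(U)        # R U, where R shifts a vector down by a entries
    RU[a:] = U[:-a]
    B = U.T @ RU
    return A, B

def encode(A, B, X):
    """Run the linear recurrence over the sequence and return the final state."""
    y = np.zeros(A.shape[0])
    for x in X:
        y = A @ x + B @ y
    return y

def decode(A, B, y, n):
    """Unroll the state backwards, recovering x_n, ..., x_1 in original order."""
    out = []
    for _ in range(n):
        out.append(A.T @ y)
        y = B.T @ y
    return np.array(out[::-1])
```

With the state size `p` equal to the rank of the prefix matrix, the final state encodes the whole sequence and decoding is exact up to floating-point error; choosing a smaller `p` keeps only the leading singular directions and gives an approximate reconstruction. In the approach described above, matrices of this kind would serve as the starting point for the recurrent weights before finetuning.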