NE LG Apr 3, 2015

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V. Le, Navdeep Jaitly, Geoffrey E. Hinton

arXiv:1504.00941v2 46.1762 citations

Originality: Incremental advance

AI Analysis

The paper addresses vanishing and exploding gradients in recurrent networks, a problem relevant to machine learning researchers and practitioners. It offers a simpler alternative to complex architectures such as LSTM. The advance is incremental because it builds on existing recurrent network methods.

The paper tackles the problem of learning long-term dependencies in recurrent networks. It proposes a simple initialization for networks of rectified linear units: the recurrent weight matrix starts as the identity matrix or a scaled version of it. The authors report results comparable to LSTM on four benchmarks: two toy problems with long-range temporal structure, a large language modeling problem, and a benchmark speech recognition problem.

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
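The initialization described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `init_irnn` and `step`, the zero biases, and the small Gaussian input weights (`input_std`) are assumptions made for the example. Only the identity or scaled-identity recurrent matrix and the rectified linear units come from the text above.

```python
import random


def init_irnn(n_in, n_hidden, scale=1.0, input_std=0.001, seed=0):
    """Build parameters for a ReLU RNN with a scaled-identity recurrent matrix.

    `scale`, `input_std`, and the zero bias are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Recurrent weights: scale * identity matrix (the key idea in the abstract).
    W_hh = [[scale if i == j else 0.0 for j in range(n_hidden)]
            for i in range(n_hidden)]
    # Input weights: small random values (assumed for this sketch).
    W_xh = [[rng.gauss(0.0, input_std) for _ in range(n_in)]
            for _ in range(n_hidden)]
    # Biases: zero (assumed for this sketch).
    b = [0.0] * n_hidden
    return W_hh, W_xh, b


def step(h, x, W_hh, W_xh, b):
    """One recurrent step: h_new = relu(W_hh @ h + W_xh @ x + b)."""
    pre = [
        sum(W_hh[i][j] * h[j] for j in range(len(h)))
        + sum(W_xh[i][k] * x[k] for k in range(len(x)))
        + b[i]
        for i in range(len(b))
    ]
    return [max(0.0, v) for v in pre]
```

With `scale=1.0` and zero input, a non-negative hidden state is copied forward unchanged at every step, which is the intuition for why this starting point helps information survive across long time spans. A `scale` below 1.0 makes the state decay geometrically instead.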
