NE · LG · Apr 3, 2015

A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

arXiv:1504.00941v2 · 760 citations
Originality: Incremental advance
AI Analysis

This paper addresses vanishing and exploding gradients in recurrent networks. For machine learning researchers and practitioners, it offers a simpler alternative to complex architectures like LSTM. The advance is incremental, since it builds on existing recurrent network methods.

The paper tackles the problem of learning long-term dependencies in recurrent networks with a simple initialization: in networks of rectified linear units, the recurrent weight matrix is set to the identity matrix or a scaled version of it. The authors report results comparable to LSTM on four benchmarks: two toy problems, a language modeling problem, and a speech recognition problem.

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
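As a rough illustration of the initialization the abstract describes, the sketch below builds a tiny ReLU recurrent network in plain Python and sets its recurrent weight matrix to a scaled identity. The names `init_irnn` and `relu_rnn_step`, the zero bias, and the `input_std` value for the input weights are assumptions made for this example, not details taken from the text above.

```python
import random


def init_irnn(n_in, n_hidden, scale=1.0, input_std=0.001, seed=0):
    """Return (W_xh, W_hh, b) for a ReLU RNN with identity-style recurrence."""
    rng = random.Random(seed)
    # Recurrent weights: scale * identity matrix (scale=1.0 gives the plain identity).
    W_hh = [[scale if i == j else 0.0 for j in range(n_hidden)]
            for i in range(n_hidden)]
    # Input weights: small Gaussian noise. input_std is an assumed value.
    W_xh = [[rng.gauss(0.0, input_std) for _ in range(n_in)]
            for _ in range(n_hidden)]
    # Bias: assumed to start at zero.
    b = [0.0] * n_hidden
    return W_xh, W_hh, b


def relu_rnn_step(x, h, W_xh, W_hh, b):
    """One recurrent step: h_new = max(0, W_hh h + W_xh x + b)."""
    h_new = []
    for i in range(len(h)):
        pre = b[i]
        pre += sum(W_hh[i][j] * h[j] for j in range(len(h)))
        pre += sum(W_xh[i][k] * x[k] for k in range(len(x)))
        h_new.append(max(0.0, pre))
    return h_new


# With identity recurrence and zero input, a non-negative hidden state
# is copied forward unchanged, however many steps are taken.
W_xh, W_hh, b = init_irnn(n_in=2, n_hidden=3)
h = [0.5, 1.0, 2.0]
for _ in range(100):
    h = relu_rnn_step([0.0, 0.0], h, W_xh, W_hh, b)

# With a scaled identity (here 0.5), the state shrinks by that factor per step.
_, W_half, b_half = init_irnn(n_in=2, n_hidden=3, scale=0.5)
h_scaled = relu_rnn_step([0.0, 0.0], [0.5, 1.0, 2.0], W_xh, W_half, b_half)
```

In this sketch, the identity recurrence means each step starts as a copy of the previous hidden state, so a signal in the state neither shrinks nor grows until training changes the weights. A scale below 1 makes the state decay over time, which shortens how long the network remembers.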

Code Implementations: 6 repos
