NE · Feb 24, 2017

Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies

arXiv:1702.07805v4 · 26 citations
Originality: Highly original
AI Analysis

The work addresses a fundamental bottleneck in RNN training, which makes it relevant to researchers and practitioners working on sequence modeling tasks with long-term dependencies.

The paper tackles the difficulty of training recurrent neural networks to capture long-term dependencies. It introduces MIST RNNs, a NARX architecture with direct connections from the distant past, and reports three results: better vanishing-gradient properties than LSTM and earlier NARX RNNs, fewer computations than LSTM, and substantial performance improvements over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.

Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very long-term dependencies.
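To make "direct connections from the very distant past" concrete, here is a minimal sketch of a generic NARX RNN, in which each state depends on several earlier states, not only the previous one. It is not the MIST RNN from the paper: the abstract does not say which delays are used or how they are combined, so the power-of-two delays, the function names `narx_rnn_forward` and `min_hops`, and the random weights are all assumptions made for illustration.

```python
import math
import random


def narx_rnn_forward(inputs, delays, hidden_size, seed=0):
    """Run a plain NARX RNN over a list of input vectors (illustrative only).

    h_t = tanh(W_x x_t + sum over d in delays of W_d h_{t-d} + b)
    States before the start of the sequence are taken to be zero.
    """
    rng = random.Random(seed)
    input_size = len(inputs[0])

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    def matvec(matrix, vector):
        return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

    w_x = rand_matrix(hidden_size, input_size)
    # One recurrent weight matrix per delay (hypothetical parameterization).
    w_d = {d: rand_matrix(hidden_size, hidden_size) for d in delays}
    bias = [0.0] * hidden_size
    zero_state = [0.0] * hidden_size

    states = []
    for t, x in enumerate(inputs):
        pre = [a + b for a, b in zip(matvec(w_x, x), bias)]
        for d in delays:
            h_prev = states[t - d] if t - d >= 0 else zero_state
            pre = [p + r for p, r in zip(pre, matvec(w_d[d], h_prev))]
        states.append([math.tanh(p) for p in pre])
    return states


def min_hops(distance, delays):
    """Fewest recurrent connections needed to span `distance` time steps."""
    best = [0] + [None] * distance
    for k in range(1, distance + 1):
        options = [best[k - d] for d in delays if d <= k and best[k - d] is not None]
        best[k] = min(options) + 1 if options else None
    return best[distance]


if __name__ == "__main__":
    sequence = [[1.0, 0.0]] * 10
    states = narx_rnn_forward(sequence, delays=[1, 2, 4, 8], hidden_size=3)
    print(len(states), len(states[0]))
    print(min_hops(100, [1]), min_hops(100, [1, 2, 4, 8, 16, 32, 64]))
```

The `min_hops` helper shows why such connections matter for vanishing gradients: with only a delay of 1, a dependency 100 steps back must pass through 100 recurrent connections, whereas with the assumed power-of-two delays the shortest path uses 3 (64 + 32 + 4). The cost in this naive form is one extra matrix-vector product per delay at every step, the kind of overhead the abstract says MIST RNNs avoid.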

