Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
This work addresses the cost of training recurrent neural networks, offering an incremental improvement over existing methods.
The paper tackles the computational bottleneck of training recurrent neural networks (RNNs) by proposing a novel method that replaces backpropagation through time with a fixed gradient feedback mechanism, achieving competitive perplexity scores on language modeling benchmarks while significantly reducing training costs.
Recurrent neural networks (RNNs) have recently demonstrated strong performance and faster inference than Transformers at comparable parameter budgets. However, the recursive gradient computation in backpropagation through time (BPTT) remains the major computational bottleneck during training. In this work, we propose a novel method that replaces BPTT with a fixed gradient feedback mechanism, yielding an efficient approximation of exact gradient propagation under the assumption of time stationarity. Our approach leverages state-space model (SSM) principles to define a structured feedback matrix that directly propagates gradients from future time steps. This formulation bypasses recursive gradient backpropagation, significantly reducing training overhead while preserving the network's ability to capture long-term dependencies. Experiments on language modeling benchmarks show competitive perplexity scores at significantly reduced training cost. These promising results suggest that designing the feedback mechanism like an SSM can fully exploit the efficiency advantages of RNNs in many practical applications.
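To make the contrast with BPTT concrete, the following is a minimal sketch, not the paper's implementation. The abstract gives no equations, so everything here is an assumption made for illustration: a scalar tanh RNN, a squared-error loss, and a single fixed feedback coefficient `a` standing in for the structured feedback matrix. All function names are hypothetical. Exact BPTT multiplies the incoming gradient by a time-varying Jacobian that depends on the stored activations; the stationary variant replaces that Jacobian with a constant, so the backward pass becomes a linear time-invariant recursion that can also be written as a convolution over the per-step loss gradients.

```python
import math

def forward(xs, w, u, h0=0.0):
    """Scalar tanh RNN (toy model): h_t = tanh(w * h_{t-1} + u * x_t)."""
    hs, h = [], h0
    for x in xs:
        h = math.tanh(w * h + u * x)
        hs.append(h)
    return hs

def bptt_deltas(gs, hs, w):
    """Exact state gradients: delta_t = g_t + J_{t+1} * delta_{t+1},
    where J_{t+1} = w * (1 - h_{t+1}^2) changes at every time step."""
    T = len(gs)
    deltas = [0.0] * T
    nxt = 0.0  # no gradient flows in from beyond the end of the sequence
    for t in reversed(range(T)):
        jac = w * (1.0 - hs[t + 1] ** 2) if t + 1 < T else 0.0
        deltas[t] = gs[t] + jac * nxt
        nxt = deltas[t]
    return deltas

def fixed_feedback_deltas(gs, a):
    """Assumed stationary approximation: delta_t = g_t + a * delta_{t+1},
    with the same fixed coefficient a at every time step."""
    deltas = [0.0] * len(gs)
    nxt = 0.0
    for t in reversed(range(len(gs))):
        deltas[t] = gs[t] + a * nxt
        nxt = deltas[t]
    return deltas

def fixed_feedback_conv(gs, a):
    """The same quantity unrolled as a convolution over future gradients:
    delta_t = sum_k a^k * g_{t+k}. It needs no forward activations."""
    T = len(gs)
    return [sum(a ** k * gs[t + k] for k in range(T - t)) for t in range(T)]

# Toy data (made up for illustration only).
xs = [0.5, -1.0, 0.3, 0.8]
ys = [0.1, -0.2, 0.0, 0.4]
w, u = 0.6, 1.0

hs = forward(xs, w, u)
gs = [h - y for h, y in zip(hs, ys)]  # dL/dh_t for L = 0.5 * sum_t (h_t - y_t)^2

exact = bptt_deltas(gs, hs, w)
approx = fixed_feedback_deltas(gs, a=0.4)
```

In this sketch the fixed-feedback pass does not read `hs` at all, which is what lets it be evaluated as a convolution or a parallel scan instead of a step-by-step sweep through stored activations. How well `approx` tracks `exact` depends on the choice of `a`; the value 0.4 above is arbitrary.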