LG · AI · Jan 20, 2023

Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks

Khurram Javed, Haseeb Shah, Rich Sutton, Martha White

arXiv:2302.05326v3 · 13.714 citations · h-index: 74 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the high cost of learning recurrent state online in reinforcement learning agents. It offers researchers and practitioners a scalable alternative to existing gradient-estimation methods, although it is an incremental advance: it builds on RTRL and trades off the network's functional capacity for efficiency.

The paper tackles the scalability problem of real-time recurrent learning (RTRL) for recurrent neural networks by proposing two constraints, either decomposing the network into independent modules or learning it in stages, that make RTRL scale linearly with the number of parameters without adding noise or bias to the gradient estimate. It demonstrates the approach on a prediction benchmark inspired by animal learning and on policy evaluation of pre-trained Atari 2600 policies, reporting improved performance over Truncated-BPTT.

Constructing states from sequences of observations is an important component of reinforcement learning agents. One solution for state construction is to use recurrent neural networks. Back-propagation through time (BPTT) and real-time recurrent learning (RTRL) are two popular gradient-based methods for recurrent learning. BPTT requires complete trajectories of observations before it can compute the gradients and is unsuitable for online updates. RTRL can do online updates but scales poorly to large networks. In this paper, we propose two constraints that make RTRL scalable. We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters. Unlike prior scalable gradient estimation algorithms, such as UORO and Truncated-BPTT, our algorithms do not add noise or bias to the gradient estimate. Instead, they trade off the functional capacity of the network for computationally efficient learning. We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and by doing policy evaluation of pre-trained policies for Atari 2600 games.
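A minimal sketch of the "independent modules" idea, assuming each module (column) is a single tanh unit that recurs only on its own state and that all columns feed a linear readout. These choices, and the names `Column`, `make_net`, and `columnar_rtrl_grads`, are hypothetical illustrations, not the authors' code. Because no column reads another column's hidden state, the RTRL sensitivity dh/dθ for a column needs just one number per parameter of that column, so each time step costs time proportional to the number of parameters. A fully connected RNN with n hidden units, by contrast, must carry an n × n² sensitivity matrix forward, which costs O(n⁴) per step.

```python
import math
import random


class Column:
    """Hypothetical column: one tanh unit recurring only on its own state.

    h_t = tanh(w . x_t + u * h_{t-1} + b)
    No other column feeds into h_t, so dh_t/dtheta needs one number per
    parameter of this column.
    """

    def __init__(self, n_inputs, rng):
        self.w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.u = rng.uniform(-0.5, 0.5)
        self.b = 0.0
        self.h = 0.0
        # RTRL sensitivities dh/dw_j, dh/du, dh/db, carried forward in time.
        self.sw = [0.0] * n_inputs
        self.su = 0.0
        self.sb = 0.0

    def step(self, x):
        h_prev = self.h
        pre = sum(wj * xj for wj, xj in zip(self.w, x)) + self.u * h_prev + self.b
        h = math.tanh(pre)
        d = 1.0 - h * h  # tanh'(pre)
        # s_t = tanh'(pre) * (direct partial + u * s_{t-1}), using the old s_{t-1}.
        self.sw = [d * (xj + self.u * s) for xj, s in zip(x, self.sw)]
        self.su = d * (h_prev + self.u * self.su)
        self.sb = d * (1.0 + self.u * self.sb)
        self.h = h
        return h


def make_net(n_columns, n_inputs, seed=0):
    """Build columns plus linear readout weights v from a fixed seed."""
    rng = random.Random(seed)
    columns = [Column(n_inputs, rng) for _ in range(n_columns)]
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_columns)]
    return columns, v


def columnar_rtrl_grads(columns, v, xs, targets):
    """Single forward pass accumulating exact gradients of
    L = sum_t 0.5 * (y_t - target_t)^2, with y_t = sum_i v_i * h_{i,t}.

    Each step touches every parameter a constant number of times, so the
    per-step cost is linear in the parameter count.
    """
    gw = [[0.0] * len(c.w) for c in columns]
    gu = [0.0] * len(columns)
    gb = [0.0] * len(columns)
    gv = [0.0] * len(columns)
    loss = 0.0
    for x, target in zip(xs, targets):
        hs = [c.step(x) for c in columns]
        err = sum(vi * hi for vi, hi in zip(v, hs)) - target
        loss += 0.5 * err * err
        for i, c in enumerate(columns):
            scale = err * v[i]  # dL_t / dh_{i,t}
            gw[i] = [g + scale * s for g, s in zip(gw[i], c.sw)]
            gu[i] += scale * c.su
            gb[i] += scale * c.sb
            gv[i] += err * hs[i]
    return loss, gw, gu, gb, gv


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(20)]
    targets = [rng.uniform(-1, 1) for _ in range(20)]
    columns, v = make_net(n_columns=4, n_inputs=3)
    loss, gw, gu, gb, gv = columnar_rtrl_grads(columns, v, xs, targets)
    print(f"loss={loss:.4f}  dL/du per column={[round(g, 4) for g in gu]}")
```

Parameters are held fixed across the sequence here so the accumulated gradient can be checked against finite differences; an online learner would instead apply each step's gradient as soon as it is computed.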

