LG · May 23

Streaming Reinforcement Learning under Partial Observability with Real-Time Recurrent Learning

arXiv:2605.24709 · 30.8
Predicted impact: top 51% in LG · last 90 days
Originality: Incremental advance
AI Analysis

For researchers in online RL, this work enables streaming deep RL under partial observability, a previously unsolved problem.

The authors tackle streaming RL under partial observability with recurrent trace units (RTUs), which enable exact real-time recurrent learning (RTRL) at a cost linear in the parameter count. The method sustains performance on MemoryChain tasks with chain lengths up to 128, is competitive with batched PPO on POPGym, and recovers a substantial fraction of batched performance on masked MuJoCo.
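As a rough illustration of why a diagonal recurrence makes exact RTRL cheap, the sketch below computes forward-mode (RTRL) gradients for a simplified real-valued diagonal linear recurrence, h_t = lam * h_{t-1} + W x_t, with a readout y_t = sum_i v_i tanh(h_t[i]). It is a stand-in built on assumptions, not the paper's RTU: the class `DiagonalRTRL`, the real-valued `lam`, and the scalar readout are hypothetical. Because h_t[i] depends only on lam[i] and row i of W, the sensitivity traces hold one number per recurrent parameter, so each step costs O(n + n*d) time and memory. RTRL for a fully connected recurrence must instead carry the derivative of every hidden unit with respect to every parameter.

```python
import math
import random


class DiagonalRTRL:
    """Hypothetical real-valued diagonal recurrence trained with exact RTRL.

    h_t = lam * h_{t-1} + W x_t     (lam is a vector, so the recurrence is diagonal)
    y_t = sum_i v_i * tanh(h_t[i])  (scalar readout; v is not trained here)
    """

    def __init__(self, n_hidden, n_input, seed=0):
        rng = random.Random(seed)
        self.n, self.d = n_hidden, n_input
        self.lam = [rng.uniform(0.5, 0.99) for _ in range(n_hidden)]
        self.W = [[rng.gauss(0.0, 0.5) for _ in range(n_input)] for _ in range(n_hidden)]
        self.v = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.reset()

    def reset(self):
        self.h = [0.0] * self.n
        # One trace entry per recurrent parameter: n for lam plus n*d for W.
        self.e_lam = [0.0] * self.n                          # dh[i]/dlam[i]
        self.e_W = [[0.0] * self.d for _ in range(self.n)]  # dh[i]/dW[i][j]

    def step(self, x):
        """Advance one step, updating state and traces in O(n + n*d)."""
        h_prev = self.h
        for i in range(self.n):
            # Product rule through lam[i] * h_{t-1}[i]; other units never depend on lam[i].
            self.e_lam[i] = h_prev[i] + self.lam[i] * self.e_lam[i]
            for j in range(self.d):
                self.e_W[i][j] = x[j] + self.lam[i] * self.e_W[i][j]
        self.h = [self.lam[i] * h_prev[i] + sum(self.W[i][j] * x[j] for j in range(self.d))
                  for i in range(self.n)]
        return sum(self.v[i] * math.tanh(self.h[i]) for i in range(self.n))

    def _dloss_dh(self, y, target):
        # dL/dh_t[i] for L = 0.5 * (y - target)^2.
        err = y - target
        return [err * self.v[i] * (1.0 - math.tanh(self.h[i]) ** 2) for i in range(self.n)]

    def grad_lam(self, y, target):
        """Gradient of the current-step loss w.r.t. lam through the whole history."""
        g = self._dloss_dh(y, target)
        return [g[i] * self.e_lam[i] for i in range(self.n)]

    def grad_W(self, y, target):
        """Gradient of the current-step loss w.r.t. W through the whole history."""
        g = self._dloss_dh(y, target)
        return [[g[i] * self.e_W[i][j] for j in range(self.d)] for i in range(self.n)]


def final_loss(model, xs, target):
    """Replay the full sequence and return the last-step loss (for checking only)."""
    model.reset()
    y = 0.0
    for x in xs:
        y = model.step(x)
    return 0.5 * (y - target) ** 2


def finite_diff_grad_lam(model, xs, target, eps=1e-6):
    """Central finite differences of the last-step loss w.r.t. each lam[i]."""
    grads = []
    for i in range(model.n):
        orig = model.lam[i]
        model.lam[i] = orig + eps
        up = final_loss(model, xs, target)
        model.lam[i] = orig - eps
        down = final_loss(model, xs, target)
        model.lam[i] = orig
        grads.append((up - down) / (2 * eps))
    return grads


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(20)]
    model = DiagonalRTRL(n_hidden=4, n_input=3)
    for x in xs:  # streaming pass: traces are updated online, no history is stored
        y = model.step(x)
    rtrl = model.grad_lam(y, target=1.0)
    print("RTRL:       ", [round(g, 6) for g in rtrl])
    print("finite diff:", [round(g, 6) for g in finite_diff_grad_lam(model, xs, 1.0)])
```

The finite-difference helper replays the whole sequence to check the traces, which a streaming learner cannot afford; the traces deliver the same gradient online, one step at a time. The gradient is exact only while the parameters are held fixed over the sequence; with per-step updates, as in streaming RL, the traces track the gradient approximately.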

Streaming reinforcement learning has emerged as an online learning paradigm that conforms to the restrictions of natural learning agents, which process data incrementally, i.e., with a batch size of 1 and no replay buffer. While streaming RL has recently been shown to scale with deep function approximation under full observability, partially observable settings have remained out of reach. Truncated backpropagation through time (TBPTT) collapses to a one-step gradient horizon under the streaming setting, and exact real-time recurrent learning (RTRL) is prohibitively expensive. We close this gap using recurrent trace units (RTUs), a diagonal recurrent architecture that enables exact RTRL with linear time and memory complexity in the parameter count, and show that they integrate cleanly into existing streaming algorithms across both discrete and continuous control. On a MemoryChain diagnostic with chain lengths from 2 to 128, our method sustains performance where streaming TBPTT(1) baselines using feedforward, GRU, and RTU networks collapse. On five POPGym tasks, the streaming approach is competitive with batched PPO; on partially observable (masked) MuJoCo continuous control, it recovers a substantial fraction of batched performance, despite using no replay buffer or batched updates.
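The collapse of TBPTT(1) can be seen on a hypothetical scalar cue-recall toy (not the paper's MemoryChain environment): a cue is shown at the first step, zeros follow, and the loss at the last step asks the state to reproduce the cue. With h_t = lam * h_{t-1} + w * x_t, a one-step truncation treats h_{t-1} as a constant, so its gradient with respect to the input weight w sees only the current input, which is zero at the end of the chain. The exact RTRL trace carries the dependence back to the cue. The function `gradients_wrt_w` and the identity readout are assumptions for illustration.

```python
def gradients_wrt_w(lam, w, cue, chain_length):
    """Return (exact RTRL grad, TBPTT(1) grad) of the last-step loss w.r.t. w.

    Hypothetical scalar model, for illustration only:
        h_t = lam * h_{t-1} + w * x_t,   y_t = h_t,   L = 0.5 * (y_T - cue)^2
    The cue is shown at t = 0 and the inputs are zero afterwards.
    """
    xs = [cue] + [0.0] * (chain_length - 1)
    h = 0.0
    trace = 0.0     # exact dh_t/dw, carried forward in time (RTRL sensitivity)
    one_step = 0.0  # TBPTT(1) estimate of dh_t/dw
    for x in xs:
        trace = x + lam * trace  # full dependence on w through every past step
        h = lam * h + w * x
        one_step = x             # h_{t-1} treated as a constant: only x_t remains
    err = h - cue
    return err * trace, err * one_step


if __name__ == "__main__":
    for length in (1, 8, 32):
        exact, truncated = gradients_wrt_w(lam=0.9, w=0.5, cue=1.0, chain_length=length)
        print(f"chain_length={length:3d}  exact={exact:.6f}  tbptt1={truncated:.6f}")
```

For chain_length > 1 the TBPTT(1) gradient is exactly zero, while the exact gradient is cue² · lam^(chain_length-1) · (w · lam^(chain_length-1) - 1). In this linear toy the exact gradient still shrinks geometrically with chain length when |lam| < 1, so remembering across long chains also depends on lam staying close to 1.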
