LG · AI · Feb 15, 2024

Recurrent Reinforcement Learning with Memoroids

arXiv:2402.09900v3 · 8 citations · h-index: 30 · NIPS
Originality: Highly original
AI Analysis

This work addresses scalability issues in reinforcement learning for partially observable environments, offering a novel approach to improve training efficiency and performance.

The paper tackles the problem of scaling memory models to long sequences in recurrent reinforcement learning. It introduces a monoid-based framework called memoroids and uses it to derive a new batching method that, in the reported experiments, improves sample efficiency and increases return.

Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states. Neither model scales particularly well to long sequences, especially compared to an emerging class of memory models called Linear Recurrent Models. We discover that the recurrent update of these models resembles a monoid, leading us to reformulate existing models using a novel monoid-based framework that we call memoroids. We revisit the traditional approach to batching in recurrent reinforcement learning, highlighting theoretical and empirical deficiencies. We leverage memoroids to propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in reinforcement learning.
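To make the monoid idea concrete, here is a minimal sketch of a linear recurrent update written as a monoid. It is an illustration only, not the paper's implementation: the names `IDENTITY`, `combine`, `scan`, and `recurrent_states` are hypothetical, and the scalar recurrence h_t = a_t * h_{t-1} + b_t is an assumed stand-in for a linear recurrent model.

```python
from itertools import accumulate

# Hypothetical illustration: an element (a, b) stands for the affine map h -> a * h + b.
# The identity element (1, 0) is the map that leaves h unchanged.
IDENTITY = (1.0, 0.0)


def combine(left, right):
    """Compose two affine maps: apply `left` first, then `right`."""
    a1, b1 = left
    a2, b2 = right
    # right(left(h)) = a2 * (a1 * h + b1) + b2
    return (a1 * a2, a2 * b1 + b2)


def scan(elements):
    """Inclusive prefix scan under `combine`, starting from the identity."""
    return list(accumulate(elements, combine, initial=IDENTITY))[1:]


def recurrent_states(decays, inputs, h0=0.0):
    """States of h_t = decay_t * h_{t-1} + input_t, read off from the scan."""
    return [a * h0 + b for a, b in scan(zip(decays, inputs))]


if __name__ == "__main__":
    print(recurrent_states([0.5, 0.5, 0.5], [1.0, 2.0, 3.0]))
```

The scan here runs sequentially for simplicity. Because `combine` is associative, the same prefix results could in principle be computed by a parallel scan, which is the property that makes the monoid view attractive for long sequences.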

Code Implementations: 1 repo