LG AI RO
Mar 3, 2025

Investigating Memory in RL with POPGym Arcade

arXiv:2503.01450v6
h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses how to understand and improve memory use in RL, offering researchers an incremental contribution in the form of new analysis tools and benchmarks.

The authors tackle the problem of analyzing memory in deep reinforcement learning by introducing mathematical tools for fair policy evaluation under partial observability and a new benchmark suite, POPGym Arcade, that enables controlled studies. They identify a pathology in which value functions smear credit over irrelevant history, and show that, given this pathology, out-of-distribution scenarios can contaminate memory and perturb the policy far into the future, with implications for sim-to-real transfer and offline RL.

How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons, and identify a pathology where value functions smear credit over irrelevant history. With this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future, with implications for sim-to-real transfer and offline RL.

Code Implementations: 1 repo
