LG, AI, RO

Mar 3, 2025

Investigating Memory in RL with POPGym Arcade

Zekang Wang, Zhe He, Borong Zhang, Edan Toledo, Steven Morad

arXiv:2503.01450v6

h-index: 1

Originality: Incremental advance

AI Analysis

This work addresses how RL agents use memory under partial observability, making an incremental contribution through new analysis tools and a benchmark suite.

The authors analyze memory in deep reinforcement learning by introducing mathematical tools for fair policy evaluation under partial observability and a new benchmark suite, POPGym Arcade, that enables controlled studies. They identify a pathology in which value functions assign credit to irrelevant history, and show how out-of-distribution scenarios can then contaminate memory, with implications for sim-to-real transfer and offline RL.

How should we analyze memory in deep RL? We introduce mathematical tools for fairly analyzing policies under partial observability and revealing how agents use memory to make decisions. To utilize these tools, we present POPGym Arcade, a collection of Atari-inspired, hardware-accelerated, pixel-based environments sharing a single observation and action space. Each environment provides fully and partially observable variants, enabling counterfactual studies on observability. We find that controlled studies are necessary for fair comparisons, and identify a pathology where value functions smear credit over irrelevant history. With this pathology, we demonstrate how out-of-distribution scenarios can contaminate memory, perturbing the policy far into the future, with implications for sim-to-real transfer and offline RL.
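The claim that a single out-of-distribution input can perturb memory far into the future can be illustrated with a toy sketch. This is not the authors' method or the POPGym Arcade API: the scalar leaky-integrator memory and the names `run_memory`, `memory_gap`, `decay`, and `gain` are all assumptions made for illustration only.

```python
# Toy illustration (hypothetical, not from the paper): a scalar recurrent
# memory h_t = decay * h_{t-1} + gain * x_t, rolled over two observation
# sequences that differ only in their first observation.

def run_memory(observations, decay=0.9, gain=0.1):
    """Return the memory state after each observation."""
    h = 0.0
    states = []
    for x in observations:
        h = decay * h + gain * x
        states.append(h)
    return states


def memory_gap(clean, perturbed, decay=0.9, gain=0.1):
    """Absolute difference between the two memory trajectories per step."""
    a = run_memory(clean, decay, gain)
    b = run_memory(perturbed, decay, gain)
    return [abs(u - v) for u, v in zip(a, b)]


clean = [1.0] * 50
perturbed = list(clean)
perturbed[0] = 100.0  # one out-of-distribution observation at the first step

gap = memory_gap(clean, perturbed)
print(gap[0], gap[10], gap[49])
```

In this sketch the gap shrinks geometrically with `decay` but never reaches zero within the rollout, so any policy reading the memory state still sees a different input dozens of steps after the anomalous observation.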

