LG · AI · Mar 11, 2021

Generalizable Episodic Memory for Deep Reinforcement Learning

arXiv:2103.06469v3 · 44 citations
AI Analysis

The paper addresses a sample-efficiency bottleneck in reinforcement learning, covering both continuous control and discrete action spaces, and offers incremental improvements.

The paper tackles the failure of episodic memory methods in continuous domains, where states are never revisited. It proposes Generalizable Episodic Memory (GEM), which organizes state-action values to support implicit planning and uses a double estimator to reduce overestimation bias. The results show significant performance improvements over baseline algorithms on MuJoCo continuous control tasks and Atari games.

Episodic memory-based methods can rapidly latch onto past successful strategies by a non-parametric memory and improve sample efficiency of traditional reinforcement learning. However, little effort is put into the continuous domain, where a state is never visited twice, and previous episodic methods fail to efficiently aggregate experience across trajectories. To address this problem, we propose Generalizable Episodic Memory (GEM), which effectively organizes the state-action values of episodic memory in a generalizable manner and supports implicit planning on memorized trajectories. GEM utilizes a double estimator to reduce the overestimation bias induced by value propagation in the planning process. Empirical evaluation shows that our method significantly outperforms existing trajectory-based methods on various MuJoCo continuous control tasks. To further show the general applicability, we evaluate our method on Atari games with discrete action space, which also shows a significant improvement over baseline algorithms.
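
The double-estimator idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the n-step return form used for "value propagation", and the bootstrap arrays `boot_a` and `boot_b` (standing in for two independent value estimators) are all assumptions made for illustration. One estimator picks the rollout length along a stored trajectory, and the other supplies the value at that length, so a single estimator's overestimate is not both selected and used.

```python
def n_step_returns(rewards, bootstrap, gamma, t):
    """Return [V_{t,1}, ..., V_{t,T-t}] for one stored trajectory.

    V_{t,h} sums h discounted rewards starting at step t, then adds a
    discounted bootstrap value at step t+h. `bootstrap` has length T+1;
    its last entry should be 0.0 when the trajectory ends in a terminal state.
    """
    T = len(rewards)
    values, acc = [], 0.0
    for h in range(1, T - t + 1):
        acc += gamma ** (h - 1) * rewards[t + h - 1]
        values.append(acc + gamma ** h * bootstrap[t + h])
    return values


def argmax(values):
    """Index of the first maximum in a non-empty list."""
    return max(range(len(values)), key=values.__getitem__)


def double_estimator_targets(rewards, boot_a, boot_b, gamma=0.99):
    """Hypothetical double-estimator targets along one trajectory.

    For each step, estimator A selects the rollout length and estimator B
    evaluates it (target for A), and vice versa (target for B).
    """
    targets_a, targets_b = [], []
    for t in range(len(rewards)):
        va = n_step_returns(rewards, boot_a, gamma, t)
        vb = n_step_returns(rewards, boot_b, gamma, t)
        targets_a.append(vb[argmax(va)])  # select with A, evaluate with B
        targets_b.append(va[argmax(vb)])  # select with B, evaluate with A
    return targets_a, targets_b


# Estimator A badly overestimates the value at step 1 (10.0 vs 0.5).
rewards = [1.0, 0.0, 2.0]
boot_a = [0.0, 10.0, 0.0, 0.0]
boot_b = [0.0, 0.5, 0.0, 0.0]
targets_a, targets_b = double_estimator_targets(rewards, boot_a, boot_b, gamma=0.5)
```

In this toy trajectory a single estimator taking the maximum over its own returns would produce a target of 6.0 at the first step, driven entirely by A's inflated bootstrap value. Under the cross-evaluation assumed here, the same step receives 1.25 for A and 1.5 for B.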

Code Implementations: 1 repo
