LG AI | Jun 15, 2023

Recurrent Action Transformer with Memory

Egor Cherepanov, Alexey Staroverov, Alexey K. Kovalev, Aleksandr I. Panov

arXiv:2306.09459v614.915 citations | h-index: 16 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in offline RL for memory-intensive tasks by offering a unified architecture for decision-making over extended horizons. It is an incremental advance because it builds on existing transformer and memory mechanisms.

The paper addresses the difficulty transformers have retaining long-term information in partially observable environments in offline reinforcement learning. It proposes the Recurrent Action Transformer with Memory (RATE), which the authors report significantly improves performance on memory-dependent tasks while remaining competitive on standard benchmarks.

Transformers have become increasingly popular in offline reinforcement learning (RL) due to their ability to treat agent trajectories as sequences, reframing policy learning as a sequence modeling task. However, in partially observable environments (POMDPs), effective decision-making depends on retaining information about past events -- something that standard transformers struggle with due to the quadratic complexity of self-attention, which limits their context length. One solution to this problem is to extend transformers with memory mechanisms. We propose the Recurrent Action Transformer with Memory (RATE), a novel transformer-based architecture for offline RL that incorporates a recurrent memory mechanism designed to regulate information retention. We evaluate RATE across a diverse set of environments: memory-intensive tasks (ViZDoom-Two-Colors, T-Maze, Memory Maze, Minigrid-Memory, and POPGym), as well as standard Atari and MuJoCo benchmarks. Our comprehensive experiments demonstrate that RATE significantly improves performance in memory-dependent settings while remaining competitive on standard tasks across a broad range of baselines. These findings underscore the pivotal role of integrated memory mechanisms in offline RL and establish RATE as a unified, high-capacity architecture for effective decision-making over extended horizons. Code: https://sites.google.com/view/rate-model/.
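The abstract's argument about quadratic self-attention cost and recurrent memory can be illustrated with a minimal sketch. It is not the RATE implementation: the function names, the segment length, the number of memory tokens, and the toy `remember_first_cue` step (a stand-in for a transformer pass) are all assumptions made for illustration. The sketch counts query-key pairs for full versus segmented attention, then shows how a memory value carried between segments keeps an early cue available at the end of a long trajectory.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def full_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by one self-attention layer over the full sequence."""
    return seq_len * seq_len


def segmented_attention_pairs(seq_len: int, segment_len: int,
                              num_memory_tokens: int) -> int:
    """Pairs scored when the sequence is cut into segments and each segment
    is processed together with a fixed number of memory tokens (assumed layout)."""
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    num_segments = -(-seq_len // segment_len)  # ceiling division
    tokens_per_pass = segment_len + num_memory_tokens
    return num_segments * tokens_per_pass * tokens_per_pass


def process_in_segments(
    tokens: Sequence[int],
    segment_len: int,
    initial_memory: Optional[int],
    step: Callable[[Sequence[int], Optional[int]], Tuple[List[Optional[int]], Optional[int]]],
) -> Tuple[List[Optional[int]], Optional[int]]:
    """Generic segment-level recurrence: the memory returned for one segment
    is passed as input to the next segment."""
    outputs: List[Optional[int]] = []
    memory = initial_memory
    for start in range(0, len(tokens), segment_len):
        segment = tokens[start:start + segment_len]
        segment_outputs, memory = step(segment, memory)
        outputs.extend(segment_outputs)
    return outputs, memory


def remember_first_cue(segment: Sequence[int], memory: Optional[int]):
    """Hypothetical stand-in for a transformer pass: store the first nonzero
    token as the cue and emit the stored cue at every step."""
    outputs: List[Optional[int]] = []
    for token in segment:
        if memory is None and token != 0:
            memory = token
        outputs.append(memory)
    return outputs, memory


if __name__ == "__main__":
    print(full_attention_pairs(900))               # full-sequence cost
    print(segmented_attention_pairs(900, 90, 10))  # segmented cost
    # A cue appears only at the first step, as in a T-Maze-style task.
    print(process_in_segments([3, 0, 0, 0, 0, 0], 2, None, remember_first_cue))
```

With these assumed numbers, a 900-token trajectory costs 810,000 pairs under full attention and 100,000 pairs when split into ten 90-token segments with 10 memory tokens each. The cue seen in the first segment remains available in the last segment only because the memory is threaded through every call.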

