LG | AI | Jun 2, 2022

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

Kevin Esslinger, Robert Platt, Christopher Amato

arXiv:2206.01078v2 | 21.654 citations | h-index: 33 | Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of training robust memory-based agents in partially observable environments, offering a more stable alternative to recurrent approaches.

The paper tackles partial observability in reinforcement learning by proposing Deep Transformer Q-Networks (DTQN), an architecture that uses transformers to encode the agent's history. In the authors' experiments, it solves partially observable tasks faster and more stably than recurrent neural networks.

Real-world reinforcement learning tasks often involve some form of partial observability, where the observations give only a partial or noisy view of the true state of the world. Such tasks typically require some form of memory, where the agent has access to multiple past observations, in order to perform well. One popular way to incorporate memory is to use a recurrent neural network to access the agent's history. However, recurrent neural networks in reinforcement learning are often fragile, difficult to train, and susceptible to catastrophic forgetting, and they sometimes fail completely as a result. In this work, we propose Deep Transformer Q-Networks (DTQN), a novel architecture utilizing transformers and self-attention to encode an agent's history. DTQN is designed modularly, and we compare results against several modifications to our base model. Our experiments demonstrate that the transformer can solve partially observable tasks faster and more stably than previous recurrent approaches.
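To make the idea of encoding a history with self-attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the class name `TinyHistoryQNet`, the single attention head, the causal mask, the residual connection, the random untrained weights, and the choice to emit Q-values at every timestep are all assumptions made for illustration, and positional encodings, feed-forward layers, and training are omitted.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Small random weights; a real agent would learn these."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyHistoryQNet:
    """Hypothetical one-layer, one-head attention Q-network (illustration only)."""

    def __init__(self, obs_dim, d_model, n_actions, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.W_embed = rand_matrix(d_model, obs_dim, rng)  # observation -> token
        self.W_q = rand_matrix(d_model, d_model, rng)      # query projection
        self.W_k = rand_matrix(d_model, d_model, rng)      # key projection
        self.W_v = rand_matrix(d_model, d_model, rng)      # value projection
        self.W_out = rand_matrix(n_actions, d_model, rng)  # token -> Q-values

    def q_values(self, history):
        """Return one list of Q-values per timestep of the observation history."""
        tokens = [matvec(self.W_embed, obs) for obs in history]
        queries = [matvec(self.W_q, tok) for tok in tokens]
        keys = [matvec(self.W_k, tok) for tok in tokens]
        values = [matvec(self.W_v, tok) for tok in tokens]

        outputs = []
        for t, q in enumerate(queries):
            # Assumed causal mask: step t attends only to steps 0..t.
            scores = [
                sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(self.d_model)
                for s in range(t + 1)
            ]
            weights = softmax(scores)
            context = [
                sum(w * values[s][i] for s, w in enumerate(weights))
                for i in range(self.d_model)
            ]
            # Residual connection, then a linear head producing Q-values.
            hidden = [tok_i + c_i for tok_i, c_i in zip(tokens[t], context)]
            outputs.append(matvec(self.W_out, hidden))
        return outputs


def greedy_action(net, history):
    """Pick the action with the highest Q-value at the most recent timestep."""
    last_q = net.q_values(history)[-1]
    return max(range(len(last_q)), key=lambda a: last_q[a])
```

Calling `greedy_action(net, history)` with a list of past observation vectors selects an action from the whole history rather than from the latest observation alone, which is the role memory plays under partial observability.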
