Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
This work addresses the challenge of scaling offline reinforcement learning for robotics, enabling policy training from a mix of human demonstrations and autonomously collected data. The contribution is incremental, since it builds on existing Q-learning and Transformer techniques.
The paper tackles the problem of training multi-task robotic manipulation policies from large offline datasets. It introduces Q-Transformer, a method that uses a Transformer to represent Q-functions over discretized actions, and reports that it outperforms prior offline RL and imitation learning techniques on a diverse real-world task suite.
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://qtransformer.github.io
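The per-dimension discretization described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The bin count, the action bounds, the `q_fn(state, prefix)` signature, the toy Q-function, and the greedy dimension-by-dimension selection are all assumptions made for illustration; in the actual method a Transformer would produce the per-dimension Q-values.

```python
import numpy as np

NUM_BINS = 4  # assumed number of bins per action dimension (illustrative only)


def discretize(action, low, high, num_bins):
    """Map each continuous action dimension to an integer bin index."""
    action = np.clip(np.asarray(action, dtype=float), low, high)
    frac = (action - low) / (high - low)
    # The upper bound maps to num_bins, so cap it at the last valid bin.
    return np.minimum((frac * num_bins).astype(int), num_bins - 1)


def undiscretize(bins, low, high, num_bins):
    """Map bin indices back to the continuous value at each bin center."""
    return low + (np.asarray(bins) + 0.5) * (high - low) / num_bins


def greedy_action(q_fn, state, num_dims, num_bins):
    """Choose one bin per dimension, each conditioned on the earlier choices.

    q_fn(state, prefix) is a hypothetical interface: it returns num_bins
    Q-values for the next action dimension given the bins chosen so far.
    """
    chosen = []
    for _ in range(num_dims):
        q_values = q_fn(state, chosen)
        assert q_values.shape == (num_bins,)
        chosen.append(int(np.argmax(q_values)))
    return np.array(chosen)


def toy_q_fn(state, prefix):
    """Toy stand-in for the Transformer: the preferred bin depends on the prefix."""
    target = (sum(prefix) + len(prefix) + 1) % NUM_BINS
    return -np.abs(np.arange(NUM_BINS) - target).astype(float)


bins = greedy_action(toy_q_fn, state=None, num_dims=3, num_bins=NUM_BINS)
continuous_action = undiscretize(bins, low=-1.0, high=1.0, num_bins=NUM_BINS)
```

Treating each dimension as its own token keeps the output size at `num_bins` values per step instead of `num_bins ** num_dims` for a joint discretization, which is what makes sequence modeling over actions practical in this sketch.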