RePreM: Representation Pre-training with Masked Model for Reinforcement Learning
This work targets representation learning for reinforcement learning (RL), offering a simple and scalable pre-training method that improves sample efficiency and transfer. It appears incremental, however, because it builds on existing masked modeling techniques.
The paper tackles representation pre-training in reinforcement learning by proposing RePreM, a masked model that predicts masked states or actions in trajectories. The method is shown to be effective in dynamic prediction, transfer learning, and sample-efficient RL, and to scale well with dataset size and encoder scale.
Inspired by the recent success of sequence modeling in RL and the use of masked language models for pre-training, we propose a masked model for pre-training in RL, RePreM (Representation Pre-training with Masked Model), which trains an encoder combined with transformer blocks to predict the masked states or actions in a trajectory. RePreM is simple but effective compared to existing representation pre-training methods in RL. By relying on sequence modeling, it avoids algorithmic sophistication (such as data augmentation or estimating multiple models) and generates a representation that captures long-term dynamics well. Empirically, we demonstrate the effectiveness of RePreM in various tasks, including dynamic prediction, transfer learning, and sample-efficient RL with both value-based and actor-critic methods. Moreover, we show that RePreM scales well with dataset size, dataset quality, and the scale of the encoder, which indicates its potential for large RL models.
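The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the interleaved state/action token layout, the `MASK` placeholder, the 25% `mask_ratio`, and the function names `interleave` and `mask_trajectory` are all assumptions made for illustration. The sketch covers only how a trajectory might be corrupted and which positions become prediction targets; the encoder and transformer blocks that would consume the corrupted sequence are omitted.

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a masked state or action


def interleave(states, actions):
    """Flatten a trajectory into [s_0, a_0, s_1, a_1, ...] (assumed layout)."""
    if len(states) != len(actions):
        raise ValueError("expected one action per state")
    tokens = []
    for s, a in zip(states, actions):
        tokens.append(("state", s))
        tokens.append(("action", a))
    return tokens


def mask_trajectory(tokens, mask_ratio=0.25, rng=None):
    """Replace a random subset of tokens with MASK.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original value the model would have to predict.
    The mask ratio is an assumed hyperparameter, not taken from the paper.
    """
    rng = rng or random.Random()
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for i in positions:
        kind, value = tokens[i]
        targets[i] = value            # ground truth for the prediction loss
        corrupted[i] = (kind, MASK)   # token type is kept, content is hidden
    return corrupted, targets


if __name__ == "__main__":
    traj = interleave(["s0", "s1", "s2", "s3"], [0, 1, 0, 1])
    corrupted, targets = mask_trajectory(traj, rng=random.Random(0))
    print(corrupted)
    print(targets)
```

In a full pipeline, a model would receive `corrupted` and be trained to reconstruct the entries of `targets`; the choice of loss for states versus actions is left open here because the abstract does not specify it.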