AI · LG · Oct 1, 2023

Pre-training with Synthetic Data Helps Offline Reinforcement Learning

arXiv:2310.00771v4 · 11 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the challenge of data efficiency in offline reinforcement learning for researchers and practitioners, offering a simpler and more accessible pre-training method.

The paper shows that pre-training on simple synthetic data, either IID or generated by a Markov chain, can match or exceed the gains from language pre-training in offline deep reinforcement learning, with consistent improvements on D4RL Gym locomotion datasets.

Recently, it has been shown that for offline deep reinforcement learning (DRL), pre-training Decision Transformer with a large language corpus can improve downstream performance (Reid et al., 2022). A natural question to ask is whether this performance gain can only be achieved with language pre-training, or can be achieved with simpler pre-training schemes which do not involve language. In this paper, we first show that language is not essential for improved performance, and indeed pre-training with synthetic IID data for a small number of updates can match the performance gains from pre-training with a large language corpus; moreover, pre-training with data generated by a one-step Markov chain can further improve the performance. Inspired by these experimental results, we then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm, which is Q-learning-based and typically employs a Multi-Layer Perceptron (MLP) backbone. Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL, providing consistent performance improvement on D4RL Gym locomotion datasets. The results of this paper not only illustrate the importance of pre-training for offline DRL but also show that the pre-training data can be synthetic and generated with remarkably simple mechanisms.
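The two synthetic data sources named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The state-space size, the sequence length, the softmax temperature, and the choice to draw transition logits from a standard normal are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def make_transition_matrix(num_states, rng, temperature=1.0):
    """Build a random row-stochastic matrix (assumed construction:
    softmax over Gaussian logits, scaled by a temperature)."""
    logits = rng.normal(size=(num_states, num_states))
    scaled = logits / temperature
    scaled -= scaled.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum(axis=1, keepdims=True)


def sample_markov_chain(transition, length, rng):
    """Sample a sequence where each state depends only on the previous one."""
    num_states = transition.shape[0]
    states = np.empty(length, dtype=np.int64)
    states[0] = rng.integers(num_states)  # uniform initial state (assumption)
    for t in range(1, length):
        states[t] = rng.choice(num_states, p=transition[states[t - 1]])
    return states


def sample_iid(num_states, length, rng):
    """Sample a sequence of independent, uniformly distributed states."""
    return rng.integers(num_states, size=length)


rng = np.random.default_rng(0)
P = make_transition_matrix(num_states=100, rng=rng)  # 100 states is illustrative
mc_tokens = sample_markov_chain(P, length=1000, rng=rng)
iid_tokens = sample_iid(num_states=100, length=1000, rng=rng)
```

Sequences like `mc_tokens` or `iid_tokens` could then serve as next-token prediction targets during a short pre-training phase before offline RL fine-tuning. How the states are embedded and fed to the model depends on the architecture and is not shown here.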

Code Implementations: 1 repo
