LG, AI, CV | Jun 6, 2024

ATraDiff: Accelerating Online Reinforcement Learning with Imaginary Trajectories

arXiv:2406.04323v1
Originality: Incremental advance
AI Analysis

This work addresses data efficiency in reinforcement learning, a concern for both researchers and practitioners. Its method is an incremental advance that builds on prior approaches for extracting knowledge from offline data to support online learning.

The paper tackles low data efficiency in online reinforcement learning with sparse rewards. It proposes ATraDiff, a generative diffusion model trained on offline data that generates synthetic trajectories to augment online training. The authors report state-of-the-art performance across a variety of environments, with the most pronounced improvements in complicated settings.

Training autonomous agents with sparse rewards is a long-standing problem in online reinforcement learning (RL), due to low data efficiency. Prior work overcomes this challenge by extracting useful knowledge from offline data, often accomplished by learning an action distribution from offline data and using the learned distribution to facilitate online RL. However, since the offline data are given and fixed, the extracted knowledge is inherently limited, making it difficult to generalize to new tasks. We propose a novel approach that leverages offline data to learn a generative diffusion model, termed Adaptive Trajectory Diffuser (ATraDiff). This model generates synthetic trajectories, serving as a form of data augmentation and consequently enhancing the performance of online RL methods. The key strength of our diffuser lies in its adaptability, allowing it to effectively handle varying trajectory lengths and mitigate distribution shifts between online and offline data. Because of its simplicity, ATraDiff seamlessly integrates with a wide spectrum of RL methods. Empirical evaluation shows that ATraDiff consistently achieves state-of-the-art performance across a variety of environments, with particularly pronounced improvements in complicated settings. Our code and demo video are available at https://atradiff.github.io.
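
The abstract does not say how generated trajectories enter training, so the following is only a minimal sketch of one common form of trajectory-level data augmentation: a replay buffer that mixes real online transitions with synthetic ones. Everything named here is an assumption for illustration and is not taken from the paper: the `MixedReplayBuffer` class, the `synthetic_ratio` parameter, and especially `placeholder_generator`, a hypothetical random-walk stand-in for a trained trajectory diffusion model.

```python
import random
from collections import deque


class MixedReplayBuffer:
    """Holds real and synthetic transitions in separate queues (illustrative only)."""

    def __init__(self, capacity=10_000, synthetic_ratio=0.5, seed=0):
        self.real = deque(maxlen=capacity)
        self.synthetic = deque(maxlen=capacity)
        # Assumed knob: target fraction of each batch drawn from synthetic data.
        self.synthetic_ratio = synthetic_ratio
        self.rng = random.Random(seed)

    def add_real(self, transition):
        """Store one transition collected from the online environment."""
        self.real.append(transition)

    def add_synthetic(self, trajectory):
        """Store every transition of a generated trajectory."""
        self.synthetic.extend(trajectory)

    def sample(self, batch_size):
        """Draw a batch; fall back to real data when synthetic data is scarce."""
        n_syn = min(int(batch_size * self.synthetic_ratio), len(self.synthetic))
        n_real = min(batch_size - n_syn, len(self.real))
        batch = self.rng.sample(list(self.synthetic), n_syn)
        batch += self.rng.sample(list(self.real), n_real)
        return batch


def placeholder_generator(start_state, length, rng):
    """Hypothetical stand-in for a trajectory generator (NOT a diffusion model).

    Produces a 1-D random walk of (state, action, reward, next_state, source)
    tuples so the buffer logic can be exercised without a trained model.
    """
    trajectory = []
    state = start_state
    for _ in range(length):
        action = rng.uniform(-1.0, 1.0)
        next_state = state + action
        trajectory.append((state, action, 0.0, next_state, "synthetic"))
        state = next_state
    return trajectory


# Usage: fill the buffer from both sources, then sample a mixed batch.
buffer = MixedReplayBuffer(capacity=100, synthetic_ratio=0.5, seed=1)
for i in range(20):
    buffer.add_real((float(i), 0.0, 0.0, float(i + 1), "real"))
buffer.add_synthetic(placeholder_generator(0.0, 10, random.Random(2)))
mixed_batch = buffer.sample(8)
```

Keeping the two sources in separate queues makes the mixing ratio explicit and lets an off-policy learner consume the batch unchanged. Whether ATraDiff mixes data this way is not stated in the abstract.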

