LG · Nov 18, 2024

Enhancing Decision Transformer with Diffusion-Based Trajectory Branch Generation

arXiv:2411.11327v1 · 3 citations · h-index: 5
Originality: Incremental advance
AI Analysis

The work addresses a specific bottleneck in offline RL that matters to researchers and practitioners, though the improvement is incremental.

The paper tackles the tendency of Decision Transformer (DT) to converge to sub-optimal trajectories in offline reinforcement learning. It uses a diffusion model to generate trajectory branches that expand the dataset, and DT trained on the expanded data outperforms state-of-the-art sequence modeling methods on the D4RL benchmark.

Decision Transformer (DT) can learn effective policies from offline datasets by converting offline reinforcement learning (RL) into a supervised sequence modeling task, in which trajectory elements are generated auto-regressively conditioned on the return-to-go (RTG). However, this sequence modeling approach tends to learn policies that converge on sub-optimal trajectories within the dataset, for lack of bridging data that would move them to better trajectories, even when the condition is set to the highest RTG. To address this issue, we introduce Diffusion-Based Trajectory Branch Generation (BG), which expands the trajectories of the dataset with branches generated by a diffusion model. Each trajectory branch is generated from a segment of a trajectory within the dataset and leads to trajectories with higher returns. We concatenate the generated branch with the trajectory segment as an expansion of the trajectory. After expansion, DT has more opportunities to learn policies that move to better trajectories, which prevents it from converging to sub-optimal ones. Empirically, after processing with BG, DT outperforms state-of-the-art sequence modeling methods on the D4RL benchmark, demonstrating the effectiveness of adding branches to the dataset without further modifications.
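To make the data side of this concrete, here is a minimal sketch of two steps the abstract describes: computing the return-to-go (RTG) that DT conditions on, and splicing a generated branch onto a dataset trajectory segment. It is not the authors' implementation. The diffusion model is not shown; the branch is simply passed in as arrays. The helper names `returns_to_go` and `expand_with_branch`, the dictionary keys `observations`, `actions`, `rewards`, the assumption that the "segment" is a prefix ending at `branch_point`, and the choice to recompute RTG over the spliced trajectory are all illustrative assumptions.

```python
import numpy as np


def returns_to_go(rewards, gamma=1.0):
    """RTG at step t: sum of rewards from t to the end of the trajectory.

    gamma=1.0 gives the undiscounted RTG typically used to condition DT.
    """
    rtg = np.zeros(len(rewards), dtype=float)
    running = 0.0
    # Accumulate backwards so each step sees the return of its suffix.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def expand_with_branch(traj, branch_point, branch):
    """Hypothetical helper: keep the dataset trajectory up to branch_point
    (exclusive), append a generated branch, and recompute RTG.

    Assumes traj and branch are dicts of arrays with matching trailing
    dimensions; the branch itself would come from a diffusion model (not shown).
    """
    expanded = {
        key: np.concatenate([traj[key][:branch_point], branch[key]], axis=0)
        for key in ("observations", "actions", "rewards")
    }
    # RTG must be recomputed because the suffix of rewards has changed.
    expanded["rtg"] = returns_to_go(expanded["rewards"])
    return expanded
```

For example, a four-step trajectory with rewards `[1, 1, 1, 1]` has RTG `[4, 3, 2, 1]`. Branching at step 2 with a generated two-step branch whose rewards are `[5, 5]` yields rewards `[1, 1, 5, 5]` and RTG `[12, 11, 10, 5]`, so the shared prefix now carries a higher RTG label that points toward the better continuation. Adding such spliced trajectories alongside the originals is one way to supply the "bridging data" the abstract refers to.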
