AI · Jan 22

Decoupling Return-to-Go for Efficient Decision Transformer

arXiv:2601.15953v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This paper addresses an efficiency and performance issue in offline reinforcement learning that is relevant to both researchers and practitioners. The advance is incremental, since it builds directly on the Decision Transformer framework.

The paper tackles a redundancy in how Decision Transformer (DT) uses Return-to-Go (RTG) sequences. It proposes Decoupled DT (DDT), which passes only observations and actions through the Transformer and uses only the latest RTG to guide action prediction. According to the authors, this improves performance and reduces computational cost: in their experiments DDT significantly outperforms DT and is competitive with state-of-the-art DT variants.

The Decision Transformer (DT) has established a powerful sequence modeling approach to offline reinforcement learning. It conditions its action predictions on Return-to-Go (RTG), using it both to distinguish trajectory quality during training and to guide action generation at inference. In this work, we identify a critical redundancy in this design: feeding the entire sequence of RTGs into the Transformer is theoretically unnecessary, as only the most recent RTG affects action prediction. We show through experiments that this redundancy can impair DT's performance. To resolve this, we propose the Decoupled DT (DDT). DDT simplifies the architecture by processing only observation and action sequences through the Transformer, using the latest RTG to guide the action prediction. This streamlined approach not only improves performance but also reduces computational cost. Our experiments show that DDT significantly outperforms DT and establishes competitive performance against state-of-the-art DT variants across multiple offline RL tasks.
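
To make the input-layout difference concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation. It computes RTG as the sum of future rewards (the standard DT definition), then builds a DT-style context that interleaves RTG, state, and action tokens, and a decoupled context that contains only states and actions with the latest RTG held aside. The function names (`returns_to_go`, `dt_tokens`, `ddt_inputs`, `attention_pairs`) are hypothetical, and how DDT actually injects the latest RTG into the action head is not specified in the abstract, so the sketch leaves it as a separate value.

```python
from typing import Any, List, Sequence, Tuple

Token = Tuple[str, Any]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """RTG at step t is the sum of rewards from step t to the end of the trajectory."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def dt_tokens(rtgs: Sequence[float], states: Sequence[Any],
              actions: Sequence[Any]) -> List[Token]:
    """DT-style context: three tokens (RTG, state, action) per timestep."""
    tokens: List[Token] = []
    for g, s, a in zip(rtgs, states, actions):
        tokens += [("rtg", g), ("state", s), ("action", a)]
    return tokens


def ddt_inputs(rtgs: Sequence[float], states: Sequence[Any],
               actions: Sequence[Any]) -> Tuple[List[Token], float]:
    """Decoupled layout (assumed): two tokens (state, action) per timestep.

    Only the most recent RTG is returned, as a separate conditioning value.
    """
    tokens: List[Token] = []
    for s, a in zip(states, actions):
        tokens += [("state", s), ("action", a)]
    return tokens, rtgs[-1]


def attention_pairs(num_tokens: int) -> int:
    """Number of query-key pairs in full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


rewards = [1.0, 0.0, 2.0, 1.0]
states = ["s0", "s1", "s2", "s3"]    # placeholder observations
actions = ["a0", "a1", "a2", "a3"]   # placeholder actions

rtgs = returns_to_go(rewards)                      # [4.0, 3.0, 3.0, 1.0]
dt_ctx = dt_tokens(rtgs, states, actions)          # 12 tokens
ddt_ctx, latest_rtg = ddt_inputs(rtgs, states, actions)  # 8 tokens, RTG 1.0

print(len(dt_ctx), len(ddt_ctx), latest_rtg)
print(attention_pairs(len(ddt_ctx)) / attention_pairs(len(dt_ctx)))
```

For a context of K timesteps the interleaved layout has 3K tokens and the decoupled layout has 2K, so the quadratic self-attention term shrinks by a factor of (2K)^2 / (3K)^2 = 4/9 in this simplified count. That is one plausible source of the reduced computational cost; the paper's actual cost figures are not given in this summary.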

