Unified token representations for sequential decision models
This work addresses scalability issues in offline RL for real-time or resource-constrained settings, offering an incremental improvement in efficiency.
The paper tackles redundant tokenization and high computational complexity in offline reinforcement learning transformers. It proposes a Unified Token Representation (UTR) that merges return-to-go, state, and action into a single token, achieving comparable or superior performance to state-of-the-art methods with markedly lower computation.
Transformers have demonstrated strong potential in offline reinforcement learning (RL) by modeling trajectories as sequences of return-to-go, states, and actions. However, existing approaches such as the Decision Transformer (DT) and its variants suffer from redundant tokenization and quadratic attention complexity, limiting their scalability in real-time or resource-constrained settings. To address this, we propose a Unified Token Representation (UTR) that merges return-to-go, state, and action into a single token, substantially reducing sequence length and model complexity. Theoretical analysis shows that UTR leads to a tighter Rademacher complexity bound, suggesting improved generalization. We further develop two variants, UDT and UDC, built upon transformer and gated CNN backbones, respectively. Both achieve comparable or superior performance to state-of-the-art methods with markedly lower computation. These findings demonstrate that UTR generalizes well across architectures and may provide an efficient foundation for scalable control in future large decision models.
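To make the sequence-length argument concrete, the following is a minimal sketch of the two token layouts. It is an illustration under stated assumptions, not the paper's implementation: the function names (`dt_sequence`, `unified_sequence`, `attention_pairs`), the toy dimensions, and the use of plain concatenation as the merge are all hypothetical. The abstract does not say how the three quantities are fused or how action prediction is conditioned.

```python
# Illustrative sketch only. All names and the concatenation-based merge are
# assumptions for exposition, not the paper's actual UTR implementation.

def dt_sequence(returns_to_go, states, actions):
    """Decision Transformer-style layout: three separate tokens per timestep."""
    tokens = []
    for rtg, s, a in zip(returns_to_go, states, actions):
        tokens.append([rtg])      # return-to-go token
        tokens.append(list(s))    # state token
        tokens.append(list(a))    # action token
    return tokens


def unified_sequence(returns_to_go, states, actions):
    """Unified layout: one token per timestep (here, a plain concatenation)."""
    return [[rtg, *s, *a] for rtg, s, a in zip(returns_to_go, states, actions)]


def attention_pairs(seq_len):
    """Query-key pairs in full self-attention, which grows as seq_len ** 2."""
    return seq_len * seq_len


# Toy trajectory: 20 timesteps, 3-dimensional states, 2-dimensional actions.
T = 20
returns_to_go = [float(T - t) for t in range(T)]
states = [[0.1 * t, 0.2 * t, 0.3 * t] for t in range(T)]
actions = [[1.0, -1.0] for _ in range(T)]

dt_tokens = dt_sequence(returns_to_go, states, actions)
utr_tokens = unified_sequence(returns_to_go, states, actions)

print(len(dt_tokens), len(utr_tokens))  # 60 20
print(attention_pairs(len(dt_tokens)) // attention_pairs(len(utr_tokens)))  # 9
```

Under these assumptions, merging the three per-timestep tokens cuts the sequence length by a factor of 3 and the number of attention pairs by a factor of 9, which is the kind of saving the abstract attributes to UTR. In a real model, a learned projection would typically map each merged vector to the embedding dimension.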