LG, Oct 24, 2025

Unified token representations for sequential decision models

arXiv:2510.21448v1
Originality: Incremental advance
AI Analysis

This work addresses scalability issues in offline RL for real-time or resource-constrained settings, offering an incremental improvement in efficiency.

The paper targets redundant tokenization and high computational cost in transformer-based offline reinforcement learning. It proposes a Unified Token Representation (UTR) that merges return-to-go, state, and action into a single token, and reports comparable or superior performance to state-of-the-art methods with markedly lower computation.

Transformers have demonstrated strong potential in offline reinforcement learning (RL) by modeling trajectories as sequences of return-to-go, states, and actions. However, existing approaches such as the Decision Transformer (DT) and its variants suffer from redundant tokenization and quadratic attention complexity, limiting their scalability in real-time or resource-constrained settings. To address this, we propose a Unified Token Representation (UTR) that merges return-to-go, state, and action into a single token, substantially reducing sequence length and model complexity. Theoretical analysis shows that UTR leads to a tighter Rademacher complexity bound, suggesting improved generalization. We further develop two variants: UDT and UDC, built upon transformer and gated CNN backbones, respectively. Both achieve comparable or superior performance to state-of-the-art methods with markedly lower computation. These findings demonstrate that UTR generalizes well across architectures and may provide an efficient foundation for scalable control in future large decision models.
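The sequence-length argument in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the fusion rule here (concatenate return-to-go, state, and action, then apply one linear projection) is an assumption, as are all names (`dt_style_tokens`, `unified_tokens`, `attention_pairs`) and dimensions. The sketch also ignores how the action being predicted would be masked in a causal model. It only compares the baseline layout of three tokens per timestep against one fused token per timestep.

```python
import random


def init_weights(in_dim, d_model, rng):
    """Random (in_dim x d_model) projection matrix; stands in for learned weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(in_dim)]


def linear(vec, weights):
    """Project a vector of length in_dim to length d_model."""
    d_model = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vec, weights)) for j in range(d_model)]


def dt_style_tokens(rtg, states, actions, d_model, rng):
    """Baseline layout: three tokens per timestep (return-to-go, state, action)."""
    w_r = init_weights(1, d_model, rng)
    w_s = init_weights(len(states[0]), d_model, rng)
    w_a = init_weights(len(actions[0]), d_model, rng)
    tokens = []
    for r, s, a in zip(rtg, states, actions):
        tokens += [linear([r], w_r), linear(s, w_s), linear(a, w_a)]
    return tokens


def unified_tokens(rtg, states, actions, d_model, rng):
    """Assumed fusion: concatenate (rtg, state, action), then one projection."""
    w = init_weights(1 + len(states[0]) + len(actions[0]), d_model, rng)
    return [linear([r] + s + a, w) for r, s, a in zip(rtg, states, actions)]


def attention_pairs(seq_len):
    """Number of pairwise scores in full self-attention over seq_len tokens."""
    return seq_len * seq_len


rng = random.Random(0)
K, STATE_DIM, ACTION_DIM, D_MODEL = 20, 17, 6, 32  # hypothetical sizes
rtg = [rng.random() for _ in range(K)]
states = [[rng.random() for _ in range(STATE_DIM)] for _ in range(K)]
actions = [[rng.random() for _ in range(ACTION_DIM)] for _ in range(K)]

dt = dt_style_tokens(rtg, states, actions, D_MODEL, rng)
utr = unified_tokens(rtg, states, actions, D_MODEL, rng)

print(len(dt), len(utr))  # 60 20
print(attention_pairs(len(dt)) // attention_pairs(len(utr)))  # 9
```

For a context of K timesteps, the baseline layout yields 3K tokens and the fused layout yields K, so full self-attention computes 9 times fewer pairwise scores under these assumptions.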

