LG · AI · Jan 29, 2024

Context-Former: Stitching via Latent Conditioned Sequence Modeling

arXiv:2401.16452v3 · 5 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses a specific limitation of sequence-modeling methods in offline RL, their lack of trajectory stitching, and represents an incremental advance.

The paper tackles the lack of stitching capacity in Decision Transformer (DT) for offline reinforcement learning. It introduces ContextFormer, which integrates contextual imitation learning and sequence modeling to stitch sub-optimal trajectories, achieving competitive performance on D4RL benchmarks and outperforming other DT variants.

Offline reinforcement learning (RL) algorithms can learn policies that outperform the behavior policies that generated their data by stitching suboptimal trajectories into better ones. Meanwhile, Decision Transformer (DT) casts RL as sequence modeling and achieves competitive performance on offline RL benchmarks. However, recent studies show that DT lacks stitching capacity, so equipping DT with this capability is vital to further improving its performance. To do so, we frame trajectory stitching as expert matching and introduce ContextFormer, which integrates contextual information-based imitation learning (IL) and sequence modeling to stitch sub-optimal trajectory fragments by emulating the representations of a limited number of expert trajectories. We validate the approach from two perspectives: 1) We conduct extensive experiments on D4RL benchmarks under IL settings, and the results show that ContextFormer achieves competitive performance in multiple IL settings. 2) More importantly, we compare ContextFormer with various competitive DT variants trained on identical datasets. In this comparison, ContextFormer outperforms all other variants.
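To make the notion of stitching concrete, here is a minimal tabular sketch. It is not ContextFormer and not taken from the paper: the two-step toy problem, the state and action names, and the helper functions are all hypothetical. It only illustrates how value-based offline RL can combine the good halves of two suboptimal trajectories that pass through a shared state.

```python
# Toy illustration of trajectory stitching (assumed example, not the paper's method).
# Each transition is (state, action, reward, next_state, done).
traj_1 = [("A", "left", 1.0, "B", False), ("B", "left", 0.0, "T", True)]
traj_2 = [("A", "right", 0.0, "B", False), ("B", "right", 1.0, "T", True)]
dataset = traj_1 + traj_2


def trajectory_return(traj):
    """Undiscounted return of a single logged trajectory."""
    return sum(reward for _, _, reward, _, _ in traj)


def fitted_q(dataset, sweeps=10, gamma=1.0):
    """Tabular Q-iteration restricted to (state, action) pairs seen in the data."""
    q = {(s, a): 0.0 for s, a, _, _, _ in dataset}
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            next_values = [v for (s_, _), v in q.items() if s_ == s_next]
            bootstrap = 0.0 if done or not next_values else max(next_values)
            q[(s, a)] = r + gamma * bootstrap
    return q


def greedy_rollout(q, dataset, start):
    """Follow the greedy policy through the logged (deterministic) transitions."""
    model = {(s, a): (r, s_next, done) for s, a, r, s_next, done in dataset}
    state, total, path = start, 0.0, []
    while True:
        actions = [a for (s, a) in q if s == state]
        action = max(actions, key=lambda a: q[(state, a)])
        reward, next_state, done = model[(state, action)]
        path.append((state, action))
        total += reward
        state = next_state
        if done:
            return total, path


q_values = fitted_q(dataset)
stitched_return, stitched_path = greedy_rollout(q_values, dataset, "A")
```

Both logged trajectories have a return of 1.0, while the greedy policy takes the first step of `traj_1` and the second step of `traj_2`, for a return of 2.0. A sequence model that only imitates whole return-conditioned trajectories has no such mechanism by default, which is the gap the abstract describes.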

