LG, AI | Nov 12, 2025

Data Fusion-Enhanced Decision Transformer for Stable Cross-Domain Generalization

Guojian Wang, Quinson Hon, Xuyang Chen, Lin Zhao

arXiv:2511.09173v17.11 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses cross-domain generalization in offline reinforcement learning, offering incremental improvements in policy adaptation.

The paper tackles cross-domain shifts in decision transformer policies by proposing the Data Fusion-Enhanced Decision Transformer (DFDT), which improves return and stability over baselines on D4RL-style control tasks.

Cross-domain shifts present a significant challenge for decision transformer (DT) policies. Existing cross-domain policy adaptation methods typically rely on a single, simple filtering criterion, matching either state structure or action feasibility, to select source trajectory fragments and stitch them together. However, the selected fragments still stitch poorly: state structures can misalign, the return-to-go (RTG) becomes incomparable when the reward or horizon changes, and actions may jump at trajectory junctions. As a result, RTG tokens lose continuity, which compromises DT's inference ability. To tackle these challenges, we propose the Data Fusion-Enhanced Decision Transformer (DFDT), a compact pipeline that restores stitchability. Specifically, DFDT fuses scarce target data with selectively trusted source fragments via a two-level data filter: maximum mean discrepancy (MMD) mismatch for state-structure alignment and optimal transport (OT) deviation for action feasibility. It then trains on a feasibility-weighted fusion distribution. Furthermore, DFDT replaces RTG tokens with advantage-conditioned tokens, which improves the semantic continuity of the token sequence. It also applies a $Q$-guided regularizer to suppress value and action jumps at junctions. Theoretically, we provide bounds that tie state-value and policy-performance gaps to the MMD-mismatch and OT-deviation measures, and show that the bounds tighten as these two measures shrink. Empirically, DFDT improves return and stability over strong offline RL and sequence-model baselines across gravity, kinematic, and morphology shifts on D4RL-style control tasks, and we further corroborate these gains with token-stitching and sequence-semantics stability analyses.
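Two quantities named in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names `rbf_mmd2` and `returns_to_go`, the Gaussian kernel, the fixed bandwidth, and the synthetic data are all assumptions made for illustration. The first function computes a biased estimate of squared MMD between two sets of states, the kind of score a state-structure filter could threshold. The second computes return-to-go, which makes it easy to see why RTG tokens stop being comparable once the reward scale changes.

```python
import numpy as np


def rbf_mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel.

    x, y: arrays of shape (n, d) and (m, d), e.g. states from a target
    fragment and a source fragment. The bandwidth is a hypothetical default.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def kernel(a, b):
        # Pairwise squared Euclidean distances, clipped at 0 for round-off.
        sq = (
            np.sum(a**2, axis=1)[:, None]
            + np.sum(b**2, axis=1)[None, :]
            - 2.0 * a @ b.T
        )
        return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def returns_to_go(rewards):
    """Undiscounted return-to-go: element t is the sum of rewards from t onward."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(64, 3))            # synthetic "target" states
    near = rng.normal(size=(64, 3))              # source states, same distribution
    far = rng.normal(loc=3.0, size=(64, 3))      # source states, shifted dynamics

    print("MMD^2 target vs near:", rbf_mmd2(target, near))
    print("MMD^2 target vs far: ", rbf_mmd2(target, far))

    rewards = [1.0, 2.0, 3.0]
    print("RTG:", returns_to_go(rewards))
    # Doubling the reward scale doubles every RTG token, so fragments from
    # the two domains carry incomparable conditioning values.
    print("RTG, reward x2:", returns_to_go(np.array(rewards) * 2.0))
```

A filter built on such a score would keep source fragments whose MMD to the target data is small. The threshold, the kernel choice, and how the score combines with the OT-based action check are not specified on this page, so they are left out here.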
