LG · AI · Feb 27

Bridging Dynamics Gaps via Diffusion Schrödinger Bridge for Cross-Domain Reinforcement Learning

Hanping Zhang, Yuhong Guo
arXiv:2602.23737v1
Originality: Incremental advance
AI Analysis

This paper addresses the problem of learning transferable policies under dynamics shifts, which matters to reinforcement learning practitioners, but the advance is incremental because it builds on existing methods such as DSB.

The paper tackles cross-domain reinforcement learning by proposing BDGxRL, a framework that uses a Diffusion Schrödinger Bridge (DSB) to align source transitions with the target-domain dynamics encoded in offline demonstrations. On MuJoCo cross-domain benchmarks, it outperforms state-of-the-art baselines and adapts well under dynamics shifts.

Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. To address this challenge, we propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL), a novel framework that leverages Diffusion Schrödinger Bridge (DSB) to align source transitions with target-domain dynamics encoded in offline demonstrations. Moreover, we introduce a reward modulation mechanism that estimates rewards from state transitions and is applied to DSB-aligned samples, ensuring consistency between rewards and target-domain dynamics. BDGxRL performs target-oriented policy learning entirely within the source domain, without access to the target environment or its rewards. Experiments on MuJoCo cross-domain benchmarks demonstrate that BDGxRL outperforms state-of-the-art baselines and shows strong adaptability under transition dynamics shifts.
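
The data flow described in the abstract (align source transitions, then re-estimate rewards from the aligned state transitions) can be illustrated with a toy sketch. This is not the authors' implementation. The function `align_to_target` is a hypothetical placeholder standing in for a trained DSB sampler, the linear reward model is an assumption chosen for brevity, and the toy dynamics and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy source-domain transitions (s, a, s_next, r); all values are synthetic.
n, ds, da = 512, 2, 1
s = rng.normal(size=(n, ds))
a = rng.normal(size=(n, da))
s_next_src = s + 0.10 * a + 0.01 * rng.normal(size=(n, ds))  # toy source dynamics
r_src = s_next_src[:, 0] - s[:, 0]  # toy reward: progress along dimension 0


def align_to_target(s, a, s_next_src):
    """Hypothetical stand-in for a trained DSB sampler (assumption).

    It halves the displacement to imitate a slower target domain. A real
    bridge would be learned from offline target-domain demonstrations.
    """
    return s + 0.5 * (s_next_src - s)


def features(s, s_next):
    """Stack [s, s_next, 1] so the reward depends only on the state transition."""
    return np.hstack([s, s_next, np.ones((len(s), 1))])


def fit_transition_reward(s, s_next, r):
    """Fit a linear reward model r_hat(s, s_next) by least squares (assumption)."""
    w, *_ = np.linalg.lstsq(features(s, s_next), r, rcond=None)
    return w


def predict_reward(w, s, s_next):
    """Evaluate the fitted reward model on (possibly aligned) transitions."""
    return features(s, s_next) @ w


# 1) Fit the transition-based reward model on source data.
w = fit_transition_reward(s, s_next_src, r_src)
# 2) Map source next-states toward target-like dynamics.
s_next_tgt = align_to_target(s, a, s_next_src)
# 3) Re-estimate rewards on the aligned transitions so they match the new dynamics.
r_tgt = predict_reward(w, s, s_next_tgt)

# Relabeled batch that a standard RL learner could consume in the source domain.
relabeled = (s, a, r_tgt, s_next_tgt)
```

Because the reward model takes only `(s, s_next)` as input, changing the next state through alignment changes the reward accordingly, which is the consistency property the abstract attributes to reward modulation.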

