LG · May 28, 2023

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

arXiv:2305.17625v2 · 36 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of online dynamics adaptation for robots and agents when source data is abundant but target interactions are limited, representing an incremental improvement over prior methods focused on dynamics discrepancy.

The paper tackles the challenge of generalizing reinforcement learning policies across domains with dynamics mismatch, such as from simulation to the real world. It introduces the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares source-domain transitions based on value consistency, and reports better performance than prior approaches in various environments with kinematic and morphology shifts.

Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot learns the policy in a simulator, but when it is deployed in the real world, the dynamics of the environment may be different. Given the source and target domain with dynamics mismatch, we consider the online dynamics adaptation problem, in which case the agent can access sufficient source domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the dynamics discrepancy perspective. In this work, we reveal the limitations of these methods and explore the problem from the value difference perspective via a novel insight on the value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches.
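
The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each source transition already has two estimates of the next-state value: one for the next state observed in the source domain, and one for the next state the target domain would be expected to produce (for example, from a learned target dynamics model). The function names, the `keep_fraction` parameter, and the use of an absolute difference as the proximity measure are hypothetical choices made for illustration only.

```python
from typing import List, Sequence


def value_targets(rewards: Sequence[float],
                  next_values: Sequence[float],
                  gamma: float) -> List[float]:
    """One-step value targets: r + gamma * V(s')."""
    return [r + gamma * v for r, v in zip(rewards, next_values)]


def select_source_transitions(rewards: Sequence[float],
                              next_values_source: Sequence[float],
                              next_values_target: Sequence[float],
                              gamma: float = 0.99,
                              keep_fraction: float = 0.25) -> List[int]:
    """Return indices of source transitions whose paired value targets are closest.

    Assumption: next_values_source[i] is the value of the next state seen in
    the source domain, and next_values_target[i] is the value of the next
    state expected in the target domain for the same (state, action) pair.
    """
    y_src = value_targets(rewards, next_values_source, gamma)
    y_tgt = value_targets(rewards, next_values_target, gamma)

    # Proximity of the paired value targets (assumed: absolute difference).
    gaps = [abs(a - b) for a, b in zip(y_src, y_tgt)]

    # Keep only the fraction of transitions with the smallest gaps.
    n_keep = int(len(gaps) * keep_fraction)
    ranked = sorted(range(len(gaps)), key=lambda i: gaps[i])
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    shared = select_source_transitions(
        rewards=[1.0, 1.0, 1.0, 1.0],
        next_values_source=[0.0, 1.0, 2.0, 3.0],
        next_values_target=[0.0, 1.0, 5.0, 10.0],
        gamma=0.5,
        keep_fraction=0.5,
    )
    print(shared)
```

In this toy call, the first two transitions have identical value targets in both domains, so they are the ones selected for sharing. The last two differ and are filtered out, even though all four share the same reward.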

Code Implementations: 1 repo