LG · AI · May 10, 2024

Contrastive Representation for Data Filtering in Cross-Domain Offline Reinforcement Learning

arXiv:2405.06192v1 · 16 citations · h-index: 17 · ICML
Originality: Highly original
AI Analysis

This paper addresses data-efficiency and domain-adaptation challenges in offline reinforcement learning for robotics and control applications, representing a strong, specific gain.

The paper tackles the problem of performance degradation in cross-domain offline reinforcement learning due to dynamics mismatch by proposing a contrastive representation-based approach to measure domain gaps and a data filtering algorithm. It achieves 89.2% of the performance on full target data using only 10% of target data, outperforming state-of-the-art methods.

Cross-domain offline reinforcement learning leverages source domain data with diverse transition dynamics to alleviate the data requirement for the target domain. However, simply merging the data of two domains leads to performance degradation due to the dynamics mismatch. Existing methods address this problem by measuring the dynamics gap via domain classifiers while relying on assumptions about the transferability of paired domains. In this paper, we propose a novel representation-based approach to measure the domain gap, where the representation is learned through a contrastive objective by sampling transitions from different domains. We show that such an objective recovers the mutual-information gap of transition functions in two domains without suffering from the unboundedness of the dynamics gap when handling significantly different domains. Based on the representations, we introduce a data filtering algorithm that selectively shares transitions from the source domain according to the contrastive score functions. Empirical results on various tasks demonstrate that our method achieves superior performance: using only 10% of the target data, it reaches 89.2% of the performance that state-of-the-art methods obtain with 100% of the target dataset.
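The two ingredients the abstract names, a contrastive objective over transitions and score-based filtering of source data, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the InfoNCE form of the loss, the top-fraction selection rule, and the names `info_nce_loss`, `filter_source_transitions`, and `keep_fraction` are all assumptions made for illustration, and the scores are toy numbers rather than outputs of a learned representation.

```python
import numpy as np


def info_nce_loss(pos_scores, neg_scores):
    """InfoNCE-style contrastive loss (an assumed form, not the paper's exact objective).

    pos_scores: shape (B,), score of each positive transition.
    neg_scores: shape (B, K), scores of K negative transitions per positive.
    """
    # Put the positive in column 0, followed by the K negatives.
    logits = np.concatenate([pos_scores[:, None], neg_scores], axis=1)
    # Subtract the row-wise max for numerical stability before exponentiating.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the positive as the correct "class".
    return float(-log_softmax[:, 0].mean())


def filter_source_transitions(scores, keep_fraction):
    """Return sorted indices of the highest-scoring source transitions.

    keep_fraction is a hypothetical hyperparameter: the share of source
    transitions that is kept and shared with the target dataset.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, int(np.ceil(keep_fraction * len(scores))))
    order = np.argsort(scores)[::-1]  # indices from highest to lowest score
    return np.sort(order[:n_keep])


# Toy contrastive scores for six source transitions (made-up numbers).
source_scores = np.array([0.1, 2.3, -1.0, 0.7, 1.5, -0.4])
kept = filter_source_transitions(source_scores, keep_fraction=0.5)

# The loss drops as the positive score rises above the negatives.
loss_uninformative = info_nce_loss(np.array([0.0]), np.zeros((1, 3)))
loss_informative = info_nce_loss(np.array([5.0]), np.zeros((1, 3)))
```

In this sketch, `kept` holds the indices of the three highest-scoring transitions, and a score function that cannot tell the positive from three negatives yields a loss of log 4. In a full pipeline the scores would come from a representation trained with the contrastive loss, and the kept source transitions would be merged with the target data before offline policy learning.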

Code Implementations: 1 repo
