LG · Mar 12

Cross-Domain Policy Optimization via Bellman Consistency and Hybrid Critics

Ming-Hong Chen, Kuan-Chen Pan, You-De Huang, Xi Liu, Ping-Chun Hsieh

arXiv:2603.12087v1 · 5.0 · h-index: 1

Predicted impact: top 50% in LG · last 90 days

Originality: Incremental advance

AI Analysis

The paper addresses data-efficiency challenges in RL when the source and target domains have distinct state or action spaces, though it appears incremental because it builds on existing transfer methods.

The paper tackles the problem of cross-domain reinforcement learning by proposing QAvatar, a method that uses cross-domain Bellman consistency and hybrid critics to measure transferability and combine Q functions, achieving reliable transfer across locomotion and robot manipulation tasks.

Cross-domain reinforcement learning (CDRL) aims to improve the data efficiency of RL by leveraging data samples collected from a source domain to facilitate learning in a similar target domain. Despite its potential, cross-domain transfer in RL is known to face two fundamental and intertwined challenges: (i) the source and target domains can have distinct state or action spaces, which makes direct transfer infeasible and requires more sophisticated inter-domain mappings; (ii) the transferability of a source-domain model in RL is not easily identifiable a priori, and hence CDRL can be prone to negative transfer. In this paper, we propose to jointly tackle these two challenges through the lens of cross-domain Bellman consistency and a hybrid critic. Specifically, we first introduce the notion of cross-domain Bellman consistency as a way to measure the transferability of a source-domain model. Then, we propose QAvatar, which combines the Q functions from both the source and target domains with an adaptive, hyperparameter-free weight function. Through this design, we characterize the convergence behavior of QAvatar and show that QAvatar achieves reliable transfer in the sense that it effectively leverages a source-domain Q function for knowledge transfer to the target domain. Through experiments, we demonstrate that QAvatar achieves favorable transferability across various RL benchmark tasks, including locomotion and robot arm manipulation. Our code is available at https://rl-bandits-lab.github.io/Cross-Domain-RL/.
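The two ideas named in the abstract can be illustrated with a small tabular sketch. It is not the paper's algorithm: the abstract does not give the weight function, so the inverse-error weighting below is an illustrative stand-in, and every name (`bellman_backup`, `bellman_error`, `map_source_q`, `hybrid_critic`, `state_map`, `action_map`) is hypothetical. It assumes a known tabular target MDP and fixed index mappings from target states and actions to source ones.

```python
import numpy as np


def bellman_backup(q, rewards, transitions, gamma):
    """Apply the Bellman optimality operator to a tabular Q of shape (S, A).

    rewards has shape (S, A); transitions has shape (S, A, S).
    """
    next_values = q.max(axis=1)  # greedy value of each next state, shape (S,)
    return rewards + gamma * transitions @ next_values


def bellman_error(q, rewards, transitions, gamma):
    """Mean squared Bellman residual of q on the target MDP."""
    residual = q - bellman_backup(q, rewards, transitions, gamma)
    return float(np.mean(residual ** 2))


def map_source_q(q_source, state_map, action_map):
    """View a source Q table in target coordinates.

    state_map[s] and action_map[a] give the source indices assigned to
    target state s and target action a (assumed, fixed mappings).
    """
    return q_source[np.ix_(state_map, action_map)]


def hybrid_critic(q_target, q_source_mapped, rewards, transitions, gamma, eps=1e-12):
    """Blend two Q tables, weighting each by the other's Bellman error.

    ASSUMPTION: this weighting rule is a stand-in for illustration only;
    it is not the weight function proposed in the paper.
    """
    err_target = bellman_error(q_target, rewards, transitions, gamma)
    err_source = bellman_error(q_source_mapped, rewards, transitions, gamma)
    w_source = (err_target + eps) / (err_target + err_source + 2 * eps)
    blended = w_source * q_source_mapped + (1.0 - w_source) * q_target
    return blended, w_source
```

Under this stand-in rule, a mapped source Q function that is nearly Bellman-consistent on the target domain receives a weight close to 1, while a poorly matching one is down-weighted, which is the qualitative behavior the abstract describes as guarding against negative transfer.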
