RO · Aug 27, 2024

Continual Domain Randomization

arXiv:2403.12193 · 13 citations · h-index: 35
Originality: Incremental advance
AI Analysis

For robotics practitioners using sim2real transfer, CDR offers a more flexible training process that reduces task difficulty by avoiding simultaneous randomization of many parameters.

Continual Domain Randomization (CDR) combines domain randomization with continual learning to train RL policies sequentially on subsets of randomization parameters, improving sim2real transfer. In robotic reaching and grasping tasks, CDR matches or outperforms baselines that use combined randomization or sequential randomization without continual learning, and it performs robustly on the real robot.
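To make the contrast between combined and continual randomization concrete, here is a minimal Python sketch of the two sampling schedules. The parameter names, nominal values, and ranges in `PARAMS` are hypothetical and chosen only for illustration, and the helper functions are not taken from the CDR code. In the continual schedule, only the current subset is randomized while the rest stay at nominal values, which is why a continual learning method is needed to retain the effects of earlier stages.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical randomization parameters: name -> (nominal value, (low, high)).
PARAMS: Dict[str, Tuple[float, Tuple[float, float]]] = {
    "friction": (1.0, (0.5, 1.5)),
    "object_mass_kg": (0.2, (0.1, 0.4)),
    "action_latency_s": (0.0, (0.0, 0.05)),
}


def sample_sim_config(active: Set[str], rng: random.Random) -> Dict[str, float]:
    """Sample each parameter in `active` uniformly; keep all others nominal."""
    cfg = {}
    for name, (nominal, (low, high)) in PARAMS.items():
        cfg[name] = rng.uniform(low, high) if name in active else nominal
    return cfg


def combined_dr_schedule() -> List[Set[str]]:
    """Conventional DR: a single stage that randomizes every parameter at once."""
    return [set(PARAMS)]


def continual_dr_schedule(order: List[List[str]]) -> List[Set[str]]:
    """CDR-style schedule: a non-randomized stage, then one subset per stage.

    Only the current subset is randomized; earlier subsets return to nominal.
    """
    return [set()] + [set(subset) for subset in order]


if __name__ == "__main__":
    rng = random.Random(0)
    order = [["friction"], ["object_mass_kg"], ["action_latency_s"]]
    for stage, active in enumerate(continual_dr_schedule(order)):
        # A real pipeline would train the policy for many episodes per stage.
        print(stage, sorted(active), sample_sim_config(active, rng))
```

The order of subsets is a free design choice in this sketch; a real setup might group parameters that interact strongly into the same stage.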

Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of tunable parameters from the start of training, and these parameters are randomized simultaneously to train a robust model for use in the real world. However, the combined randomization of many parameters increases the task difficulty and might result in sub-optimal policies. To address this problem and to provide a more flexible training process, we propose Continual Domain Randomization (CDR) for RL, which combines domain randomization with continual learning to enable sequential training in simulation on a subset of randomization parameters at a time. Starting from a model trained in a non-randomized simulation where the task is easier to solve, the model is trained on a sequence of randomizations, and continual learning is employed to remember the effects of previous randomizations. Our experiments on robotic reaching and grasping tasks show that the model trained in this fashion learns effectively in simulation and performs robustly on the real robot, while matching or outperforming baselines that employ combined randomization or sequential randomization without continual learning. Our code and videos are available at https://continual-dr.github.io/.
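The role of continual learning can be illustrated with a toy example. The abstract does not name the continual learning method, so this sketch assumes an Elastic Weight Consolidation (EWC)-style quadratic penalty, one common choice, and replaces RL training with gradient descent on hypothetical diagonal quadratic losses, one per randomization stage. After each stage it stores the parameters and a per-parameter importance (for a quadratic loss, the diagonal curvature stands in for the Fisher information), and later stages pay a penalty `(lam / 2) * sum(imp_i * (theta_i - star_i) ** 2)` for moving important parameters away from those anchors. All function names, `lam`, and the stage values are illustrative.

```python
from typing import List, Tuple

Vec = List[float]


def task_loss(theta: Vec, curv: Vec, center: Vec) -> float:
    """Toy surrogate for one stage's objective: a diagonal quadratic."""
    return 0.5 * sum(a * (t - c) ** 2 for t, a, c in zip(theta, curv, center))


def train_stage(theta: Vec, curv: Vec, center: Vec,
                anchors: List[Tuple[Vec, Vec]], lam: float,
                lr: float = 0.1, steps: int = 2000) -> Vec:
    """Gradient descent on the stage loss plus an EWC-style penalty.

    `anchors` holds (theta_star, importance) pairs from earlier stages.
    """
    theta = list(theta)
    for _ in range(steps):
        grad = [a * (t - c) for t, a, c in zip(theta, curv, center)]
        for star, imp in anchors:
            for i in range(len(theta)):
                grad[i] += lam * imp[i] * (theta[i] - star[i])
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def continual_training(stages: List[Tuple[Vec, Vec]], theta0: Vec, lam: float) -> Vec:
    """Train on each (curvature, center) stage in order, anchoring after each."""
    theta, anchors = list(theta0), []
    for curv, center in stages:
        theta = train_stage(theta, curv, center, anchors, lam)
        # For a quadratic loss, the diagonal Hessian stands in for the Fisher.
        anchors.append((list(theta), list(curv)))
    return theta


if __name__ == "__main__":
    # Stage 0: "non-randomized"; stage 1: barely constrains the 2nd coordinate.
    stages = [([1.0, 1.0], [0.0, 0.0]), ([1.0, 0.1], [2.0, 2.0])]
    for lam in (0.0, 1.0):
        theta = continual_training(stages, [0.5, -0.5], lam)
        print(lam, theta, task_loss(theta, *stages[0]))
```

Setting `lam=0.0` reduces the loop to plain sequential training, analogous to the sequential-randomization baseline without continual learning. With `lam=1.0`, the second coordinate, which the second stage barely constrains, stays close to its first-stage value, so the first-stage loss remains lower.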

