Robust Visual Domain Randomization for Reinforcement Learning
This work addresses two weaknesses of visual domain randomization for reinforcement learning agents: training is inefficient, and the resulting policies can have high variance across domains. It offers an incremental improvement in training efficiency.
The paper tackles the challenge of generalization in reinforcement learning by proposing a regularization method: the agent is trained on a single variation of the environment while its state representations are regularized to be invariant across domains. Compared to standard domain randomization, this yields more efficient and robust learning while achieving equal generalization scores.
Producing agents that can generalize to a wide range of visually different environments is a significant challenge in reinforcement learning. One method for overcoming this issue is visual domain randomization, whereby at the start of each training episode some visual aspects of the environment are randomized so that the agent is exposed to many possible variations. However, domain randomization is highly inefficient and may lead to policies with high variance across domains. Instead, we propose a regularization method whereby the agent is only trained on one variation of the environment, and its learned state representations are regularized during training to be invariant across domains. We conduct experiments that demonstrate that our technique leads to more efficient and robust learning than standard domain randomization, while achieving equal generalization scores.
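The regularization idea described above can be illustrated with a minimal sketch. The abstract does not specify the form of the regularizer, so everything here is an assumption made for illustration: the per-channel color scaling in `randomize_visuals`, the toy linear `encode` function standing in for the agent's learned state representation, the squared-distance penalty, and the weighting coefficient `lam` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def randomize_visuals(obs, rng):
    """Hypothetical visual randomization: random per-channel color scaling.

    `obs` is an image of shape (height, width, channels) with values in [0, 1].
    """
    scale = rng.uniform(0.5, 1.5, size=(1, 1, obs.shape[-1]))
    return np.clip(obs * scale, 0.0, 1.0)


def encode(obs, weights):
    """Toy linear encoder standing in for the agent's state representation."""
    return obs.reshape(-1) @ weights


def invariance_penalty(obs_ref, obs_rand, weights):
    """Mean squared distance between the features of the same underlying
    state rendered in the training domain and in a randomized domain."""
    diff = encode(obs_ref, weights) - encode(obs_rand, weights)
    return float(np.mean(diff ** 2))


def total_loss(rl_loss, obs_ref, obs_rand, weights, lam=1.0):
    """RL loss computed on the single training domain, plus a weighted
    penalty that pushes representations to agree across domains."""
    return rl_loss + lam * invariance_penalty(obs_ref, obs_rand, weights)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    obs = rng.uniform(size=(8, 8, 3))              # observation in the training domain
    weights = rng.normal(size=(8 * 8 * 3, 16))     # toy encoder parameters
    obs_rand = randomize_visuals(obs, rng)         # same state, different visuals
    print(total_loss(1.0, obs, obs_rand, weights, lam=0.1))
```

In this sketch the reinforcement learning loss is only ever computed on the reference observation, while the randomized observation enters solely through the penalty term. This mirrors the distinction the abstract draws with standard domain randomization, where the agent is trained directly on many visual variations.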