Domain Adversarial Reinforcement Learning
This work addresses visual domain shift in RL, where agents must adapt to new environments without retraining; the contribution is incremental, since it builds on existing adversarial methods.
The paper tackles generalization in reinforcement learning when visual aspects such as backgrounds or lighting differ across environments. It enforces invariance of the learned representations through domain adversarial optimization and reports a significant improvement in zero-shot performance on unseen domains.
We consider the problem of generalization in reinforcement learning where visual aspects of the observations may differ, e.g. different backgrounds or changes in contrast, brightness, etc. We assume that our agent has access to only a few of the MDPs from the MDP distribution during training. The agent's performance is then reported on new, unseen test domains drawn from the same distribution (e.g. unseen backgrounds). For this "zero-shot RL" task, we enforce invariance of the learned representations to visual domains via a domain adversarial optimization process. We empirically show that this approach yields a significant improvement in generalization to new, unseen domains.
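The abstract does not specify how the domain adversarial optimization is implemented. One common way to realize it is a gradient reversal layer in the style of domain-adversarial neural networks: a domain classifier learns to predict which training domain (e.g. which background) an observation came from, while the encoder receives the negated classifier gradient and so learns features the classifier cannot separate. The pure-Python sketch below shows a single update under loudly stated assumptions: a hypothetical linear encoder producing a scalar feature `z`, a logistic domain classifier, and an RL loss whose gradient with respect to `z` (`task_grad_z`) is simply passed in. All names (`encode`, `domain_loss`, `dann_step`, `lam`, `lr`) are illustrative, not the paper's code.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def encode(enc_w, x):
    """Hypothetical linear encoder: observation vector -> scalar feature z."""
    return sum(w * xi for w, xi in zip(enc_w, x))


def domain_loss(enc_w, dom_w, dom_b, x, d):
    """Binary cross-entropy of a logistic domain classifier applied to z."""
    p = sigmoid(dom_w * encode(enc_w, x) + dom_b)
    return -(d * math.log(p) + (1 - d) * math.log(1 - p))


def dann_step(enc_w, dom_w, dom_b, x, d, task_grad_z, lam=1.0, lr=0.05):
    """One domain-adversarial update using a gradient reversal layer.

    d: domain label (0 or 1), e.g. which training background x came from.
    task_grad_z: gradient of the (unspecified) RL loss w.r.t. z; assumed given.
    lam: weight of the reversed domain gradient (hypothetical hyperparameter).
    """
    z = encode(enc_w, x)
    p = sigmoid(dom_w * z + dom_b)
    g_logit = p - d  # d(BCE)/d(logit) for a sigmoid output

    # The domain classifier descends the domain loss (it tries to tell domains apart).
    new_dom_w = dom_w - lr * g_logit * z
    new_dom_b = dom_b - lr * g_logit

    # Gradient reversal: the encoder receives -lam * dL_dom/dz, so it descends
    # the task loss while ascending the domain loss (hiding domain information).
    g_z = task_grad_z - lam * g_logit * dom_w
    new_enc_w = [w - lr * g_z * xi for w, xi in zip(enc_w, x)]
    return new_enc_w, new_dom_w, new_dom_b


if __name__ == "__main__":
    enc_w, dom_w, dom_b = [0.5, 0.8], 1.5, 0.0
    x, d = [0.2, 1.0], 1  # observation from "domain 1" (e.g. a bright background)
    before = domain_loss(enc_w, dom_w, dom_b, x, d)
    new_enc, new_dw, new_db = dann_step(enc_w, dom_w, dom_b, x, d, task_grad_z=0.0)
    print("domain loss before:", round(before, 4))
    print("after encoder step:", round(domain_loss(new_enc, dom_w, dom_b, x, d), 4))
    print("after classifier step:", round(domain_loss(enc_w, new_dw, new_db, x, d), 4))
```

With the task gradient set to zero, the encoder step increases the domain classifier's loss while the classifier step decreases it; that opposing pressure is what pushes the representation toward domain invariance. In a full agent, `task_grad_z` would come from the RL objective and the two updates would be interleaved over batches drawn from the few training domains.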