Robust Reinforcement Learning under Model Misspecification
This work addresses robustness issues for RL applications in real-world control, but it appears incremental because it builds on existing techniques such as POMDP modeling and adversarial training.
The paper tackles model misspecification in reinforcement learning, where an agent faces different transition dynamics during training and deployment. It proposes a framework that combines history trajectories and POMDP modeling with an adversarial attack method for robust training, and validates the framework in four gym domains.
Reinforcement learning has achieved remarkable performance in a wide range of tasks. Nevertheless, some unsolved problems limit its application to real-world control. One of them is model misspecification, a situation in which an agent is trained and deployed in environments with different transition dynamics. We propose a novel framework that utilizes history trajectories and Partially Observable Markov Decision Process (POMDP) modeling to deal with this problem. Additionally, we put forward an efficient adversarial attack method to assist robust training. Our experiments in four gym domains validate the effectiveness of our framework.
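To make the notion of model misspecification concrete, the following is a minimal illustrative sketch and not the framework proposed in the paper. It assumes a hypothetical toy environment (`PointMass`) whose hidden `friction` parameter differs between training and deployment, and a hypothetical `rollout` helper that keeps a short window of past transitions. The point is only that the initial observation alone cannot distinguish the two environments, whereas a history of transitions can, which is the intuition behind treating the problem as partially observable.

```python
from collections import deque


class PointMass:
    """Hypothetical toy 1-D environment; `friction` is the hidden dynamics parameter."""

    def __init__(self, friction):
        self.friction = friction
        self.pos, self.vel = 0.0, 0.0

    def reset(self):
        self.pos, self.vel = 0.0, 0.0
        return self.pos

    def step(self, action):
        # The transition depends on friction, which the agent never observes directly.
        self.vel = (1.0 - self.friction) * self.vel + action
        self.pos += self.vel
        return self.pos


def rollout(env, actions, history_len=3):
    """Apply `actions` and return the last `history_len` (obs, action, next_obs) transitions."""
    history = deque(maxlen=history_len)
    obs = env.reset()
    for action in actions:
        next_obs = env.step(action)
        history.append((obs, action, next_obs))
        obs = next_obs
    return list(history)


# Same action sequence, different (assumed) transition dynamics.
actions = [1.0, 0.0, 0.0, 0.0]
train_env = PointMass(friction=0.1)   # dynamics seen during training
deploy_env = PointMass(friction=0.5)  # dynamics met at deployment

train_hist = rollout(train_env, actions)
deploy_hist = rollout(deploy_env, actions)

# The initial observations coincide, but the transition histories do not.
same_start = train_env.reset() == deploy_env.reset()
histories_differ = train_hist != deploy_hist
```

In this sketch a policy conditioned only on the current observation has no way to tell at reset which environment it is in, while a policy conditioned on the recent history could in principle infer the hidden parameter.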