Recovering Robustness in Model-Free Reinforcement Learning
For practitioners using model-free RL in control systems, this paper identifies a robustness issue and offers a simple fix, though it is demonstrated only on small examples.
This paper shows that model-free RL with partial observations can produce controllers with poor robustness margins, and proposes adding random input perturbations during training to recover robustness, enabling a trade-off between performance and robustness.
Reinforcement learning (RL) can be used to design a control policy directly from data collected from the system. This paper considers the robustness of controllers trained via model-free RL. The discussion focuses on the standard model-based linear quadratic Gaussian (LQG) problem as a special instance of RL. A simple example, originally formulated for LQG problems, is used to demonstrate that RL with partial observations can lead to poor robustness margins. The paper proposes recovering robustness by introducing random perturbations at the system input during RL training. The perturbation magnitude can be used to trade off performance for robustness. Two simple examples are presented to demonstrate the proposed method for enhancing robustness during RL training.
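To make the idea of perturbing the system input concrete, here is a minimal sketch of a training rollout in which a random signal is added at the plant input before it reaches the system. Everything in it is an illustrative assumption rather than the paper's setup: the toy double-integrator matrices, the static full-state gain `K` (the paper treats partial observations), the cost weights, and the names `rollout_cost` and `sigma_u`.

```python
import numpy as np

def rollout_cost(K, sigma_u, steps=200, seed=0):
    """Average quadratic cost of u = -K x on a toy discrete-time plant.

    A zero-mean Gaussian perturbation with standard deviation sigma_u is
    added at the plant input (assumed perturbation model, for illustration).
    """
    rng = np.random.default_rng(seed)
    # Hypothetical plant: double integrator sampled at 0.1 s.
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    x = np.array([1.0, 0.0])          # initial state
    cost = 0.0
    for _ in range(steps):
        u = -K @ x                    # policy output, shape (1,)
        d = sigma_u * rng.standard_normal(1)  # random input perturbation
        cost += x @ x + u @ u         # stage cost with identity weights
        x = A @ x + B @ (u + d)       # the plant sees the perturbed input
    return cost / steps

K = np.array([[1.0, 1.5]])            # hypothetical stabilizing gain
nominal = rollout_cost(K, sigma_u=0.0)
perturbed = rollout_cost(K, sigma_u=10.0)
print(nominal, perturbed)
```

In a model-free training loop, a cost like this would be the quantity the learning algorithm minimizes over the policy parameters. Increasing `sigma_u` penalizes policies that are sensitive to input disturbances, which is the knob the summary describes for trading performance against robustness.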