Learning Robust Controllers Via Probabilistic Model-Based Policy Search
This work addresses the robustness of controllers learned with model-based reinforcement learning. The contribution is incremental: it builds on the PILCO algorithm and adds one specific regularization technique.
The paper tackles the problem of learning robust controllers in model-based reinforcement learning by investigating whether such controllers generalize under small perturbations of the environment. It shows that enforcing a lower bound on the likelihood noise of the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers, with empirical benefits demonstrated in a simulation benchmark.
Model-based Reinforcement Learning estimates the true environment through a world model in order to approximate the optimal policy. This family of algorithms usually benefits from better sample efficiency than its model-free counterparts. We investigate whether controllers learned in such a way are robust and able to generalize under small perturbations of the environment. Our work is inspired by the PILCO algorithm, a method for probabilistic policy search. We show that enforcing a lower bound on the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers. We demonstrate the empirical benefits of our method in a simulation benchmark.
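To make the central idea concrete, the following is a minimal sketch of what a lower bound on the likelihood noise of a Gaussian Process could look like. It is not the paper's implementation or PILCO's code. The function names (`bounded_noise_variance`, `gp_predict`), the softplus parameterization, the fixed RBF kernel hyperparameters, and the toy data are all assumptions chosen for illustration.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def bounded_noise_variance(raw_param, min_noise_var):
    # Hypothetical parameterization: an optimizer may move raw_param freely,
    # but the resulting noise variance never drops below min_noise_var.
    return min_noise_var + softplus(raw_param)


def rbf(a, b, lengthscale=1.0, signal_var=1.0):
    # Squared-exponential kernel on scalar inputs (assumed hyperparameters).
    return signal_var * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(xs, ys, x_star, noise_var):
    # Posterior mean and latent-function variance of a zero-mean GP at x_star.
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(x_star, x) for x in xs]
    alpha = solve(K, ys)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


# Toy data: three observations of a one-dimensional transition (made up).
xs, ys = [0.0, 1.0, 2.0], [0.0, 0.8, 0.9]
raw = -20.0  # a raw value that would push the noise toward zero

free_noise = bounded_noise_variance(raw, min_noise_var=0.0)
floored_noise = bounded_noise_variance(raw, min_noise_var=1e-2)

print(gp_predict(xs, ys, 1.0, free_noise))
print(gp_predict(xs, ys, 1.0, floored_noise))
```

With the floor in place, the model keeps some predictive uncertainty even at points it has already observed, so it cannot claim near-deterministic dynamics. This is one plausible reading of how the bound could act as a regularizer on policy updates that are computed from the model's predictions.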