SYLGRO Nov 20, 2020

MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty

arXiv:2011.10562v1, 12 citations
AI Analysis

This work tackles sim-to-real transfer for reinforcement learning policies, a central obstacle for robotics and control engineers who train controllers in simulation.

This paper addresses the sim-to-real gap in reinforcement learning by proposing the MRAC-RL framework, which uses an inner-loop adaptive controller to enable simulation-trained policies to adapt to real-world parametric model uncertainty. The framework improves upon state-of-the-art RL algorithms for systems with modeling errors.

Reinforcement learning (RL) algorithms have been successfully used to develop control policies for dynamical systems. For many such systems, these policies are trained in a simulated environment. Due to discrepancies between the simulated model and the true system dynamics, RL-trained policies often fail to generalize and adapt appropriately when deployed in the real-world environment. Current research in bridging this sim-to-real gap has largely focused on improvements in simulation design and on the development of improved and specialized RL algorithms for robust control policy generation. In this paper, we apply principles from adaptive control and system identification to develop the model-reference adaptive control & reinforcement learning (MRAC-RL) framework. We propose a set of novel MRAC algorithms applicable to a broad range of linear and nonlinear systems, and derive the associated control laws. The MRAC-RL framework utilizes an inner-loop adaptive controller that allows a simulation-trained outer-loop policy to adapt and operate effectively in a test environment, even when parametric model uncertainty exists. We demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing control policies that can be applied to systems with modeling errors.
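
The inner-loop/outer-loop structure described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it uses the classic textbook direct MRAC law for a scalar linear plant, and everything named here is an assumption made for illustration (the `stand_in_policy` proportional rule in place of a trained RL policy, the plant and reference parameters, the adaptation gain `gamma`, and the choice to evaluate the policy on the reference state). The simulator plays the role of the reference model, and the adaptive gains are adjusted so the mismatched plant tracks it.

```python
def stand_in_policy(x_ref, target=1.0, gain=2.0):
    """Hypothetical outer-loop policy: a proportional rule standing in
    for a policy trained in simulation (not an actual RL policy)."""
    return gain * (target - x_ref)


def simulate(adapt=True, a=1.0, b=3.0, a_m=-2.0, b_m=2.0,
             gamma=5.0, k_x0=0.0, k_r0=1.0, dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of a scalar plant x' = a*x + b*u that is
    asked to track the reference (simulator) model x_m' = a_m*x_m + b_m*r.

    The true parameters a and b are treated as unknown to the controller;
    only the sign of b (assumed positive) is used by the update law.
    Returns the final tracking error and the final gains.
    """
    x, x_m = 0.0, 0.0
    k_x, k_r = k_x0, k_r0  # k_x0=0, k_r0=1 passes the policy output through
    for _ in range(int(t_end / dt)):
        r = stand_in_policy(x_m)   # outer loop: command from the policy
        u = k_x * x + k_r * r      # inner loop: adaptive control input
        e = x - x_m                # tracking error against the reference
        if adapt:
            # Lyapunov-based gradient update (valid for b > 0)
            k_x -= gamma * e * x * dt
            k_r -= gamma * e * r * dt
        x += (a * x + b * u) * dt
        x_m += (a_m * x_m + b_m * r) * dt
    return x - x_m, k_x, k_r


if __name__ == "__main__":
    err_adaptive, k_x, k_r = simulate(adapt=True)
    err_fixed, _, _ = simulate(adapt=False)
    print("final error with adaptation:   ", err_adaptive)
    print("final error without adaptation:", err_fixed)
```

With these assumed values the plant is open-loop unstable (`a > 0`) while the reference model is stable, so passing the policy output straight through lets the state diverge, whereas the adaptive inner loop drives the tracking error toward zero. Forward Euler and the step size are chosen for brevity, not accuracy.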

