Performance-Weighed Policy Sampling for Meta-Reinforcement Learning
This work provides an incremental improvement for researchers and practitioners in reinforcement learning and control systems, specifically for developing more adaptive and fault-tolerant control schemes for dynamic systems like aircraft fuel transfer systems.
This paper introduces Enhanced Model-Agnostic Meta-Learning (E-MAML), an algorithm designed to accelerate the convergence of policy functions in meta-reinforcement learning, particularly when adapting to new tasks from a small number of training examples. Instead of sampling tasks at random, E-MAML re-initializes the parameters of a new RL policy from policy parameters previously learned on similar tasks, which speeds up adaptation to new faults in dynamic systems.
This paper presents an Enhanced Model-Agnostic Meta-Learning (E-MAML) algorithm that achieves fast convergence of the policy function from a small number of training examples when applied to new learning tasks. Built on top of Model-Agnostic Meta-Learning (MAML), E-MAML maintains a set of policy parameters learned in the environment for previous tasks. We apply E-MAML to the development of reinforcement learning (RL)-based online fault-tolerant control schemes for dynamic systems. When a new fault occurs, the enhancement re-initializes the parameters of a new RL policy so that it adapts faster using only a small number of samples of system behavior under the new fault. This step replaces the random task sampling in MAML and instead exploits the controller's previously generated experiences. The parameters used for this re-initialization are sampled to maximally span the parameter space, which facilitates adaptation to the new fault. We demonstrate the performance of our approach, combining E-MAML with proximal policy optimization (PPO), on the well-known cart-pole example and then on the fuel transfer system of an aircraft.
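To make the idea of sampling stored policy parameters so that they span the parameter space more concrete, the following is a minimal sketch and not the procedure used in the paper. It assumes that each previously learned policy is stored as a flat parameter vector and uses greedy farthest-point selection under Euclidean distance as one possible spanning heuristic. The function name `select_spanning_policies` and the toy library are hypothetical and introduced only for illustration.

```python
import numpy as np


def select_spanning_policies(library, k):
    """Pick k parameter vectors that are mutually far apart.

    Assumption: `library` is an (n, d) array-like of flattened policy
    parameters from earlier tasks. Greedy farthest-point selection is
    an illustrative stand-in for "maximally spanning" the space.
    """
    library = np.asarray(library, dtype=float)
    if not 1 <= k <= len(library):
        raise ValueError("k must be between 1 and the library size")

    # Seed with the vector farthest from the library centroid.
    centroid = library.mean(axis=0)
    first = int(np.argmax(np.linalg.norm(library - centroid, axis=1)))
    chosen = [first]

    # For every vector, track the distance to its nearest chosen vector.
    min_dist = np.linalg.norm(library - library[first], axis=1)
    while len(chosen) < k:
        # The next pick is the vector farthest from everything chosen so far.
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        dist_to_nxt = np.linalg.norm(library - library[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_nxt)
    return chosen


# Toy library of five 2-D "policies" (hypothetical values).
toy_library = [[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
selected = select_spanning_policies(toy_library, k=3)
```

In a MAML-style loop, the selected vectors would take the place of the randomly sampled tasks: each would be adapted on the small batch of samples collected under the new fault before the meta-update. How the paper weighs candidates by performance is not specified in this abstract, so that part is left out of the sketch.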