Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments
This work addresses the need for scalable, efficient reinforcement learning methods that adapt quickly to changing environments; its contribution is incremental in that it builds on existing ES approaches.
The paper adapts evolution strategies (ES) to dynamic environments in reinforcement learning. It proposes an instance weighting mechanism that assigns higher weights to instances carrying more new knowledge, which significantly improves performance on challenging RL tasks such as robot navigation and locomotion.
Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available because they parallelize better. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism into ES to facilitate learning adaptation while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of the parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.
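To make the weighting idea concrete, the following is a minimal sketch of one instance-weighted ES update, not the paper's actual IW-IES procedure. The abstract does not give the weight formulas, so several choices here are assumptions: novelty is taken as the Euclidean distance of a sampled parameter vector from the previous optimum, quality as the return in the new environment, both are turned into rank-based weights, and the two are combined by a product. The names `rank_weights`, `iw_es_step`, `reward_fn`, and all default hyperparameters are hypothetical.

```python
import numpy as np


def rank_weights(values):
    """Map values to weights in (0, 1]; a larger value gets a larger weight.

    Rank-based scaling is an assumption made for this sketch.
    """
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values))  # 0 for the smallest value
    return (ranks + 1) / len(values)


def iw_es_step(theta, theta_prev_opt, reward_fn, rng,
               pop_size=50, sigma=0.1, lr=0.01):
    """One ES update in which each sampled instance carries its own weight.

    theta          : current mean of the search distribution
    theta_prev_opt : optimum found in the original environment
    reward_fn      : maps a parameter vector to a return in the NEW environment
    """
    # Sample a population of perturbed parameter vectors (the "instances").
    eps = rng.standard_normal((pop_size, theta.size))
    candidates = theta + sigma * eps

    # Instance quality: how well each instance performs in the new environment.
    rewards = np.array([reward_fn(c) for c in candidates])

    # Instance novelty: distance from the previous optimum (assumed metric).
    novelty = np.linalg.norm(candidates - theta_prev_opt, axis=1)

    # Combine both metrics by a product and normalize to sum to 1 (assumption).
    weights = rank_weights(novelty) * rank_weights(rewards)
    weights = weights / weights.sum()

    # Standardize returns, then form the weighted ES gradient estimate.
    # With uniform weights 1/pop_size this reduces to the plain ES estimator.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    grad = (weights * adv) @ eps / sigma
    return theta + lr * grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev_opt = np.zeros(5)   # optimum in the original environment
    new_goal = np.ones(5)    # toy stand-in for a changed environment
    theta = prev_opt.copy()
    for _ in range(200):
        theta = iw_es_step(theta, prev_opt,
                           lambda p: -np.sum((p - new_goal) ** 2), rng)
    print("distance to new optimum:", np.linalg.norm(theta - new_goal))
```

The toy quadratic reward stands in for an RL rollout. In a real setting `reward_fn` would run one or more episodes with the candidate policy, and those evaluations are the part that ES distributes across CPUs.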