NE, AI | May 10, 2023

Supplementing Gradient-Based Reinforcement Learning with Simple Evolutionary Ideas

arXiv:2305.07571v1
Originality: Incremental advance
AI Analysis

This work addresses sample inefficiency in reinforcement learning for researchers and practitioners, offering an incremental improvement over existing methods.

The paper tackles sample inefficiency in reinforcement learning with a hybrid algorithm that combines gradient-based training with sparse evolutionary operators (occasional crossovers and mutations). The method is reported to be robust to hyperparameter variations and to outperform standard RL baselines. Even the simpler setup of multiple agents sharing a common memory, with no evolutionary updates, shows improved performance over those baselines.

We present a simple, sample-efficient algorithm for introducing large but directed learning steps in reinforcement learning (RL), through the use of evolutionary operators. The methodology uses a population of RL agents training with a common experience buffer, with occasional crossovers and mutations of the agents in order to search efficiently through the policy space. Unlike prior literature on combining evolutionary search (ES) with RL, this work does not generate a distribution of agents from a common mean and covariance matrix. Neither does it require the evaluation of the entire population of policies at every time step. Instead, we focus on gradient-based training throughout the life of every policy (individual), with a sparse amount of evolutionary exploration. The resulting algorithm is shown to be robust to hyperparameter variations. As a surprising corollary, we show that simply initialising and training multiple RL agents with a common memory (with no further evolutionary updates) outperforms several standard RL baselines.
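The abstract does not spell out the operators or the training loop, so the following is only a minimal sketch of the general idea: a small population trains by gradient steps on a shared experience buffer, and every so often the worst member is replaced by a mutated crossover of the two best. Everything in it is an assumption made for illustration. The names `SharedBuffer`, `crossover`, `mutate`, `gradient_step`, `fitness`, and `train` are hypothetical, uniform crossover and Gaussian mutation are stand-ins for whatever operators the paper uses, and a toy linear regression replaces the actual RL update and environment.

```python
import random
from collections import deque


class SharedBuffer:
    """Experience buffer that every agent writes to and samples from."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k, rng):
        pool = list(self.data)
        return rng.sample(pool, min(k, len(pool)))


def crossover(a, b, rng):
    """Uniform crossover (assumed): copy each parameter from one parent."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(params, sigma, rng):
    """Gaussian mutation (assumed): add zero-mean noise to each parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def gradient_step(params, batch, lr):
    """Toy stand-in for an RL update: SGD on squared error of y = w*x + b."""
    w, b = params
    for x, y in batch:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return [w, b]


def fitness(params, eval_set):
    """Higher is better: negative mean squared error on a fixed set."""
    w, b = params
    return -sum(((w * x + b) - y) ** 2 for x, y in eval_set) / len(eval_set)


def train(pop_size=4, steps=200, evolve_every=50, seed=0):
    rng = random.Random(seed)
    buffer = SharedBuffer(capacity=1000)
    population = [[rng.uniform(-1, 1), rng.uniform(-1, 1)]
                  for _ in range(pop_size)]
    eval_set = [(x / 10, 2.0 * (x / 10) + 1.0) for x in range(-10, 11)]
    for step in range(1, steps + 1):
        # Gradient-based training at every step, using the common buffer.
        for i, agent in enumerate(population):
            x = rng.uniform(-1, 1)
            buffer.add((x, 2.0 * x + 1.0))
            population[i] = gradient_step(agent, buffer.sample(8, rng), lr=0.05)
        # Sparse evolutionary update: only every `evolve_every` steps.
        if step % evolve_every == 0:
            order = sorted(range(pop_size),
                           key=lambda i: fitness(population[i], eval_set),
                           reverse=True)
            child = crossover(population[order[0]], population[order[1]], rng)
            population[order[-1]] = mutate(child, sigma=0.01, rng=rng)
    return population, eval_set
```

Calling `train()` returns the final population and the evaluation set, so `max(fitness(p, eval_set) for p in population)` gives the score of the best member. Setting `evolve_every` larger than `steps` disables the evolutionary update entirely, which corresponds loosely to the "multiple agents with a common memory" baseline mentioned in the abstract.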

