LG · AI · RO · Mar 24, 2025

Evolutionary Policy Optimization

Jianren Wang, Yifan Su, Abhinav Gupta, Deepak Pathak

arXiv:2503.19037v3 · 4 citations · h-index: 4

Originality: Highly original

AI Analysis

This paper addresses scalability issues in reinforcement learning for robotics and control, offering a novel hybrid approach rather than a purely incremental one.

The paper tackles the problem of scaling on-policy reinforcement learning to larger batch sizes. It proposes Evolutionary Policy Optimization (EPO), a hybrid algorithm that combines evolutionary algorithms with policy gradients and improves sample efficiency, asymptotic performance, and scalability on tasks such as dexterous manipulation and legged locomotion.

On-policy reinforcement learning (RL) algorithms are widely used for their strong asymptotic performance and training stability, but they struggle to scale with larger batch sizes, as additional parallel environments yield redundant data due to limited policy-induced diversity. In contrast, Evolutionary Algorithms (EAs) scale naturally and encourage exploration via randomized population-based search, but are often sample-inefficient. We propose Evolutionary Policy Optimization (EPO), a hybrid algorithm that combines the scalability and diversity of EAs with the performance and stability of policy gradients. EPO maintains a population of agents conditioned on latent variables, shares actor-critic network parameters for coherence and memory efficiency, and aggregates diverse experiences into a master agent. Across tasks in dexterous manipulation, legged locomotion, and classic control, EPO outperforms state-of-the-art baselines in sample efficiency, asymptotic performance, and scalability.
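The abstract names three ingredients: a population of agents that differ only by a latent variable, one set of shared parameters, and a master update that pools every agent's experience. The toy below is a hypothetical sketch of how those pieces can fit together; it is not the authors' implementation. The 1-D action, Gaussian policy, quadratic reward, the REINFORCE update with a batch-mean baseline (standing in for the actor-critic update), and truncation selection with Gaussian mutation on the latents are all assumptions made for illustration.

```python
import random

# Hypothetical toy sketch of a latent-conditioned population with shared
# parameters; NOT the EPO authors' code. Every name here is illustrative.


def reward(action: float, target: float) -> float:
    # Toy objective (assumption): negative squared distance to a target action.
    return -(action - target) ** 2


def evolve_latents(latents, fitness, rng, mutation_std):
    """EA step on the latents only: keep the fitter half, refill with mutated copies."""
    ranked = [z for _, z in sorted(zip(fitness, latents), key=lambda p: p[0], reverse=True)]
    elites = ranked[: len(ranked) // 2]
    children = [rng.choice(elites) + rng.gauss(0.0, mutation_std)
                for _ in range(len(latents) - len(elites))]
    return elites + children


def train(iterations=200, pop_size=8, samples_per_agent=16, sigma=0.5,
          lr=0.05, mutation_std=0.1, target=3.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # the single shared ("master") policy parameter
    latents = [rng.gauss(0.0, 1.0) for _ in range(pop_size)]  # one latent per agent
    for _ in range(iterations):
        buffer = []   # pooled (mu, action, reward) tuples from every agent
        fitness = []  # mean reward per agent, used only for selection
        for z in latents:
            mu = theta + z  # agent = shared theta conditioned on its own latent
            rewards = []
            for _ in range(samples_per_agent):
                a = rng.gauss(mu, sigma)
                r = reward(a, target)
                buffer.append((mu, a, r))
                rewards.append(r)
            fitness.append(sum(rewards) / len(rewards))
        # One policy-gradient step on theta from the pooled experience:
        # REINFORCE with a batch-mean baseline, where
        # d/dtheta log N(a; mu, sigma^2) = (a - mu) / sigma^2.
        baseline = sum(r for _, _, r in buffer) / len(buffer)
        grad = sum((r - baseline) * (a - mu) / sigma ** 2 for mu, a, r in buffer) / len(buffer)
        theta += lr * grad
        # Evolutionary step: only the per-agent latents are selected and mutated.
        latents = evolve_latents(latents, fitness, rng, mutation_std)
    return theta, latents


if __name__ == "__main__":
    theta, latents = train()
    print(f"theta={theta:.3f}, mean agent action={theta + sum(latents) / len(latents):.3f}")
```

Calling `train()` returns the shared `theta` and the final latents; `theta` plus the population's mean latent should end up near `target`. The point of the design is that only the latents evolve, while every agent's samples feed one gradient step on the shared parameter, so adding agents enlarges and diversifies the batch without adding parameters.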

