Trajectory First: A Curriculum for Discovering Diverse Policies
This work addresses the challenge of enhancing agent robustness and avoiding local optima in reinforcement learning, though it appears incremental as it builds on existing constrained-diversity frameworks.
The paper tackles limited policy diversity in constrained-diversity reinforcement learning methods, particularly in complex tasks such as robotic manipulation. It proposes a curriculum that explores at the trajectory level first and only then learns step-based policies, which improves the diversity of the learned skills.
Being able to solve a task in diverse ways makes agents more robust to task variations and less prone to local optima. In this context, constrained diversity optimization has emerged as a powerful reinforcement learning (RL) framework to train a diverse set of agents in parallel. However, existing constrained-diversity RL methods often under-explore in complex tasks such as robotic manipulation, leading to a lack of policy diversity. To improve diversity optimization in RL, we therefore propose a curriculum that first explores at the trajectory level before learning step-based policies. In our empirical evaluation, we provide novel insights into the shortcoming of skill-based diversity optimization and demonstrate that our curriculum improves the diversity of the learned skills.