LG AI | Jul 12, 2022

DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization

Wentse Chen, Shiyu Huang, Yuan Chiang, Tim Pearce, Wei-Wei Tu, Ting Chen, Jun Zhu

arXiv:2207.05631v312.410 citations | h-index: 19 | Has Code

Originality: Incremental advance

AI Analysis

DGPO addresses the need for diverse solutions in RL, which can make an agent's interactions with users more engaging and make policies more robust. It represents an incremental improvement over prior methods.

Most reinforcement learning algorithms learn only a single optimal strategy. The paper addresses this limitation by proposing DGPO, an algorithm that discovers multiple diverse strategies for a given task, achieving comparable rewards and greater diversity than baselines, often with better sample efficiency.

Most reinforcement learning algorithms seek a single optimal strategy that solves a given task. However, it can often be valuable to learn a diverse set of solutions, for instance, to make an agent's interaction with users more engaging, or to improve the robustness of a policy to unexpected perturbations. We propose Diversity-Guided Policy Optimization (DGPO), an on-policy algorithm that discovers multiple strategies for solving a given task. Unlike prior work, it achieves this with a shared policy network trained over a single run. Specifically, we design an intrinsic reward based on an information-theoretic diversity objective. Our final objective alternately constrains the diversity of the strategies and the extrinsic reward. We solve the constrained optimization problem by casting it as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks. Compared to baseline methods, DGPO achieves comparable rewards while discovering more diverse strategies, often with better sample efficiency.
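
The abstract does not spell out the form of the intrinsic reward, so the following is only a rough sketch of one common information-theoretic choice, not the authors' formulation. It assumes a discrete strategy index z with a uniform prior and a hypothetical discriminator that outputs one logit per strategy for the visited state; the reward is log q(z | s) - log p(z), which is positive when the state reveals which strategy produced it. The names diversity_reward and discriminator_logits are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def diversity_reward(discriminator_logits, z):
    """Illustrative intrinsic reward: log q(z | s) - log p(z).

    Assumptions (not taken from the paper): strategies are indexed by a
    discrete z, the prior p(z) is uniform, and discriminator_logits holds
    one score per strategy for the current state s.
    """
    n_strategies = len(discriminator_logits)
    log_q = log_softmax(discriminator_logits)[z]
    log_p = -math.log(n_strategies)  # uniform prior over strategies
    return log_q - log_p


# A state that says nothing about the strategy earns no diversity reward.
uninformative = diversity_reward([0.0, 0.0, 0.0, 0.0], z=2)

# A state that strongly identifies strategy 2 earns a positive reward,
# bounded above by log(number of strategies).
distinctive = diversity_reward([0.0, 0.0, 5.0, 0.0], z=2)
```

Under these assumptions the reward is largest when each strategy visits states that the others do not, which is the sense in which such an objective encourages diverse behavior. How DGPO combines this kind of term with the extrinsic reward under its alternating constraints is described in the paper itself.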

