LG | ML | Nov 5, 2020

Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

arXiv:2011.02614v1 | 12 citations
AI Analysis

This work addresses the challenge of balancing performance and diversity in reinforcement learning agents. The contribution is incremental, as it builds on existing kernel-based methods.

The paper tackles the problem of training a population of reinforcement learning agents that are both high-performing and behaviorally diverse. It uses distribution ratio estimators to compute the gradients for each policy in the ensemble, resulting in improved quality and diversity.

Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and re-purpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high-quality.
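
The step from an $f$-divergence kernel to a ratio-estimation problem rests on the identity $D_f(p \,\|\, q) = \mathbb{E}_q[f(p/q)]$, which depends on the two distributions only through their ratio. The sketch below illustrates this on a toy discrete example. It is not the paper's implementation: the distributions `d_i` and `d_j`, the choice of KL as the divergence, the exponential kernel form, and the `temperature` parameter are all assumptions made for illustration, and the ratio is computed exactly here, whereas in practice it would come from a learned estimator.

```python
import numpy as np


def f_divergence_from_ratio(ratio, q, f):
    """D_f(p || q) = E_q[f(p / q)], written purely in terms of the ratio p / q."""
    return float(np.sum(q * f(ratio)))


def kl_generator(t):
    # f(t) = t * log(t) yields KL(p || q); t must be strictly positive here.
    return t * np.log(t)


# Two hypothetical stationary state-action distributions over 4 discrete pairs.
d_i = np.array([0.4, 0.3, 0.2, 0.1])
d_j = np.array([0.25, 0.25, 0.25, 0.25])

# Exact ratio for this toy case; a learned ratio estimator would replace this line.
ratio = d_i / d_j
divergence = f_divergence_from_ratio(ratio, d_j, kl_generator)

# Assumed kernel form: similarity decays with divergence, scaled by a temperature.
temperature = 1.0
kernel_value = np.exp(-divergence / temperature)
```

KL is used here only because its generator is simple; it is not symmetric, so a symmetric divergence would be the more natural choice if the kernel must satisfy $k(i, j) = k(j, i)$.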

Code Implementations: 1 repo
