LG · ML · Feb 15, 2020

Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning

arXiv:2002.06473v1 · 13 citations
AI Analysis

This work addresses sample efficiency and hindsight bias in goal-reaching tasks for AI agents, though it appears incremental because it builds on existing density estimation methods.

The paper tackles the problem of enabling agents to reliably reach specified states in imitation learning and goal-conditioned reinforcement learning by connecting probabilistic long-term dynamics to value functions, using density estimation. It shows the approach is efficient, avoids hindsight bias in stochastic domains, and achieves state-of-the-art demonstration sample-efficiency on benchmarks.

This work considers two distinct settings: imitation learning and goal-conditioned reinforcement learning. In either case, effective solutions require the agent to reliably reach a specified state (a goal), or set of states (a demonstration). Drawing a connection between probabilistic long-term dynamics and the desired value function, this work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state. As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is both efficient and does not suffer from hindsight bias in stochastic domains. As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
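
The connection the abstract draws between long-term dynamics and the value function can be illustrated in a small tabular setting. The sketch below is not the paper's method, which targets continuous state spaces with learned density estimators. It assumes a hypothetical three-state Markov chain under a fixed policy (the matrix `P`, the discount `gamma`, and both function names are made up for illustration) and a reward of 1 whenever the next state equals the goal. Under those assumptions, the value of reaching a goal equals the discounted probability of visiting that goal in the future.

```python
import numpy as np


def discounted_visitation(P, gamma):
    """M[s, g] = sum over t >= 0 of gamma**t * Pr(s_{t+1} = g | s_0 = s).

    Closed form: (I - gamma * P)^{-1} @ P, computed with a linear solve.
    """
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, P)


def goal_value_by_bellman(P, goal, gamma, iters=2000):
    """Value of each start state when the reward is 1 for landing on `goal`.

    Iterates the Bellman backup V <- P @ (reward + gamma * V).
    """
    n = P.shape[0]
    reward = np.zeros(n)
    reward[goal] = 1.0
    v = np.zeros(n)
    for _ in range(iters):
        v = P @ (reward + gamma * v)
    return v


# Hypothetical transition matrix of a fixed policy (each row sums to 1).
P = np.array([
    [0.8, 0.2, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.5, 0.5],
])
gamma = 0.9

# Long-term dynamics: discounted future-state visitation for every (start, goal).
M = discounted_visitation(P, gamma)

# Goal-conditioned values, one column per goal, obtained from Bellman backups.
values = np.stack(
    [goal_value_by_bellman(P, g, gamma) for g in range(P.shape[0])], axis=1
)

# The two quantities coincide up to numerical tolerance.
print(np.allclose(M, values))
```

In this toy case the visitation matrix is available in closed form. In continuous domains it is a density that has to be estimated from samples, which is where density estimation enters the approach described above.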

Code Implementations: 1 repo
