LG · AI · ML · May 22, 2019

The Journey is the Reward: Unsupervised Learning of Influential Trajectories

arXiv:1905.09334v1 · 4 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of unsupervised learning for agents in diverse and sparse environments, offering an incremental improvement over previous empowerment-based approaches.

The paper tackles unsupervised exploration in sparse environments with a model-free method that uses the full trajectory, rather than only a final state, as the influence measure. The authors report applying it to settings with large action spaces, where discovering meaningful action sequences is difficult.

Unsupervised exploration and representation learning become increasingly important when learning in diverse and sparse environments. The information-theoretic principle of empowerment formalizes an unsupervised exploration objective through an agent trying to maximize its influence on the future states of its environment. Previous approaches carry certain limitations in that they either do not employ closed-loop feedback or do not have an internal state. As a consequence, a privileged final state is taken as an influence measure, rather than the full trajectory. We provide a model-free method which takes into account the whole trajectory while still offering the benefits of option-based approaches. We successfully apply our approach to settings with large action spaces, where discovery of meaningful action sequences is particularly difficult.
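To make the notion of "influence on future states" concrete, the following is a minimal sketch, not the paper's method. Empowerment is commonly formalized as the mutual information between an agent's actions and the resulting states, maximized over action distributions. The sketch only evaluates that mutual information for a fixed uniform action distribution in a toy discrete setting with known transition probabilities, so it is neither model-free nor trajectory-based. The function name `mutual_information` and the two transition tables are hypothetical, chosen purely for illustration.

```python
import math


def mutual_information(p_a, p_s_given_a):
    """Return I(A; S) in bits for a discrete action distribution p_a
    and a transition table p_s_given_a[a][s] = p(s | a)."""
    n_states = len(p_s_given_a[0])
    # Marginal over resulting states: p(s) = sum_a p(a) * p(s | a)
    p_s = [
        sum(p_a[a] * p_s_given_a[a][s] for a in range(len(p_a)))
        for s in range(n_states)
    ]
    mi = 0.0
    for a, pa in enumerate(p_a):
        for s in range(n_states):
            joint = pa * p_s_given_a[a][s]
            if joint > 0.0:  # skip zero-probability terms (0 * log 0 = 0)
                mi += joint * math.log2(p_s_given_a[a][s] / p_s[s])
    return mi


# Hypothetical toy environments with two actions and two resulting states.
distinct = [[1.0, 0.0], [0.0, 1.0]]  # each action leads to its own state
useless = [[0.5, 0.5], [0.5, 0.5]]   # actions have no effect on the outcome
uniform = [0.5, 0.5]                 # assumed fixed action distribution

mi_distinct = mutual_information(uniform, distinct)
mi_useless = mutual_information(uniform, useless)
print(mi_distinct, mi_useless)
```

When each action leads to its own state, the actions carry one full bit of information about the outcome. When the outcome ignores the action, the mutual information is zero. An empowerment-maximizing agent prefers situations of the first kind.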

