Perceptual Values from Observation
This work addresses the challenge of learning from expert demonstrations that lack action information, such as videos. It is aimed at researchers and practitioners in imitation and reinforcement learning, and represents an incremental improvement over existing methods.
The paper tackles imitation learning from observation-only demonstrations by introducing a method that learns values directly from observations. Compared with sparse-reward specifications, this significantly speeds up reinforcement learning because action-values no longer need to be bootstrapped.
Imitation by observation is an approach for learning from expert demonstrations that lack action information, such as videos. Recent approaches to this problem fall into two broad categories: training dynamics models that aim to predict the actions taken between states, and learning rewards, or features from which rewards can be computed, for Reinforcement Learning (RL). In this paper, we introduce a novel approach that learns values, rather than rewards, directly from observations. We show that using values significantly speeds up RL compared with sparse-reward specifications, because it removes the need to bootstrap action-values.
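To make the contrast between value targets and bootstrapped sparse rewards concrete, the following is a minimal toy sketch and not the method of the paper. It assumes a hypothetical six-state chain, assumes that a state's value can be read off a demonstration as a discounted time-to-goal, and assumes that action-values are set as Q(s, a) = V(s'). All function names and constants are illustrative.

```python
import numpy as np

N_STATES = 6          # hypothetical chain of states 0..5; state 5 is the goal
GAMMA = 0.9
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action_index):
    """Deterministic chain dynamics, clipped at both ends."""
    return int(np.clip(state + ACTIONS[action_index], 0, N_STATES - 1))


def values_from_demo(demo_states, gamma=GAMMA):
    """Assign each observed state a discounted time-to-goal value.

    Assumption for this sketch: the state at step t of a demonstration
    whose final step is T gets value gamma ** (T - t). No actions are used.
    """
    values = np.zeros(N_STATES)
    last = len(demo_states) - 1
    for t, s in enumerate(demo_states):
        values[s] = max(values[s], gamma ** (last - t))
    return values


def q_from_values(values):
    """Single sweep: Q(s, a) = V(s'), with no bootstrapping from Q itself."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    for s in range(N_STATES):
        for a in range(len(ACTIONS)):
            q[s, a] = values[step(s, a)]
    return q


def q_from_sparse_reward(sweeps, gamma=GAMMA):
    """Bootstrapped synchronous sweeps; reward is 1 only on reaching the goal."""
    q = np.zeros((N_STATES, len(ACTIONS)))
    goal = N_STATES - 1
    for _ in range(sweeps):
        new_q = q.copy()
        for s in range(goal):  # the goal state is treated as terminal
            for a in range(len(ACTIONS)):
                s_next = step(s, a)
                reward = 1.0 if s_next == goal else 0.0
                bootstrap = 0.0 if s_next == goal else gamma * q[s_next].max()
                new_q[s, a] = reward + bootstrap
        q = new_q
    return q


demo = [0, 1, 2, 3, 4, 5]                  # observation-only expert trajectory
v = values_from_demo(demo)
q_values = q_from_values(v)                # informative after one sweep
q_sparse_one = q_from_sparse_reward(sweeps=1)
q_sparse_full = q_from_sparse_reward(sweeps=N_STATES)

print("greedy actions from values:", q_values.argmax(axis=1))
print("nonzero Q entries after one sparse-reward sweep:",
      np.count_nonzero(q_sparse_one))
```

In this toy setting, the value-based targets give every state a preference for moving toward the goal after a single sweep, whereas the sparse-reward update has to propagate the goal reward backward one state per sweep. How values are actually learned from raw observations, and how they enter the RL update, is left to the paper.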