LG · AI · RO · Apr 26, 2022

Learning Value Functions from Undirected State-only Experience

arXiv:2204.12458v1 · 9 citations · h-index: 19
Originality: Incremental advance
AI Analysis

The paper addresses reinforcement learning from limited data, a problem relevant to both researchers and practitioners. The advance is incremental because it builds on existing Q-learning and latent-variable methods.

The paper tackles learning value functions from undirected state-only experience (state transitions without action labels) by proposing Latent Action Q-learning (LAQ), which uses Q-learning on discrete latent actions from a prediction model. Experiments in 5 environments, including 3D visual navigation, show LAQ recovers value functions with high correlation to ground truth and enables sample-efficient goal-directed behavior.
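As a rough illustration of the pipeline summarized above, the following minimal sketch runs tabular Q-learning on latent action labels inferred from state-only data. It is not the paper's implementation. The function names `label_latent_actions` and `offline_q_learning`, the 6-state chain, and the reward rule are all hypothetical, and the latent labels come from bucketing the displacement s' - s, a deliberately simple stand-in for the learned latent-variable future prediction model.

```python
import numpy as np

def label_latent_actions(s, s_next):
    # Hypothetical stand-in for the learned prediction model: each distinct
    # displacement (s' - s) becomes one discrete latent action id.
    _, latent = np.unique(s_next - s, return_inverse=True)
    return latent

def offline_q_learning(s, a_hat, r, s_next, n_states, gamma=0.9, lr=1.0, epochs=200):
    # Tabular Q-learning over (state, latent action) pairs from a fixed dataset.
    # lr=1.0 is only reasonable here because the toy dynamics are deterministic.
    Q = np.zeros((n_states, int(a_hat.max()) + 1))
    for _ in range(epochs):
        for i in range(len(s)):
            target = r[i] + gamma * Q[s_next[i]].max()
            Q[s[i], a_hat[i]] += lr * (target - Q[s[i], a_hat[i]])
    return Q

# Toy state-only dataset on a 6-state chain: every state with one step left
# and one step right. No true action labels are kept, only (s, s', r).
n_states = 6
s = np.repeat(np.arange(n_states), 2)
s_next = np.clip(s + np.tile([-1, 1], n_states), 0, n_states - 1)
r = (s_next == n_states - 1).astype(float)  # reward 1 for landing on the last state

latent = label_latent_actions(s, s_next)
Q = offline_q_learning(s, latent, r, s_next, n_states)
V = Q.max(axis=1)  # state values implied by the latent-action Q-table
print(np.round(V, 2))
```

In this toy setting, (state, latent action) pairs that never occur in the data keep their initial value of zero. That is harmless here because rewards are non-negative, but it would need more care in general.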

This paper tackles the problem of learning value functions from undirected state-only experience (state transitions without action labels, i.e., (s, s', r) tuples). We first theoretically characterize the applicability of Q-learning in this setting. We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space. This theoretical result motivates the design of Latent Action Q-learning (LAQ), an offline RL method that can learn effective value functions from state-only experience. LAQ learns value functions using Q-learning on discrete latent actions obtained through a latent-variable future prediction model. We show that LAQ can recover value functions that have high correlation with value functions learned using ground-truth actions. Value functions learned using LAQ lead to sample-efficient acquisition of goal-directed behavior, can be used with domain-specific low-level controllers, and facilitate transfer across embodiments. Our experiments in 5 environments, ranging from a 2D grid world to 3D visual navigation in realistic environments, demonstrate the benefits of LAQ over simpler alternatives, imitation learning oracles, and competing methods.
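The refinement claim in the abstract can be checked numerically on a small example. The sketch below rests on two assumptions that are not taken from the paper. First, "refinement" is read in its simplest form: each original action is split into several copies with identical dynamics and rewards. Second, exact Q-value iteration with a known model stands in for sample-based tabular Q-learning. The random MDP and the function name `q_value_iteration` are hypothetical.

```python
import numpy as np

def q_value_iteration(P, R, gamma=0.9, iters=500):
    # P: (A, S, S) transition probabilities, R: (A, S) expected rewards.
    # Returns the Q-table with shape (S, A).
    n_actions, n_states, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        V = Q.max(axis=1)              # greedy state values, shape (S,)
        Q = (R + gamma * (P @ V)).T    # Bellman optimality backup
    return Q

rng = np.random.default_rng(0)
S, A = 6, 3
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)      # normalize rows into distributions
R = rng.random((A, S))

# Assumed form of refinement: action 0 stays as is, action 1 is split into
# two copies, action 2 into three, giving 6 refined actions in total.
idx = np.repeat(np.arange(A), [1, 2, 3])
P_ref, R_ref = P[idx], R[idx]

V = q_value_iteration(P, R).max(axis=1)
V_ref = q_value_iteration(P_ref, R_ref).max(axis=1)
print(np.allclose(V, V_ref))
```

Under this reading the two value functions agree because the maximum over duplicated actions equals the maximum over the originals at every backup. The paper's result is stated for Q-learning and may cover a broader notion of refinement than the duplication used here.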

