LG | ML | Sep 30, 2019

Meta-Q-Learning

arXiv:1910.00125v2 | 164 citations
Originality: Incremental advance
AI Analysis

This paper addresses efficient adaptation to new tasks in meta-RL, but the contribution appears incremental because it builds on existing Q-learning and meta-training ideas.

The paper tackles meta-Reinforcement Learning by introducing Meta-Q-Learning (MQL), an off-policy algorithm that recycles past data for adaptation. On continuous-control benchmarks it performs competitively with state-of-the-art methods.

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm for meta-Reinforcement Learning (meta-RL). MQL builds upon three simple ideas. First, we show that Q-learning is competitive with state-of-the-art meta-RL algorithms if given access to a context variable that is a representation of the past trajectory. Second, a multi-task objective to maximize the average reward across the training tasks is an effective method to meta-train RL policies. Third, past data from the meta-training replay buffer can be recycled to adapt the policy on a new task using off-policy updates. MQL draws upon ideas in propensity estimation to do so and thereby amplifies the amount of available data for adaptation. Experiments on standard continuous-control benchmarks suggest that MQL compares favorably with the state of the art in meta-RL.
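The third idea in the abstract, recycling meta-training data through propensity estimation, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes transitions are flattened into feature vectors, fits a plain logistic-regression classifier with NumPy to tell old (replay buffer) samples from new-task samples, and turns the classifier's odds into importance weights for the old samples. The function names (`fit_logistic`, `importance_weights`, `normalized_ess`) and all hyperparameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logistic(x, z, steps=500, lr=0.1, l2=1e-3):
    """Fit w, b by gradient descent so that sigmoid(x @ w + b) ~ P(z = 1 | x)."""
    n, d = x.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        w -= lr * (x.T @ (p - z) / n + l2 * w)  # gradient of mean log-loss + L2
        b -= lr * np.mean(p - z)
    return w, b


def importance_weights(old_x, new_x):
    """Estimate p_new(x) / p_old(x) for every row of old_x.

    Assumption: a linear classifier on raw features is expressive enough
    for this toy setting.
    """
    x = np.vstack([old_x, new_x])
    z = np.concatenate([np.zeros(len(old_x)), np.ones(len(new_x))])
    w, b = fit_logistic(x, z)
    logits = old_x @ w + b
    # The classifier odds equal the density ratio times n_new / n_old,
    # so multiply by n_old / n_new to correct for class imbalance.
    return np.exp(logits) * (len(old_x) / len(new_x))


def normalized_ess(weights):
    """Normalized effective sample size, a value in (0, 1]."""
    return weights.sum() ** 2 / (len(weights) * (weights ** 2).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old_x = rng.normal(0.0, 1.0, size=(2000, 2))  # stand-in for replay-buffer data
    new_x = rng.normal(1.0, 1.0, size=(200, 2))   # stand-in for new-task data
    beta = importance_weights(old_x, new_x)
    print("normalized ESS:", normalized_ess(beta))
```

In this sketch, old samples that resemble the new-task data receive larger weights, so an off-policy update can emphasize them. A normalized effective sample size near 1 indicates the old data is broadly reusable, while a value near 0 indicates that only a few old samples resemble the new task.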

Code Implementations: 2 repos
