LG, AI, ML | Jan 4, 2019

Accelerating Goal-Directed Reinforcement Learning by Model Characterization

arXiv:1901.01977v1 | 3 citations
Originality: Incremental advance
AI Analysis

This work addresses sample efficiency for reinforcement learning practitioners, though it appears incremental as it builds on existing techniques with modifications.

The paper tackles sample inefficiency in goal-directed reinforcement learning by developing a hybrid approach that combines model-free and model-based methods using Mean First Passage Times for reachability analysis, resulting in algorithms that converge with significantly fewer iterations, samples, and training trials than state-of-the-art counterparts.

We propose a hybrid approach aimed at improving sample efficiency in goal-directed reinforcement learning. It uses a two-step mechanism: first, we approximate a model from model-free reinforcement learning; then, we use this approximate model, together with a notion of reachability based on Mean First Passage Times, to perform model-based reinforcement learning. Building on this novel observation, we design two new algorithms, Mean First Passage Time based Q-Learning (MFPT-Q) and Mean First Passage Time based DYNA (MFPT-DYNA), which are fundamentally modified from state-of-the-art reinforcement learning techniques. Preliminary results show that our hybrid approaches converge in far fewer iterations than their corresponding state-of-the-art counterparts and therefore require far fewer samples and training trials.
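To make the reachability notion concrete, here is a minimal sketch of how Mean First Passage Times to a goal state can be computed from an approximate transition model. It is not the paper's MFPT-Q or MFPT-DYNA algorithm, which the abstract does not spell out. The function names `estimate_transition_matrix` and `mean_first_passage_times`, the count-based model estimate, and the uniform fallback for unvisited states are all assumptions made for illustration. The sketch relies only on the standard Markov-chain identity that the expected hitting time h satisfies h(goal) = 0 and h(s) = 1 + sum over s' of P(s, s') h(s') for every other state s.

```python
import numpy as np


def estimate_transition_matrix(transitions, n_states):
    """Estimate P[s, s'] from observed (s, s') pairs by normalized counts.

    Hypothetical helper: states never visited get a uniform row, which is
    an assumption of this sketch, not something the abstract specifies.
    """
    counts = np.zeros((n_states, n_states))
    for s, s_next in transitions:
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.where(row_sums > 0,
                    counts / np.maximum(row_sums, 1),
                    1.0 / n_states)


def mean_first_passage_times(P, goal):
    """Expected number of steps to first reach `goal` from each state.

    Solves (I - Q) h = 1, where Q is P restricted to non-goal states.
    Assumes the goal is reachable from every non-goal state; otherwise
    the linear system is singular.
    """
    n = P.shape[0]
    others = [s for s in range(n) if s != goal]
    Q = P[np.ix_(others, others)]
    h = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    mfpt = np.zeros(n)
    mfpt[others] = h
    return mfpt


# Usage: a 3-state chain 0 -> 1 -> 2, with state 2 as the absorbing goal.
P_chain = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0]])
chain_mfpt = mean_first_passage_times(P_chain, goal=2)
```

In this sketch, states with a smaller mean first passage time are "closer" to the goal in expectation. How such values are folded into the Q-Learning or DYNA updates is specific to the paper and is not reproduced here.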
