LG · ML · Oct 16, 2019

Conditional Importance Sampling for Off-Policy Learning

arXiv:1910.07479v2 · 21 citations
Originality: Incremental advance
AI Analysis

This work aims to improve off-policy learning methods in reinforcement learning. It appears incremental, since it builds on existing importance sampling techniques.

The paper tackles the problem of off-policy reinforcement learning by introducing a conceptual framework based on conditional expectations of importance sampling ratios, which provides new insights into existing algorithms and reveals a broad space of unexplored algorithms.

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
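To make the idea of a conditional expectation of an importance sampling ratio concrete, here is a minimal sketch on a toy one-step problem. It is not taken from the paper: the policies `mu` and `pi`, the rewards, and the choice to condition on the observed reward are all assumptions made for illustration. It relies only on the tower property, E[ρ·R] = E[E[ρ | R]·R], so replacing the ratio ρ by its conditional expectation keeps the estimator unbiased while its variance can only stay the same or shrink.

```python
# Toy one-step problem. All numbers below are illustrative assumptions,
# not values from the paper.
mu = [0.5, 0.25, 0.25]    # behaviour policy: probability of each action
pi = [0.1, 0.6, 0.3]      # target policy: probability of each action
reward = [1.0, 1.0, 0.0]  # deterministic reward for each action

actions = range(len(mu))

# Ordinary importance sampling ratio for each action.
rho = [pi[a] / mu[a] for a in actions]


def conditional_ratio(r):
    """Return E_mu[rho | reward == r], computed exactly by enumeration."""
    matching = [a for a in actions if reward[a] == r]
    mass = sum(mu[a] for a in matching)
    return sum(mu[a] * rho[a] for a in matching) / mass


# Conditional ratio: depends on the action only through its reward.
cond_rho = [conditional_ratio(reward[a]) for a in actions]


def mean_and_variance(weights):
    """Exact mean and variance of weights[a] * reward[a] when a ~ mu."""
    mean = sum(mu[a] * weights[a] * reward[a] for a in actions)
    second_moment = sum(mu[a] * (weights[a] * reward[a]) ** 2 for a in actions)
    return mean, second_moment - mean ** 2


true_value = sum(pi[a] * reward[a] for a in actions)
is_mean, is_var = mean_and_variance(rho)
cis_mean, cis_var = mean_and_variance(cond_rho)

print("true value under pi:       ", true_value)
print("ordinary IS mean, variance:", is_mean, is_var)
print("conditional mean, variance:", cis_mean, cis_var)
```

In this toy case both estimators have the target policy's expected reward as their mean, and the conditional version has the smaller variance because it averages the ratio over actions that yield the same reward. In practice the conditional expectation is generally not available in closed form and would have to be derived or estimated.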
