LG ML
Oct 16, 2019

Conditional Importance Sampling for Off-Policy Learning

Mark Rowland, Anna Harutyunyan, Hado van Hasselt, Diana Borsa, Tom Schaul, Rémi Munos, Will Dabney

arXiv:1910.07479v212.821 citations

Originality: Incremental advance

AI Analysis

This work aims to improve off-policy learning methods in reinforcement learning. It appears incremental, since it builds on existing importance sampling techniques.

The paper tackles the problem of off-policy reinforcement learning by introducing a conceptual framework based on conditional expectations of importance sampling ratios, which provides new insights into existing algorithms and reveals a broad space of unexplored algorithms.

The principal contribution of this paper is a conceptual framework for off-policy reinforcement learning, based on conditional expectations of importance sampling ratios. This framework yields new perspectives and understanding of existing off-policy algorithms, and reveals a broad space of unexplored algorithms. We theoretically analyse this space, and concretely investigate several algorithms that arise from this framework.
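As a rough illustration of what "conditional expectations of importance sampling ratios" can mean, the sketch below uses a toy one-step problem that is not taken from the paper. The policies `pi` and `mu`, the feature `phi`, and the returns `ret` are all hypothetical values chosen for the example. It assumes the return depends on the action only through `phi(a)`. In that case, replacing the ratio `pi(a)/mu(a)` by its conditional expectation given `phi(a)` leaves the estimator's mean unchanged (by the tower property) and cannot increase its variance. The means and variances are computed exactly rather than by sampling.

```python
# Toy one-step off-policy evaluation problem (all numbers are hypothetical).
ACTIONS = range(4)
pi = [0.1, 0.2, 0.3, 0.4]   # target policy
mu = [0.4, 0.3, 0.2, 0.1]   # behaviour policy that generates the data
ret = [1.0, 3.0]            # return, indexed by the feature phi(a)


def phi(a):
    """Feature of the action; the return depends on a only through phi(a)."""
    return a % 2


# Ordinary importance sampling ratio for each action.
rho = [pi[a] / mu[a] for a in ACTIONS]


def cond_rho(k):
    """E_mu[rho | phi(a) = k]: ratio of the marginal probabilities of phi."""
    num = sum(pi[a] for a in ACTIONS if phi(a) == k)
    den = sum(mu[a] for a in ACTIONS if phi(a) == k)
    return num / den


def mean_and_var(weights):
    """Exact mean and variance under mu of the estimator weights[a] * return."""
    vals = [weights[a] * ret[phi(a)] for a in ACTIONS]
    mean = sum(mu[a] * vals[a] for a in ACTIONS)
    var = sum(mu[a] * (vals[a] - mean) ** 2 for a in ACTIONS)
    return mean, var


# Quantity both estimators target: expected return under pi.
target_value = sum(pi[a] * ret[phi(a)] for a in ACTIONS)

is_mean, is_var = mean_and_var(rho)
cis_mean, cis_var = mean_and_var([cond_rho(phi(a)) for a in ACTIONS])

print("target value:", target_value)
print("ordinary IS mean, variance:", is_mean, is_var)
print("conditional IS mean, variance:", cis_mean, cis_var)
```

Both estimators have mean equal to `target_value`, and the conditional version has the smaller variance in this toy setting. Which conditioning variable to use is a design choice, and this sketch makes no claim about the specific algorithms studied in the paper.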
