LG · Mar 30, 2022

Marginalized Operators for Off-policy Reinforcement Learning

arXiv:2203.16177v1
Originality: Incremental advance
AI Analysis

This work targets sample efficiency and stability in off-policy reinforcement learning. The advance appears incremental, since it builds on prior methods such as Retrace and marginalized importance sampling.

The paper addresses off-policy evaluation in reinforcement learning by proposing marginalized operators, which generalize existing multi-step operators and admit sample-based estimates with potentially lower variance. The authors report performance gains in both off-policy evaluation and downstream policy optimization.

In this work, we propose marginalized operators, a new class of off-policy evaluation operators for reinforcement learning. Marginalized operators strictly generalize generic multi-step operators, such as Retrace, as special cases. Marginalized operators also suggest a form of sample-based estimates with potential variance reduction, compared to sample-based estimates of the original multi-step operators. We show that the estimates for marginalized operators can be computed in a scalable way, which also generalizes prior results on marginalized importance sampling as special cases. Finally, we empirically demonstrate that marginalized operators provide performance gains to off-policy evaluation and downstream policy optimization algorithms.
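For context, here is a minimal sketch of the kind of multi-step operator the abstract says is generalized. It computes a sample-based Retrace target for one trajectory, using the standard truncated importance weights c_s = λ min(1, π(a_s|x_s) / μ(a_s|x_s)). It is not the paper's marginalized estimator, which this summary does not specify. The function name `retrace_target` and its argument layout are assumptions made for illustration.

```python
import numpy as np

def retrace_target(q_taken, v_next, rewards, pi_probs, mu_probs,
                   gamma=0.99, lam=1.0):
    """Sample-based Retrace target for (x_0, a_0) along one trajectory.

    Hypothetical argument layout (not taken from the paper):
      q_taken[t]  = Q(x_t, a_t)
      v_next[t]   = E_{a ~ pi} Q(x_{t+1}, a)
      rewards[t]  = r_t
      pi_probs[t] = pi(a_t | x_t)  (target policy)
      mu_probs[t] = mu(a_t | x_t)  (behavior policy)
    """
    q_taken = np.asarray(q_taken, dtype=float)
    v_next = np.asarray(v_next, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    pi_probs = np.asarray(pi_probs, dtype=float)
    mu_probs = np.asarray(mu_probs, dtype=float)

    # One-step temporal-difference errors under the target policy.
    td = rewards + gamma * v_next - q_taken

    # Truncated per-step importance weights.
    c = lam * np.minimum(1.0, pi_probs / mu_probs)

    # trace[t] = c_1 * ... * c_t, with trace[0] = 1 (c_0 is not used).
    trace = np.concatenate(([1.0], np.cumprod(c[1:])))

    discounts = gamma ** np.arange(len(td))
    return q_taken[0] + np.sum(discounts * trace * td)
```

The trace at step t is a product of t per-step weights, so each sample's weight depends on the whole action history up to t. That product structure is what the abstract contrasts with marginalized estimates when it mentions potential variance reduction.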

