LGAIDec 12, 2022

Variance-Reduced Conservative Policy Iteration

DeepMind
arXiv:2212.06283v23 citationsh-index: 20
AI Analysis

This work addresses sample efficiency in reinforcement learning for practitioners, offering incremental improvements in theoretical bounds.

The paper tackles the sample complexity of reinforcement learning by proposing a variance-reduced variant of Conservative Policy Iteration, which improves the sample complexity for achieving an ε-functional local optimum from O(ε⁻⁴) to O(ε⁻³) and for global optimality from O(ε⁻³) to O(ε⁻²) under certain assumptions.

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reductions-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing a $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after sampling $O(\varepsilon^{-2})$ times, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes