LG · Jan 6, 2021

Smoothed functional-based gradient algorithms for off-policy reinforcement learning: A non-asymptotic viewpoint

arXiv:2101.02137v6 · 7 citations
Originality: Incremental advance
AI Analysis

This work provides non-asymptotic convergence guarantees for off-policy policy gradient methods, which is important for researchers and practitioners developing more stable and efficient RL algorithms.

This paper proposes two off-policy reinforcement learning algorithms that use smoothed functional (SF) based gradient estimation. The first algorithm combines importance sampling with SF, achieving a convergence rate comparable to REINFORCE, while the second algorithm incorporates variance reduction (inspired by SVRG) and demonstrates an improved rate of convergence.

We propose two policy gradient algorithms for solving the problem of control in an off-policy reinforcement learning (RL) context. Both algorithms incorporate a smoothed functional (SF) based gradient estimation scheme. The first algorithm is a straightforward combination of importance sampling-based off-policy evaluation with SF-based gradient estimation. The second algorithm, inspired by the stochastic variance-reduced gradient (SVRG) algorithm, incorporates variance reduction in the update iteration. For both algorithms, we derive non-asymptotic bounds that establish convergence to an approximate stationary point. From these results, we infer that the first algorithm converges at a rate that is comparable to the well-known REINFORCE algorithm in an off-policy RL context, while the second algorithm exhibits an improved rate of convergence.
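
To make the SF-based gradient estimation idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the perturbation distribution or the form of the difference, so Gaussian perturbations and a two-sided difference are assumed here, and the names `sf_gradient`, `mu`, and `num_samples` are hypothetical. The `objective` argument stands in for whatever estimate of the policy's value is available; in the off-policy setting described above, that would be an importance-sampling estimate computed from behavior-policy trajectories, which this sketch replaces with a simple closed-form function.

```python
import numpy as np


def sf_gradient(objective, theta, mu=0.1, num_samples=1000, rng=None):
    """Estimate the gradient of `objective` at `theta` using only function values.

    Assumed form (not taken from the paper): average of
        (J(theta + mu*u) - J(theta - mu*u)) / (2*mu) * u
    over standard Gaussian directions u.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        u = rng.standard_normal(theta.shape)  # random perturbation direction
        delta = objective(theta + mu * u) - objective(theta - mu * u)
        grad += (delta / (2.0 * mu)) * u
    return grad / num_samples


# Stand-in objective (hypothetical): a concave quadratic with a known gradient.
target = np.array([1.0, -2.0])


def toy_objective(theta):
    return -np.sum((theta - target) ** 2)


theta0 = np.zeros(2)
estimate = sf_gradient(toy_objective, theta0, mu=0.1, num_samples=20000,
                       rng=np.random.default_rng(0))
true_grad = -2.0 * (theta0 - target)  # analytic gradient of the quadratic

# One gradient-ascent step using the estimate (step size chosen arbitrarily).
theta1 = theta0 + 0.1 * estimate
```

The estimator needs only evaluations of the objective, never its derivative, which is why it pairs naturally with an importance-sampling value estimate. Averaging over more perturbations lowers the variance of the estimate at the cost of more objective evaluations.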

