On Convergence of Gradient Expected Sarsa($λ$)
This addresses convergence issues in reinforcement learning algorithms for researchers and practitioners, but it is incremental as it builds on existing gradient temporal difference methods.
The paper tackles the instability of Expected Sarsa(λ) with linear function approximation in off-policy learning by proposing a convergent Gradient Expected Sarsa(λ) algorithm, which achieves linear convergence to the optimal solution and is verified through experiments.
We study the convergence of $\mathtt{Expected~Sarsa}(λ)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(λ)$ is unstable for off-policy learning. Furthermore, based on convex-concave saddle-point framework, we propose a convergent $\mathtt{Gradient~Expected~Sarsa}(λ)$ ($\mathtt{GES}(λ)$) algorithm. The theoretical analysis shows that our $\mathtt{GES}(λ)$ converges to the optimal solution at a linear convergence rate, which is comparable to extensive existing state-of-the-art gradient temporal difference learning algorithms. Furthermore, we develop a Lyapunov function technique to investigate how the step-size influences finite-time performance of $\mathtt{GES}(λ)$, such technique of Lyapunov function can be potentially generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of our $\mathtt{GES}(λ)$.