LG · SY · ML · Oct 11, 2019

Zap Q-Learning With Nonlinear Function Approximation

arXiv:1910.05405v2 · 25 citations
Originality: Incremental advance
AI Analysis

The paper addresses slow convergence in reinforcement learning, a practical concern for practitioners. The advance is incremental because it builds on existing Zap Q-learning methods.

The paper extends the stability theory of Zap Q-learning beyond the restrictive settings analyzed previously. It shows that the algorithm is consistent with nonlinear function approximation and reports quick convergence in tests on OpenAI Gym examples.

Zap Q-learning is a recent class of reinforcement learning algorithms, motivated primarily as a means to accelerate convergence. Stability theory has been absent outside of two restrictive classes: the tabular setting, and optimal stopping. This paper introduces a new framework for analysis of a more general class of recursive algorithms known as stochastic approximation. Based on this general theory, it is shown that Zap Q-learning is consistent under a non-degeneracy assumption, even when the function approximation architecture is nonlinear. Zap Q-learning with neural network function approximation emerges as a special case, and is tested on examples from OpenAI Gym. Based on multiple experiments with a range of neural network sizes, it is found that the new algorithms converge quickly and are robust to choice of function approximation architecture.
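To make the phrase "recursive algorithms known as stochastic approximation" concrete, here is a minimal sketch of a Zap-style, two-time-scale stochastic approximation recursion on a toy nonlinear root-finding problem. It is not the paper's Q-learning algorithm or the authors' code. The toy function, the step-size choices (`1/n` and `n**-rho`), and the names `zap_sa`, `noisy_f`, and `noisy_jac` are all assumptions chosen for illustration. The idea shown is that a running estimate of the Jacobian is updated on a faster time scale and used as a matrix gain, so that the parameter update approximates a Newton-Raphson step.

```python
import numpy as np

# Hypothetical toy problem: find theta with E[f(theta, W)] = 0, where the
# mean field is theta + 0.5*sin(theta) - c (elementwise). Its Jacobian is
# diagonal with entries in [0.5, 1.5], so it is always invertible.
theta_star = np.array([1.0, -2.0])
c = theta_star + 0.5 * np.sin(theta_star)  # chosen so the root is theta_star


def noisy_f(theta, rng):
    """Noisy observation of the mean field at theta."""
    return theta + 0.5 * np.sin(theta) - c + 0.5 * rng.standard_normal(theta.size)


def noisy_jac(theta, rng):
    """Noisy observation of the Jacobian of the mean field at theta."""
    d = theta.size
    return np.diag(1.0 + 0.5 * np.cos(theta)) + 0.01 * rng.standard_normal((d, d))


def zap_sa(theta0, n_iter=20000, rho=0.85, seed=0):
    """Two-time-scale recursion with a matrix gain (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    A_hat = np.eye(theta.size)  # running estimate of the Jacobian
    for n in range(1, n_iter + 1):
        alpha = 1.0 / n        # slow step size for the parameter
        gamma = n ** (-rho)    # faster step size for the matrix estimate
        # Fast time scale: track the Jacobian at the current parameter.
        A_hat += gamma * (noisy_jac(theta, rng) - A_hat)
        # Slow time scale: Newton-like step using the tracked Jacobian.
        theta -= alpha * np.linalg.solve(A_hat, noisy_f(theta, rng))
    return theta


estimate = zap_sa(theta0=np.zeros(2))
print("estimate:", estimate, "target:", theta_star)
```

In the Q-learning setting described in the abstract, the role of `noisy_f` would be played by a temporal-difference term built from the function approximator and `noisy_jac` by its derivative with respect to the parameters. Those details come from the paper itself and are not reproduced here.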
