LG · AI · ML · May 29, 2018

Depth and nonlinearity induce implicit exploration for RL

arXiv:1805.11711v1 · 2 citations
Originality: Incremental advance
AI Analysis

For practitioners, this work addresses the challenge of exploration in RL by offering a deterministic alternative to stochastic exploration methods, though it appears incremental because it builds on existing Q-learning frameworks.

The paper tackles exploration in reinforcement learning by showing that Q-learning with a nonlinear Q-function and a purely greedy policy can match or exceed ε-greedy exploration on standard benchmarks such as mountain car, and it reports improvements in learning efficiency.
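The comparison above is between two action-selection rules. A minimal sketch of both follows; the function names are illustrative and not taken from the paper. The purely greedy rule always returns the arg-max action (ties broken by lowest index, so it is fully deterministic), while ε-greedy replaces that choice with a uniformly random action with probability ε.

```python
import random
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Purely greedy policy: pick the action with the highest Q-value.

    max() returns the first maximal index, so ties go to the lowest action id
    and the choice is deterministic for a given set of Q-values.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def epsilon_greedy_action(q_values: Sequence[float], epsilon: float,
                          rng: random.Random) -> int:
    """epsilon-greedy: with probability epsilon take a uniformly random action,
    otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy_action(q_values)


q = [0.1, 0.5, 0.2]
print(greedy_action(q))  # → 1
```

Because the greedy rule has no randomness, any exploration it produces must come from the Q-function itself changing as it is trained.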

The question of how to explore, i.e., take actions with uncertain outcomes to learn about possible future rewards, is a key question in reinforcement learning (RL). Here, we show a surprising result: Q-learning with a nonlinear Q-function and no explicit exploration (i.e., a purely greedy policy) can learn several standard benchmark tasks, including mountain car, as well as or better than the most commonly used ε-greedy exploration. We carefully examine this result and show that both the depth of the Q-network and the type of nonlinearity are important to induce such deterministic exploration.
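To make "depth" and "type of nonlinearity" concrete, here is a minimal pure-Python sketch of a Q-network whose hidden layers (`hidden`) and activation (`activation`) are parameters, trained with a one-step Q-learning target and acted on greedily. The class and function names, the initialisation, the learning rate, and the omission of a target network and replay buffer are all assumptions for illustration; this is not the authors' architecture or training setup.

```python
import math
import random
from typing import List, Sequence, Tuple

# Illustrative activation table (an assumption, not taken from the paper).
# Each entry is (f, df), with df written in terms of the output y = f(x),
# which is what forward() stores for every hidden layer.
ACTIVATIONS = {
    "tanh": (math.tanh, lambda y: 1.0 - y * y),
    "relu": (lambda x: max(0.0, x), lambda y: 1.0 if y > 0.0 else 0.0),
    "linear": (lambda x: x, lambda y: 1.0),
}


class MLPQ:
    """Fully connected Q-network: state vector -> one Q-value per action.

    Depth = len(hidden); the output layer is always linear.
    """

    def __init__(self, n_in: int, n_actions: int, hidden: Tuple[int, ...] = (32,),
                 activation: str = "tanh", seed: int = 0):
        rng = random.Random(seed)
        self.act, self.dact = ACTIVATIONS[activation]
        sizes = [n_in, *hidden, n_actions]
        # W[k][j][i]: weight from unit i of layer k to unit j of layer k + 1.
        self.W = [[[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)] for _ in range(n)]
                  for m, n in zip(sizes[:-1], sizes[1:])]
        self.b = [[0.0] * n for n in sizes[1:]]

    def forward(self, x: Sequence[float]) -> List[List[float]]:
        """Return the outputs of every layer; the last entry holds the Q-values."""
        outs = [list(x)]
        for k, (W, b) in enumerate(zip(self.W, self.b)):
            z = [sum(w * h for w, h in zip(row, outs[-1])) + bj for row, bj in zip(W, b)]
            is_output = k == len(self.W) - 1
            outs.append(z if is_output else [self.act(v) for v in z])
        return outs

    def q_values(self, x: Sequence[float]) -> List[float]:
        return self.forward(x)[-1]

    def sgd_step(self, x: Sequence[float], action: int, target: float, lr: float) -> float:
        """One gradient step on 0.5 * (Q(x, action) - target)**2; returns the pre-step loss."""
        outs = self.forward(x)
        # dLoss/dz at the output layer: nonzero only for the action that was taken.
        delta = [0.0] * len(outs[-1])
        delta[action] = outs[-1][action] - target
        for k in reversed(range(len(self.W))):
            W, h_prev = self.W[k], outs[k]
            # Back-propagate through layer k before its weights are modified.
            back = None
            if k > 0:
                back = [sum(W[j][i] * delta[j] for j in range(len(W))) * self.dact(h_prev[i])
                        for i in range(len(h_prev))]
            for j, row in enumerate(W):
                for i in range(len(row)):
                    row[i] -= lr * delta[j] * h_prev[i]
                self.b[k][j] -= lr * delta[j]
            if back is not None:
                delta = back
        return 0.5 * (outs[-1][action] - target) ** 2


def q_learning_update(net: MLPQ, s, a: int, r: float, s_next, done: bool,
                      gamma: float = 0.99, lr: float = 0.01) -> float:
    """Q-learning: bootstrap from max_a' Q(s', a'), then take one gradient step.

    Simplification (assumed): no target network and no replay buffer.
    """
    target = r if done else r + gamma * max(net.q_values(s_next))
    return net.sgd_step(s, a, target, lr)


# Hypothetical example: 2-D state, 3 actions, two tanh hidden layers, greedy acting.
net = MLPQ(n_in=2, n_actions=3, hidden=(32, 32), activation="tanh")
s = [-0.5, 0.0]
q = net.q_values(s)
a = max(range(len(q)), key=q.__getitem__)  # purely greedy: no random actions
loss = q_learning_update(net, s, a, r=-1.0, s_next=[-0.49, 0.01], done=False)
```

The example state is a stand-in for something like a position/velocity pair. Setting `hidden=(32,)` gives a shallower network and `activation="relu"` swaps the nonlinearity; with `activation="linear"` the Q-function is affine in the state regardless of depth, which is the kind of baseline one would compare against when asking whether nonlinearity matters.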
