MLAILGDec 31, 2019

The Gambler's Problem and Beyond

arXiv:2001.00102v3
AI Analysis

This provides foundational insights for improving value function approximation and gradient-based algorithms in reinforcement learning, though it is incremental as it builds on an existing textbook example.

The paper tackles the Gambler's problem in reinforcement learning by deriving the exact formula for the optimal value function in discrete and continuous cases, revealing it as a pathological, fractal generalized Cantor function with derivative extremes.

We analyze the Gambler's problem, a simple reinforcement learning problem where the gambler has the chance to double or lose the bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton and Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating non-smooth points. It is however without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous cases. Though simple as it might seem, the value function is pathological: fractal, self-similar, derivative taking either zero or infinity, and not written as elementary functions. It is in fact one of the generalized Cantor functions, where it holds a complexity that has been uncharted thus far. Our analyses could provide insights into improving value function approximation, gradient-based algorithms, and Q-learning, in real applications and implementations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes