LG AI ML · Jul 8, 2019

General non-linear Bellman equations

Hado van Hasselt, John Quan, Matteo Hessel, Zhongwen Xu, Diana Borsa, Andre Barreto

arXiv:1907.03687v18.614 citations

Originality: Incremental advance

AI Analysis

This work addresses a foundational problem in reinforcement learning and decision-making, offering incremental theoretical extensions with potential applications in modeling human behavior and algorithm design.

The authors generalize Bellman equations to non-linear forms, which opens a broader design space of algorithms. This may help in two ways: modeling natural phenomena such as human and animal preference orderings, and finding algorithms that perform better. They show that many of the resulting Bellman operators still have a fixed point that repeated application converges to, so the resulting algorithms inherit many beneficial properties of their linear counterparts.

We consider a general class of non-linear Bellman equations. These open up a design space of algorithms that have interesting properties, which has two potential advantages. First, we can perhaps better model natural phenomena. For instance, hyperbolic discounting has been proposed as a mathematical model that matches human and animal data well, and can therefore be used to explain preference orderings. We present a different mathematical model that matches the same data, but that makes very different predictions under other circumstances. Second, the larger design space can perhaps lead to algorithms that perform better, similar to how discount factors are often used in practice even when the true objective is undiscounted. We show that many of the resulting Bellman operators still converge to a fixed point, and therefore that the resulting algorithms are reasonable and inherit many beneficial properties of their linear counterparts.
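The fixed-point claim can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-state Markov reward process, the discount 0.9, and the choice of `tanh` as the non-linearity. Because `tanh` is 1-Lipschitz, the backup `r + 0.9 * tanh(v)` is 0.9-Lipschitz in `v`, so the induced operator is a contraction in the max norm and repeated application should reach the same fixed point from any starting value.

```python
import math

# Hypothetical 3-state Markov reward process (the policy is already fixed).
# Each row of P is a probability distribution over next states.
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]
R = [1.0, 0.0, -0.5]  # hypothetical reward received on leaving each state
GAMMA = 0.9           # hypothetical discount


def linear_backup(r, v_next):
    """Standard linear Bellman backup: r + gamma * v(s')."""
    return r + GAMMA * v_next


def squashed_backup(r, v_next):
    """Illustrative non-linear backup (an assumption, not the paper's choice).

    tanh is 1-Lipschitz, so this map is GAMMA-Lipschitz in v_next.
    """
    return r + GAMMA * math.tanh(v_next)


def apply_operator(v, backup):
    """One application of (T v)(s) = sum_s' P[s][s'] * backup(R[s], v[s'])."""
    n = len(v)
    return [sum(P[s][s2] * backup(R[s], v[s2]) for s2 in range(n)) for s in range(n)]


def iterate_to_fixed_point(backup, v0, tol=1e-10, max_iters=10_000):
    """Apply the operator repeatedly until successive iterates agree within tol."""
    v = list(v0)
    for _ in range(max_iters):
        v_new = apply_operator(v, backup)
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("did not converge within max_iters")


# Two very different starting points should end at the same fixed point.
v_a = iterate_to_fixed_point(squashed_backup, [0.0, 0.0, 0.0])
v_b = iterate_to_fixed_point(squashed_backup, [10.0, -10.0, 5.0])
v_lin = iterate_to_fixed_point(linear_backup, [0.0, 0.0, 0.0])
```

Here `v_a` and `v_b` agree up to numerical tolerance, while `v_lin` is the fixed point of the ordinary linear operator on the same process and generally differs from them. A backup that is not a contraction in its value argument would carry no such guarantee.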

