GT AI FL LONov 17, 2018

The Impatient May Use Limited Optimism to Minimize Regret

Michaël Cadilhac, Guillermo A. Pérez, Marie van den Bogaard

arXiv:1811.07146v11.24 citations

Originality Incremental advance

AI Analysis

This work addresses regret minimization in reinforcement learning for agents interacting with environments, offering computational tools but is incremental as it builds on existing game theory frameworks.

The paper tackles the problem of computing the minimum possible regret in discounted-sum games, which model reinforcement learning with early reward incentives, and presents a PSPACE algorithm for this computation. It also provides an NP algorithm for checking regret bounds and shows decidability in PSPACE for strategy minimization.

Discounted-sum games provide a formal model for the study of reinforcement learning, where the agent is enticed to get rewards early since later rewards are discounted. When the agent interacts with the environment, she may regret her actions, realizing that a previous choice was suboptimal given the behavior of the environment. The main contribution of this paper is a PSPACE algorithm for computing the minimum possible regret of a given game. To this end, several results of independent interest are shown. (1) We identify a class of regret-minimizing and admissible strategies that first assume that the environment is collaborating, then assume it is adversarial---the precise timing of the switch is key here. (2) Disregarding the computational cost of numerical analysis, we provide an NP algorithm that checks that the regret entailed by a given time-switching strategy exceeds a given value. (3) We show that determining whether a strategy minimizes regret is decidable in PSPACE.

View on arXiv PDF

Similar