LG AIApr 20, 2015

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

Reinaldo Uribe Muriel, Fernando Lozando, Charles Anderson

arXiv:1504.05122v11.1

Originality Incremental advance

AI Analysis

This addresses a specific challenge in reinforcement learning for decision-making processes with average rewards, presenting an incremental improvement over existing methods.

The paper tackles the problem of solving average-reward semi-Markov decision processes by introducing optimal nudging, which reduces them to a minimal sequence of cumulative reward tasks, resulting in a small total number of tasks that need to be solved.

This paper describes a novel method to solve average-reward semi-Markov decision processes, by reducing them to a minimal sequence of cumulative reward problems. The usual solution methods for this type of problems update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced, optimal nudging, relies instead on setting the gain to some fixed value, which transitorily makes the problem a cumulative-reward task, solving it by any standard reinforcement learning method, and only then updating the gain in a way that minimizes uncertainty in a minmax sense. The rule for optimal gain update is derived by exploiting the geometric features of the w-l space, a simple mapping of the space of policies. The total number of cumulative reward tasks that need to be solved is shown to be small. Some experiments are presented to explore the features of the algorithm and to compare its performance with other approaches.

View on arXiv PDF

Similar