LG, AI, ML · Sep 13, 2019

Reinforcement Learning: a Comparison of UCB Versus Alternative Adaptive Policies

arXiv:1909.06019v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses adaptive policy selection in reinforcement learning and is aimed at both researchers and practitioners. It is an incremental advance: it builds on existing methods and contributes mainly a comparative analysis.

The paper studies reinforcement learning in Markov decision processes with unknown transition probabilities. It compares the classic UCB policy with two alternatives: MDP-DMED, a new policy introduced in the paper, and MDP-PS, a method based on posterior sampling. According to this analysis, the alternatives offer competitive or improved results in specific scenarios.

In this paper we consider the basic version of Reinforcement Learning (RL), which involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in the area, and we compare the performance of the classic UCB policy of \cite{bkmdp97} with a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and with a method based on posterior sampling (MDP-PS).
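To make the idea of posterior sampling with unknown transition probabilities concrete, here is a minimal sketch. It is not the paper's MDP-PS algorithm. It assumes a Dirichlet posterior over each transition row, known rewards, and a discounted criterion solved by value iteration; none of these choices come from the abstract. All names (`sample_dirichlet`, `greedy_policy`, `posterior_sampling_step`) and the two-state toy MDP are hypothetical.

```python
import random

def sample_dirichlet(counts, rng):
    """Draw one probability vector from Dirichlet(counts) via normalized Gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]

def greedy_policy(P, R, gamma=0.95, sweeps=200):
    """Value iteration on a given model; returns one greedy action per state.

    P[s][a] is a list of next-state probabilities and R[s][a] an expected reward.
    The discounted criterion is an assumption made for this sketch.
    """
    n_states = len(P)
    V = [0.0] * n_states

    def q(s, a):
        return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))

    for _ in range(sweeps):
        V = [max(q(s, a) for a in range(len(P[s]))) for s in range(n_states)]
    return [max(range(len(P[s])), key=lambda a: q(s, a)) for s in range(n_states)]

def posterior_sampling_step(counts, R, rng):
    """Sample a transition model from the posterior and act greedily on it.

    counts[s][a][s2] holds the prior pseudo-count plus observed s --a--> s2 moves.
    """
    P_sample = [[sample_dirichlet(c, rng) for c in row] for row in counts]
    return greedy_policy(P_sample, R)

# Hypothetical toy MDP: 2 states, 2 actions, reward 1 while in state 1.
rng = random.Random(0)
true_P = [[[0.9, 0.1], [0.2, 0.8]],
          [[0.7, 0.3], [0.1, 0.9]]]
R = [[0.0, 0.0], [1.0, 1.0]]
counts = [[[1.0, 1.0] for _ in range(2)] for _ in range(2)]  # uniform prior

state = 0
for episode in range(50):
    policy = posterior_sampling_step(counts, R, rng)
    for _ in range(20):
        action = policy[state]
        nxt = rng.choices([0, 1], weights=true_P[state][action])[0]
        counts[state][action][nxt] += 1.0  # posterior update
        state = nxt

print(policy)
```

Resampling the model once per episode, rather than at every step, is a common design choice in posterior-sampling methods; the episode length of 20 here is arbitrary.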

