LG AI ML · Jul 13, 2019

Parameterized Exploration

arXiv:1907.06090v11.0

Originality: Incremental advance

AI Analysis

This work addresses exploration tuning for decision-making agents. It is an incremental advance: it builds on existing exploration techniques and adds model-based tuning of their schedules.

The authors address the problem of tuning exploration schedules in sequential decision problems. They introduce Parameterized Exploration (PE), a model-based method that accounts for the time horizon and the agent's current knowledge, and report better performance than untuned counterparts in multi-armed bandits, contextual bandits, and an MDP based on an mHealth study.

We introduce Parameterized Exploration (PE), a simple family of methods for model-based tuning of the exploration schedule in sequential decision problems. Unlike common heuristics for exploration, our method accounts for the time horizon of the decision problem as well as the agent's current state of knowledge of the dynamics of the decision problem. We show our method as applied to several common exploration techniques has superior performance relative to un-tuned counterparts in Bernoulli and Gaussian multi-armed bandits, contextual bandits, and a Markov decision process based on a mobile health (mHealth) study. We also examine the effects of the accuracy of the estimated dynamics model on the performance of PE.
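
To make the idea of model-based schedule tuning concrete, here is a minimal sketch for a Bernoulli bandit with epsilon-greedy exploration. It is not the authors' algorithm: the schedule form in `epsilon_schedule`, the grid search in `tune_theta`, and all function and parameter names are assumptions chosen for illustration. The sketch only shows the general pattern the abstract describes: the exploration rate depends on the remaining horizon, and its parameters are chosen by simulating from the agent's current estimate of the dynamics.

```python
import random


def epsilon_schedule(t, horizon, theta):
    """Hypothetical schedule: exploration rate shrinks as the horizon runs out.

    theta = (scale, power) is an assumed parameterization, not the paper's.
    """
    scale, power = theta
    remaining = (horizon - t) / horizon  # fraction of the horizon still ahead
    return min(1.0, max(0.0, scale * remaining ** power))


def run_episode(means, horizon, theta, rng):
    """Play one epsilon-greedy episode on a Bernoulli bandit; return total reward."""
    n_arms = len(means)
    pulls = [0] * n_arms
    wins = [0] * n_arms
    total = 0
    for t in range(horizon):
        if rng.random() < epsilon_schedule(t, horizon, theta):
            arm = rng.randrange(n_arms)  # explore uniformly at random
        else:
            # Exploit: pick the arm with the highest smoothed success rate.
            arm = max(range(n_arms), key=lambda a: (wins[a] + 1) / (pulls[a] + 2))
        reward = 1 if rng.random() < means[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total


def tune_theta(estimated_means, horizon, candidates, n_sims, rng):
    """Pick the schedule parameters with the best average simulated reward.

    Simulations use the agent's estimated model, not the true environment.
    """
    def score(theta):
        runs = (run_episode(estimated_means, horizon, theta, rng) for _ in range(n_sims))
        return sum(runs) / n_sims

    return max(candidates, key=score)


rng = random.Random(0)
candidates = [(s, p) for s in (0.1, 0.5, 1.0) for p in (0.5, 1.0, 2.0)]
best_theta = tune_theta([0.4, 0.6], horizon=50, candidates=candidates, n_sims=50, rng=rng)
print("selected schedule parameters:", best_theta)
```

In a sketch like this, the agent would re-run `tune_theta` as its estimates of the arm means change, so the schedule tracks both the remaining horizon and the current state of knowledge. How sensitive the result is to errors in `estimated_means` is the kind of question the abstract's final sentence raises.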

