AI · Jun 6, 2013

Direct Uncertainty Estimation in Reinforcement Learning

Sergey Rodionov, Alexey Potapov, Yurii Vinogradov

arXiv:1306.1553v2 · 1 citation

AI Analysis

This work addresses the exploration vs. exploitation problem in reinforcement learning for researchers and practitioners, but it appears incremental as it builds on existing uncertainty estimation approaches.

The paper tackles the computational infeasibility of optimal probabilistic reinforcement learning by proposing a method to directly measure uncertainty in the action-value function, analyzing its sufficiency as a more efficient alternative to propagating uncertainty through environment models.

The optimal probabilistic approach to reinforcement learning is computationally infeasible. Its simplification, which neglects the difference between the true environment and a model of it estimated from a limited number of observations, gives rise to the exploration vs. exploitation problem. Uncertainty can be expressed as a probability distribution over the space of environment models, and this uncertainty can be propagated to the action-value function via Bellman iterations, which, however, are not computationally efficient enough. We consider the possibility of directly measuring the uncertainty of the action-value function, and analyze whether this simplified approach is sufficient.
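To make the idea of measuring action-value uncertainty directly more concrete, here is a minimal sketch. It is not the authors' method: it assumes a tabular setting and uses the standard error of observed one-step targets as a stand-in uncertainty measure. The class name `DirectQUncertainty` and its methods are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


class DirectQUncertainty:
    """Illustrative only: running mean and spread of one-step targets
    per (state, action), tracked with Welford's online algorithm."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma
        self.n = defaultdict(int)       # visit counts
        self.mean = defaultdict(float)  # plays the role of Q(s, a)
        self.m2 = defaultdict(float)    # sum of squared deviations

    def value(self, state, actions):
        # Greedy value of a state; 0.0 if there are no actions (terminal).
        return max((self.mean.get((state, a), 0.0) for a in actions),
                   default=0.0)

    def update(self, state, action, reward, next_state, next_actions):
        # One-step target r + gamma * max_a' Q(s', a').
        target = reward + self.gamma * self.value(next_state, next_actions)
        key = (state, action)
        self.n[key] += 1
        delta = target - self.mean[key]
        self.mean[key] += delta / self.n[key]
        self.m2[key] += delta * (target - self.mean[key])

    def std_error(self, state, action):
        # Standard error of the mean target; infinite until two samples exist.
        key = (state, action)
        n = self.n.get(key, 0)
        if n < 2:
            return math.inf
        variance = self.m2[key] / (n - 1)
        return math.sqrt(variance / n)


est = DirectQUncertainty(gamma=0.9)
est.update("s", "a", 1.0, None, ())  # terminal transition, reward 1
est.update("s", "a", 3.0, None, ())  # terminal transition, reward 3
print(est.mean[("s", "a")], est.std_error("s", "a"))
```

An agent could use `std_error` as an exploration bonus, preferring actions whose value estimate is still poorly determined. Note that this sketch never builds or samples an environment model, which is the contrast the abstract draws with propagating model uncertainty through Bellman iterations.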
