LG · AI · ML · Dec 23, 2019

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

arXiv:1912.10577v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the high computational cost of efficient exploration in reinforcement learning, offering a cheaper alternative to ensemble sampling.

The paper tackles the computational inefficiency of ensemble sampling for exploration in reinforcement learning by introducing a parameterized indexed value function to represent uncertainty, and demonstrates its efficacy through computational experiments.
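The core idea of acting on a randomly indexed value function can be illustrated with a tabular toy. This is a sketch under stated assumptions, not the paper's code: the indexed value is assumed to be linear in a scalar Gaussian index, f(s, a, z) = mu(s, a) + sigma(s, a) · z, and `sample_index`, `indexed_q`, and `greedy_action` are hypothetical names introduced here. Acting greedily with respect to the value function picked by a random index means an action with a large `sigma` still gets tried even when its mean looks worse.

```python
import random


def sample_index(rng):
    """Draw a scalar index z ~ N(0, 1); the Gaussian choice is an assumption."""
    return rng.gauss(0.0, 1.0)


def indexed_q(mu, sigma, s, a, z):
    """Assumed indexed value f(s, a, z) = mu(s, a) + sigma(s, a) * z."""
    return mu[s][a] + sigma[s][a] * z


def greedy_action(mu, sigma, s, z):
    """Act greedily w.r.t. the value function selected by index z.

    Python's max returns the first maximal element, so ties go to the
    lowest-numbered action.
    """
    return max(range(len(mu[s])), key=lambda a: indexed_q(mu, sigma, s, a, z))


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [[1.0, 0.8]]     # action 0 looks better on average...
    sigma = [[0.0, 1.0]]  # ...but action 1 is more uncertain
    picks = [greedy_action(mu, sigma, 0, sample_index(rng)) for _ in range(1000)]
    print("fraction choosing action 1:", picks.count(1) / len(picks))
```

In this toy, action 1 wins exactly when 0.8 + z > 1.0, i.e. for z > 0.2, which happens about 42% of the time under N(0, 1). A common choice, borrowed from ensemble-sampling methods, is to draw one index per episode and hold it fixed so that exploration stays consistent across time steps.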

It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal-difference learning in a tabular setting, and we prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.
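The abstract pairs a mean component with an uncertainty component and trains them with a distributional form of temporal-difference learning. The sketch below is a tabular stand-in under stated assumptions: the tables `mu` and `sigma` play the roles of the mean and uncertainty networks, the indexed value is assumed to be mu(s, a) + sigma(s, a) · z with z ~ N(0, 1), and the update rule (sample one index, bootstrap the target with the same index, take a gradient step on the squared TD error) is a hypothetical illustration rather than the paper's algorithm. `td_index_update`, `gamma`, `lr`, and `sigma_min` are names introduced here.

```python
import random


def td_index_update(mu, sigma, s, a, r, s_next, done, rng,
                    gamma=0.99, lr=0.1, sigma_min=1e-3):
    """One stochastic TD step on the assumed indexed value mu + sigma * z.

    Hypothetical rule for illustration only: draw a single index z ~ N(0, 1),
    use the same z to bootstrap from the next state, and take a gradient step
    on 0.5 * err**2 with respect to mu[s][a] and sigma[s][a].
    """
    z = rng.gauss(0.0, 1.0)
    pred = mu[s][a] + sigma[s][a] * z
    if done:
        target = r  # no bootstrapping past a terminal state
    else:
        # Greedy bootstrap under the same sampled index.
        target = r + gamma * max(
            mu[s_next][b] + sigma[s_next][b] * z for b in range(len(mu[s_next]))
        )
    err = target - pred
    # d(pred)/d(mu) = 1 and d(pred)/d(sigma) = z.
    mu[s][a] += lr * err
    # Clip so the uncertainty scale stays positive.
    sigma[s][a] = max(sigma_min, sigma[s][a] + lr * err * z)
    return err


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [[0.0, 0.0]]     # "mean" table: one state, two actions
    sigma = [[1.0, 1.0]]  # "uncertainty" table, initialised wide
    for _ in range(2000):
        # Action 0 always ends the episode with reward 1.0.
        td_index_update(mu, sigma, 0, 0, 1.0, 0, True, rng)
    print(mu[0], sigma[0])
```

For a terminal transition with deterministic reward r, the expected squared error is (r - mu)^2 + sigma^2, so repeated updates drive `mu` toward r and `sigma` down to its floor: the uncertainty for that state-action pair shrinks as evidence accumulates, while untried actions keep their wide initial `sigma`.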

Code Implementations: 1 repo
