Q-Star Meets Scalable Posterior Sampling: Bridging Theory and Practice via HyperAgent
This work addresses the challenge of scalable posterior sampling for exploration in RL and offers a practical algorithm with theoretical guarantees. The contribution appears incremental, however, because it builds on existing hypermodel frameworks and deep RL methods.
The authors tackle efficient exploration in reinforcement learning by proposing HyperAgent, a hypermodel-based algorithm that approximates posterior samples of the optimal Q-function without conjugacy constraints. In tabular settings, it achieves logarithmic per-step computational complexity and sublinear regret. In large-scale benchmarks such as Deep Sea and Atari, it demonstrates robust performance with significant efficiency gains.
We propose HyperAgent, a reinforcement learning (RL) algorithm based on the hypermodel framework for exploration in RL. HyperAgent allows for the efficient incremental approximation of posteriors associated with an optimal action-value function ($Q^\star$) without the need for conjugacy, and it follows the greedy policies with respect to these approximate posterior samples. We demonstrate that HyperAgent offers robust performance in large-scale deep RL benchmarks. It can solve Deep Sea hard exploration problems in a number of episodes that scales optimally with problem size, and it exhibits significant efficiency gains in the Atari suite. Implementing HyperAgent requires minimal code addition to well-established deep RL frameworks like DQN. We theoretically prove that, under tabular assumptions, HyperAgent achieves logarithmic per-step computational complexity while attaining sublinear regret that matches the best known randomized tabular RL algorithm.
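The idea of acting greedily with respect to an approximate posterior sample of $Q^\star$ can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TabularHypermodel`, the linear form `mu + A @ xi`, the Gaussian index distribution, and the parameter `index_dim` are all assumptions chosen for illustration, and no learning update is shown. The sketch covers only how a random index maps to one sampled Q-function and how an action is then chosen greedily.

```python
import numpy as np


class TabularHypermodel:
    """Hypothetical linear hypermodel: Q_xi(s, a) = mu[s, a] + A[s, a] @ xi.

    Illustrative only; the parameterization is an assumption, not the paper's.
    """

    def __init__(self, n_states, n_actions, index_dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.index_dim = index_dim
        # Mean estimate of the action-value function.
        self.mu = np.zeros((n_states, n_actions))
        # Per-(state, action) weights that turn an index into a perturbation.
        self.A = self.rng.normal(size=(n_states, n_actions, index_dim))

    def sample_index(self):
        # Assumed index distribution: standard Gaussian.
        return self.rng.standard_normal(self.index_dim)

    def q_values(self, state, xi):
        # One approximate posterior sample of Q(state, .) for the index xi.
        return self.mu[state] + self.A[state] @ xi

    def greedy_action(self, state, xi):
        # Greedy policy with respect to the sampled Q-function.
        return int(np.argmax(self.q_values(state, xi)))


model = TabularHypermodel(n_states=5, n_actions=3, index_dim=4)
xi = model.sample_index()  # drawn once, e.g. at the start of an episode
action = model.greedy_action(state=2, xi=xi)
```

In this sketch, a fresh index yields a different sampled Q-function and therefore possibly a different greedy action, which is where the randomized exploration comes from; how the parameters `mu` and `A` would be trained incrementally is left out.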