LG | AI | Feb 18, 2023

Approximate Thompson Sampling via Epistemic Neural Networks

Stanford
arXiv:2302.09205v1 | 30 citations | h-index: 55
Originality: Highly original
AI Analysis

This paper addresses the problem of scaling Thompson sampling to complex bandit and reinforcement learning settings, offering practitioners a computationally efficient approach.

The paper tackles the computational intractability of Thompson sampling in complex environments by using epistemic neural networks (ENNs) to approximate joint predictive distributions. The approximate samples yield effective action selection, and the epinet variant matches the performance of large ensembles at much lower computational cost.

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling from a posterior distribution. Unfortunately, this can become computationally intractable in complex environments, such as those modeled using neural networks. Approximate posterior samples can produce effective actions, but only if they reasonably approximate joint predictive distributions of outputs across inputs. Notably, accuracy of marginal predictive distributions does not suffice. Epistemic neural networks (ENNs) are designed to produce accurate joint predictive distributions. We compare a range of ENNs through computational experiments that assess their performance in approximating TS across bandit and reinforcement learning environments. The results indicate that ENNs serve this purpose well and illustrate how the quality of joint predictive distributions drives performance. Further, we demonstrate that the epinet, a small additive network that estimates uncertainty, matches the performance of large ensembles at orders of magnitude lower computational cost. This enables effective application of TS with computation that scales gracefully to complex environments.
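To make the sampling step in the abstract concrete, here is a minimal sketch of exact Thompson sampling on a Bernoulli bandit, where the Beta posterior can be sampled in closed form. It is not the paper's method: the function name `thompson_sampling_bernoulli`, the Beta(1, 1) prior, and the arm means are all assumptions chosen for illustration. The line that draws `sampled_means` is the posterior-sampling step that becomes intractable with neural network models, and that an approximate sampler such as an ENN would presumably replace.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, num_steps, seed=0):
    """Exact Thompson sampling for a Bernoulli bandit (illustrative only).

    Assumes independent Beta(1, 1) priors on each arm's mean reward.
    Returns the number of times each arm was pulled.
    """
    rng = np.random.default_rng(seed)
    num_arms = len(true_means)
    alpha = np.ones(num_arms)  # 1 + observed successes per arm
    beta = np.ones(num_arms)   # 1 + observed failures per arm
    pulls = np.zeros(num_arms, dtype=int)

    for _ in range(num_steps):
        # Draw one sample of every arm's mean from the current posterior.
        # This is the step that is cheap here but intractable in general.
        sampled_means = rng.beta(alpha, beta)

        # Act greedily with respect to the sampled model.
        arm = int(np.argmax(sampled_means))

        # Observe a Bernoulli reward from the (hidden) true mean.
        reward = float(rng.random() < true_means[arm])

        # Conjugate posterior update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    counts = thompson_sampling_bernoulli([0.1, 0.5, 0.9], num_steps=2000)
    print("pulls per arm:", counts)
```

In this toy problem the arms are independent, so the joint posterior is just the product of its marginals. The abstract's point about joint predictive distributions matters when outputs are correlated across inputs, as with a shared neural network, which this sketch does not capture.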

Code Implementations: 1 repo
