SE · AI · SY · Feb 28, 2017

Stacked Thompson Bandits

arXiv:1702.08726v1 · 3 citations
Originality: Incremental advance
AI Analysis

This paper addresses plan generation under bounded temporal logic constraints, likely for robotics or AI planning domains. It appears incremental because it builds on existing multi-armed bandit and Thompson sampling methods.

The paper tackles the problem of efficiently generating plans that satisfy bounded temporal logic requirements. It introduces Stacked Thompson Bandits (STB), a Bayesian approach that uses Thompson sampling to find plans that satisfy the requirement with high probability while searching only a fraction of the search space.

We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB uses a simulation for evaluation of plans, and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multiarmed bandits and using Thompson sampling to guide the action selection process for each bandit enables STB to generate plans that satisfy requirements with a high probability while only searching a fraction of the search space.
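The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one Beta-Bernoulli bandit per plan step, a binary "requirement satisfied" signal from the simulation, and the same reward being fed back to every bandit in the stack. The names `ThompsonBandit`, `stacked_thompson_search`, and `toy_simulate` are hypothetical, and the toy requirement stands in for a real bounded temporal logic check.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli bandit: one Beta(alpha, beta) posterior per action (assumed model)."""

    def __init__(self, n_actions):
        self.alpha = [1.0] * n_actions  # prior pseudo-count of successes
        self.beta = [1.0] * n_actions   # prior pseudo-count of failures

    def select(self, rng):
        # Thompson sampling: draw one sample per action, play the largest draw.
        samples = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, success):
        if success:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0


def stacked_thompson_search(n_steps, n_actions, simulate, iterations, seed=0):
    """Search for a plan using one bandit per step (the "stack")."""
    rng = random.Random(seed)
    stack = [ThompsonBandit(n_actions) for _ in range(n_steps)]
    for _ in range(iterations):
        plan = [bandit.select(rng) for bandit in stack]
        success = simulate(plan)  # True if the simulated run met the requirement
        for bandit, action in zip(stack, plan):
            bandit.update(action, success)
    # Report the action with the highest posterior mean at each step.
    best_plan = []
    for bandit in stack:
        means = [a / (a + b) for a, b in zip(bandit.alpha, bandit.beta)]
        best_plan.append(max(range(n_actions), key=means.__getitem__))
    return best_plan, stack


def toy_simulate(plan):
    # Hypothetical bounded requirement: action 2 must occur within the first 3 steps.
    return 2 in plan[:3]


plan, stack = stacked_thompson_search(
    n_steps=4, n_actions=3, simulate=toy_simulate, iterations=200
)
```

Here the search evaluates 200 sampled plans. Larger action sets and longer horizons grow the plan space exponentially, which is where guiding the sampling with posteriors, rather than enumerating plans, would be expected to pay off.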

Code Implementations: 1 repo
