LGJul 10, 2015

Utility-based Dueling Bandits as a Partial Monitoring Game

arXiv:1507.02750v27 citations
AI Analysis

This work provides a theoretical foundation for solving dueling bandits efficiently, which is incremental as it connects existing partial monitoring algorithms to this specific problem.

The paper tackles the utility-based dueling bandit problem by framing it as a partial monitoring game, proving it is an easy instance with a regret bound of Theta(sqrt{T}).

Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expect advice, dynamic pricing, dark pools, and label efficient prediction. We study the utility-based dueling bandit problem as an instance of partial monitoring problem and prove that it fits the time-regret partial monitoring hierarchy as an easy - i.e. Theta (sqrt{T})- instance. We survey some partial monitoring algorithms and see how they could be used to solve dueling bandits efficiently. Keywords: Online learning, Dueling Bandits, Partial Monitoring, Partial Feedback, Multiarmed Bandits

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes