LGMLJun 18, 2019

Simple Algorithms for Dueling Bandits

arXiv:1906.07611v16 citations
Originality Incremental advance
AI Analysis

This addresses the problem of efficient decision-making with binary feedback in comparison-based settings, such as ranking or recommendation systems, and is incremental as it builds on existing Dueling Bandits methods.

The paper tackles the Dueling Bandits problem by proposing simple algorithms with regret bounds of order O(T^rho) where 1/2 <= rho <= 3/4, independent of the preference gap Delta, and shows that in some cases, these algorithms exceed state-of-the-art performance in experiments on synthetic data.

In this paper, we present simple algorithms for Dueling Bandits. We prove that the algorithms have regret bounds for time horizon T of order O(T^rho ) with 1/2 <= rho <= 3/4, which importantly do not depend on any preference gap between actions, Delta. Dueling Bandits is an important extension of the Multi-Armed Bandit problem, in which the algorithm must select two actions at a time and only receives binary feedback for the duel outcome. This is analogous to comparisons in which the rater can only provide yes/no or better/worse type responses. We compare our simple algorithms to the current state-of-the-art for Dueling Bandits, ISS and DTS, discussing complexity and regret upper bounds, and conducting experiments on synthetic data that demonstrate their regret performance, which in some cases exceeds state-of-the-art.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes