LGJan 15, 2016

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

arXiv:1601.03855v153 citations
Originality Incremental advance
AI Analysis

This work addresses a variation of the multi-armed bandit problem for applications like information retrieval, but it is incremental as it extends an existing algorithm.

The paper tackled the adversarial utility-based dueling bandit problem by proposing the REX3 algorithm, achieving an expected regret upper bound of O(sqrt(K ln(K)T)) and demonstrating experimental results with real data.

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes