LGJan 15, 2016

A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

arXiv:1601.03855v113.753 citations

Originality Incremental advance

AI Analysis

This work addresses a variation of the multi-armed bandit problem for applications like information retrieval, but it is incremental as it extends an existing algorithm.

The paper tackled the adversarial utility-based dueling bandit problem by proposing the REX3 algorithm, achieving an expected regret upper bound of O(sqrt(K ln(K)T)) and demonstrating experimental results with real data.

We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order omega(sqrt(KT)). At the end, we provide experimental results using real data from information retrieval applications.

View on arXiv PDF

Similar