ML LGMay 5, 2016

Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm

Junpei Komiyama, Junya Honda, Hiroshi Nakagawa

arXiv:1605.01677v211.441 citations

Originality Highly original

AI Analysis

This work addresses the challenge of efficiently identifying top-performing arms in dueling bandits, which is incremental but improves computational feasibility for applications like recommendation systems.

The paper tackles the Copeland dueling bandit problem by deriving a regret lower bound and proposing CW-RMED, an asymptotically optimal algorithm, and ECW-RMED, an efficient version that outperforms existing algorithms in experiments.

We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Relative Minimum Empirical Divergence (CW-RMED) and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this issue, we devise an efficient version (ECW-RMED) and derive its asymptotic regret bound. Experimental comparisons of dueling bandit algorithms show that ECW-RMED significantly outperforms existing ones.

View on arXiv PDF

Similar