MLLGJun 8, 2015

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

arXiv:1506.02550v3110 citations
Originality Highly original
AI Analysis

This work addresses the dueling bandit problem, which is incremental but provides the first algorithm with a regret upper bound matching the lower bound, improving performance for bandit learning applications.

The authors tackled the K-armed dueling bandit problem by deriving a tight asymptotic regret lower bound and proposing an algorithm that matches this bound, achieving significant experimental outperformance over existing methods.

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes