ML LGJun 8, 2015

Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem

Junpei Komiyama, Junya Honda, Hisashi Kashima, Hiroshi Nakagawa

arXiv:1506.02550v326.0110 citationsHas Code

Originality Highly original

AI Analysis

This work addresses the dueling bandit problem, which is incremental but provides the first algorithm with a regret upper bound matching the lower bound, improving performance for bandit learning applications.

The authors tackled the K-armed dueling bandit problem by deriving a tight asymptotic regret lower bound and proposing an algorithm that matches this bound, achieving significant experimental outperformance over existing methods.

We study the $K$-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.

View on arXiv PDF Code

Similar