LG MLMar 5, 2017

Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications

arXiv:1703.01610v5105 citations

Originality Highly original

AI Analysis

This work addresses a critical theoretical limitation in bandit algorithms for applications such as online advertising and recommendation systems, though it is incremental in improving existing frameworks.

The paper tackled the issue of exponentially large regret bounds in combinatorial multi-armed bandits with probabilistically triggered arms by introducing a triggering probability modulated bounded smoothness condition, achieving significantly better regret bounds for applications like influence maximization and cascading bandits.

We study combinatorial multi-armed bandit with probabilistically triggered arms (CMAB-T) and semi-bandit feedback. We resolve a serious issue in the prior CMAB-T studies where the regret bounds contain a possibly exponentially large factor of $1/p^*$, where $p^*$ is the minimum positive probability that an arm is triggered by any action. We address this issue by introducing a triggering probability modulated (TPM) bounded smoothness condition into the general CMAB-T framework, and show that many applications such as influence maximization bandit and combinatorial cascading bandit satisfy this TPM condition. As a result, we completely remove the factor of $1/p^*$ from the regret bounds, achieving significantly better regret bounds for influence maximization and cascading bandits than before. Finally, we provide lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.

View on arXiv PDF

Similar