LGMLJun 24, 2024

Improved Regret Bounds for Bandits with Expert Advice

arXiv:2406.16802v12 citations
Originality Incremental advance
AI Analysis

This work addresses regret optimization in multi-armed bandit scenarios with expert advice, providing tighter bounds that are incremental improvements for researchers in online learning and decision-making.

The paper tackles the bandits with expert advice problem by proving a lower bound of order √(KT ln(N/K)) for worst-case regret under a restricted feedback model, matching a known upper bound and improving upon prior lower bounds, and introduces a new instance-based upper bound for the standard feedback model that offers logarithmic improvements over previous results.

In this research note, we revisit the bandits with expert advice problem. Under a restricted feedback model, we prove a lower bound of order $\sqrt{K T \ln(N/K)}$ for the worst-case regret, where $K$ is the number of actions, $N>K$ the number of experts, and $T$ the time horizon. This matches a previously known upper bound of the same order and improves upon the best available lower bound of $\sqrt{K T (\ln N) / (\ln K)}$. For the standard feedback model, we prove a new instance-based upper bound that depends on the agreement between the experts and provides a logarithmic improvement compared to prior results.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes