LGJun 4, 2021

Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

arXiv:2106.02436v145 citations
AI Analysis

This addresses the challenge of handling unrestricted delay distributions in bandit algorithms for applications like online advertising or recommendation systems, representing a significant but incremental advance over prior work with restrictive assumptions.

The paper tackled the stochastic Multi-Armed Bandit problem with random feedback delays, developing algorithms that achieve near-optimal regret in both reward-dependent and reward-independent delay settings, with additive dependence on delay distribution quantiles.

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for infinite delays where the algorithm might occasionally not observe any feedback.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes