LGJan 28, 2022

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

arXiv:2201.11921v213.625 citations

Originality Highly original

AI Analysis

This work addresses robust decision-making under uncertainty for applications like finance or recommendation systems, offering novel algorithms that bridge stochastic and adversarial settings with adaptive optimality.

The paper tackles the problem of heavy-tailed multi-armed bandits in adversarial environments by developing algorithms that achieve optimal regret bounds without prior knowledge of heavy-tail parameters. The HTINF algorithm achieves best-of-both-worlds guarantees, and AdaTINF adapts to unknown parameters to match known lower bounds, with results like O(σK^(1-1/α)T^(1/α)) regret.

In this paper, we generalize the concept of heavy-tailed multi-armed bandits to adversarial environments, and develop robust best-of-both-worlds algorithms for heavy-tailed multi-armed bandits (MAB), where losses have $α$-th ($1<α\le 2$) moments bounded by $σ^α$, while the variances may not exist. Specifically, we design an algorithm \texttt{HTINF}, when the heavy-tail parameters $α$ and $σ$ are known to the agent, \texttt{HTINF} simultaneously achieves the optimal regret for both stochastic and adversarial environments, without knowing the actual environment type a-priori. When $α,σ$ are unknown, \texttt{HTINF} achieves a $\log T$-style instance-dependent regret in stochastic cases and $o(T)$ no-regret guarantee in adversarial cases. We further develop an algorithm \texttt{AdaTINF}, achieving $\mathcal O(σK^{1-\nicefrac 1α}T^{\nicefrac{1}α})$ minimax optimal regret even in adversarial settings, without prior knowledge on $α$ and $σ$. This result matches the known regret lower-bound (Bubeck et al., 2013), which assumed a stochastic environment and $α$ and $σ$ are both known. To our knowledge, the proposed \texttt{HTINF} algorithm is the first to enjoy a best-of-both-worlds regret guarantee, and \texttt{AdaTINF} is the first algorithm that can adapt to both $α$ and $σ$ to achieve optimal gap-indepedent regret bound in classical heavy-tailed stochastic MAB setting and our novel adversarial formulation.

View on arXiv PDF

Similar