LGOct 28, 2023

Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed Rewards

arXiv:2310.18701v113 citationsh-index: 11
Originality Highly original
AI Analysis

This work addresses a gap in bandit algorithms for heavy-tailed reward distributions, which is incremental but important for applications such as finance and advertising.

The paper tackles the problem of generalized linear bandits with heavy-tailed rewards, which are common in real-world scenarios like financial markets, by proposing two novel algorithms based on truncation and mean of medians that achieve an almost optimal regret bound of O~(dT^(1/(1+ε))) and improve upon existing methods by a logarithmic factor when ε=1.

This paper investigates the problem of generalized linear bandits with heavy-tailed rewards, whose $(1+ε)$-th moment is bounded for some $ε\in (0,1]$. Although there exist methods for generalized linear bandits, most of them focus on bounded or sub-Gaussian rewards and are not well-suited for many real-world scenarios, such as financial markets and web-advertising. To address this issue, we propose two novel algorithms based on truncation and mean of medians. These algorithms achieve an almost optimal regret bound of $\widetilde{O}(dT^{\frac{1}{1+ε}})$, where $d$ is the dimension of contextual information and $T$ is the time horizon. Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches. Additionally, our mean-of-medians-based algorithm requires only $O(\log T)$ rewards and one estimator per epoch, making it more practical. Moreover, our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $ε=1$. Numerical experimental results confirm the merits of our algorithms.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes