MLLGApr 26, 2024

Low-rank Matrix Bandits with Heavy-tailed Rewards

arXiv:2404.17709v15 citationsh-index: 5UAI
Originality Incremental advance
AI Analysis

This work addresses a more realistic scenario in online learning and recommendation systems by handling heavy-tailed noise, which is incremental as it extends prior sub-Gaussian assumptions.

The paper tackles the problem of low-rank matrix bandits with heavy-tailed rewards, where rewards have finite moments instead of sub-Gaussian noise, and proposes the LOTUS algorithm achieving a regret bound of order ˜O(d^(3/2)r^(1/2)T^(1/(1+δ))/˜D_rr) that matches state-of-the-art bounds under sub-Gaussian conditions and is nearly optimal in T, with a lower bound of Ω(d^(δ/(1+δ))r^(δ/(1+δ))T^(1/(1+δ))).

In stochastic low-rank matrix bandit, the expected reward of an arm is equal to the inner product between its feature matrix and some unknown $d_1$ by $d_2$ low-rank parameter matrix $Θ^*$ with rank $r \ll d_1\wedge d_2$. While all prior studies assume the payoffs are mixed with sub-Gaussian noises, in this work we loosen this strict assumption and consider the new problem of \underline{low}-rank matrix bandit with \underline{h}eavy-\underline{t}ailed \underline{r}ewards (LowHTR), where the rewards only have finite $(1+δ)$ moment for some $δ\in (0,1]$. By utilizing the truncation on observed payoffs and the dynamic exploration, we propose a novel algorithm called LOTUS attaining the regret bound of order $\tilde O(d^\frac{3}{2}r^\frac{1}{2}T^\frac{1}{1+δ}/\tilde{D}_{rr})$ without knowing $T$, which matches the state-of-the-art regret bound under sub-Gaussian noises~\citep{lu2021low,kang2022efficient} with $δ= 1$. Moreover, we establish a lower bound of the order $Ω(d^\fracδ{1+δ} r^\fracδ{1+δ} T^\frac{1}{1+δ}) = Ω(T^\frac{1}{1+δ})$ for LowHTR, which indicates our LOTUS is nearly optimal in the order of $T$. In addition, we improve LOTUS so that it does not require knowledge of the rank $r$ with $\tilde O(dr^\frac{3}{2}T^\frac{1+δ}{1+2δ})$ regret bound, and it is efficient under the high-dimensional scenario. We also conduct simulations to demonstrate the practical superiority of our algorithm.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes