LG · Oct 13, 2020

Impact of Representation Learning in Linear Bandits

Jiaqi Yang, Wei Hu, Jason D. Lee, Simon S. Du

arXiv:2010.06531v2 · 22.459 citations

Originality: Highly original

AI Analysis

This work addresses sample efficiency in multi-task bandit problems, a known bottleneck for reinforcement learning and decision-making systems, and represents an incremental advance: a novel method aimed at that bottleneck.

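The paper tackles sample efficiency in linear bandits by learning a representation shared across multiple tasks. In the finite-action setting it achieves a regret bound of $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$, which is minimax-optimal up to poly-logarithmic factors and outperforms solving each of the $T$ tasks independently (which incurs $\widetilde{O}(T\sqrt{dN})$ regret) when $T$ is sufficiently large.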
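Comparing only the leading terms of the two bounds (poly-logarithmic factors and constants dropped, so this is a heuristic rather than a rigorous statement), dividing both sides by $T\sqrt{N}$ shows how large $T$ must be for the shared-representation bound to be the smaller one, assuming $k < d$:

$$
T\sqrt{kN} + \sqrt{dkNT} \le T\sqrt{dN}
\iff \sqrt{k} + \sqrt{\tfrac{dk}{T}} \le \sqrt{d}
\iff T \ge \frac{dk}{\left(\sqrt{d} - \sqrt{k}\right)^{2}},
\qquad
\frac{T\sqrt{kN} + \sqrt{dkNT}}{T\sqrt{dN}} = \sqrt{\tfrac{k}{d}} + \sqrt{\tfrac{k}{T}} \xrightarrow{T \to \infty} \sqrt{\tfrac{k}{d}}.
$$

For example, with $d = 100$ and $k = 4$ the threshold is $400/64 = 6.25$ tasks, and for very many tasks the ratio of the two leading terms approaches $\sqrt{4/100} = 0.2$.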

We study how representation learning can improve the efficiency of bandit problems. We study the setting where we play $T$ linear bandits with dimension $d$ concurrently, and these $T$ bandit tasks share a common $k (\ll d)$ dimensional linear representation. For the finite-action setting, we present a new algorithm which achieves $\widetilde{O}(T\sqrt{kN} + \sqrt{dkNT})$ regret, where $N$ is the number of rounds we play for each bandit. When $T$ is sufficiently large, our algorithm significantly outperforms the naive algorithm (playing $T$ bandits independently) that achieves $\widetilde{O}(T\sqrt{d N})$ regret. We also provide an $\Omega(T\sqrt{kN} + \sqrt{dkNT})$ regret lower bound, showing that our algorithm is minimax-optimal up to poly-logarithmic factors. Furthermore, we extend our algorithm to the infinite-action setting and obtain a corresponding regret bound which demonstrates the benefit of representation learning in certain regimes. We also present experiments on synthetic and real-world data to illustrate our theoretical findings and demonstrate the effectiveness of our proposed algorithms.
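To make the setting concrete, here is a small, stdlib-only Python sketch. It is not the authors' algorithm. It only (a) builds a toy instance in which $T$ task parameters share a $k$-dimensional linear representation, under the assumed standard reading $\theta_t = B w_t$ with $B$ having orthonormal columns, and (b) compares the leading terms of the two regret bounds quoted above with poly-logarithmic factors and constants dropped. The Gaussian draws and all function names (`gram_schmidt`, `sample_shared_instance`, `multitask_bound`, `independent_bound`, `crossover_T`) are hypothetical choices for illustration.

```python
import math
import random


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            # Remove the component of w along the already-accepted basis vector b.
            proj = sum(x * y for x, y in zip(w, b))
            w = [x - proj * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def sample_shared_instance(T, d, k, seed=0):
    """Hypothetical toy instance: T task parameters theta_t = B w_t in R^d.

    B is stored as its k orthonormal columns (each a length-d list).
    Gaussian draws for B and w_t are an illustrative assumption only.
    """
    rng = random.Random(seed)
    cols = gram_schmidt([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    ws, thetas = [], []
    for _ in range(T):
        w = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # theta_t = sum_j w_j * (column j of B), so theta_t lies in span(B).
        theta = [sum(w[j] * cols[j][i] for j in range(k)) for i in range(d)]
        ws.append(w)
        thetas.append(theta)
    return cols, ws, thetas


def multitask_bound(T, d, k, N):
    """Leading term T*sqrt(kN) + sqrt(dkNT); poly-log factors and constants dropped."""
    return T * math.sqrt(k * N) + math.sqrt(d * k * N * T)


def independent_bound(T, d, N):
    """Leading term T*sqrt(dN) of playing the T bandits independently."""
    return T * math.sqrt(d * N)


def crossover_T(d, k):
    """Number of tasks at or above which the multi-task term is no larger (needs 0 < k < d)."""
    if not 0 < k < d:
        raise ValueError("need 0 < k < d")
    return d * k / (math.sqrt(d) - math.sqrt(k)) ** 2


if __name__ == "__main__":
    d, k, N = 100, 4, 10_000
    print("crossover T:", crossover_T(d, k))  # 6.25 for d=100, k=4
    for T in (5, 10, 1000):
        ratio = multitask_bound(T, d, k, N) / independent_bound(T, d, N)
        print(T, round(ratio, 3))
```

Because $\widetilde{O}$ hides constants and logarithmic factors, the crossover value is only a rough indication of when sharing the representation pays off, not a prediction of empirical regret.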

