MLLGMay 18, 2020

Meta-learning with Stochastic Linear Bandits

arXiv:2005.08531v164 citations
Originality Incremental advance
AI Analysis

This work addresses meta-learning for bandit tasks, offering incremental improvements in regret minimization for scenarios with many similar tasks.

The paper tackles the problem of meta-learning for stochastic linear bandits by proposing biased OFUL algorithms with two bias estimation strategies, showing theoretically and experimentally that these strategies outperform learning tasks in isolation when the number of tasks is large and task-distribution variance is small.

We investigate meta-learning procedures in the setting of stochastic linear bandits tasks. The goal is to select a learning algorithm which works well on average over a class of bandits tasks, that are sampled from a task-distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a square euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show both theoretically and experimentally, that when the number of tasks grows and the variance of the task-distribution is small, our strategies have a significant advantage over learning the tasks in isolation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes