LG · Feb 26, 2022

Towards Scalable and Robust Structured Bandits: A Meta-Learning Framework

arXiv:2202.13227v1 · 13 citations
Originality: Highly original
AI Analysis

This work addresses the curse of dimensionality in structured bandits, a known bottleneck for applications that require scalable and robust decision-making, and proposes a novel method to overcome it.

The paper tackles the challenge of online learning in large-scale structured bandits by proposing a meta-learning framework with a Bayesian hierarchical model and meta Thompson sampling algorithm, achieving scalability and robustness as supported by theoretical analysis and numerical results.

Online learning in large-scale structured bandits is known to be challenging due to the curse of dimensionality. In this paper, we propose a unified meta-learning framework for a general class of structured bandit problems where the parameter space can be factorized to the item level. The novel bandit algorithm is general enough to apply to many popular problems, scalable to huge parameter and action spaces, and robust to misspecification of the generalization model. At the core of this framework is a Bayesian hierarchical model that allows information sharing among items via their features, upon which we design a meta Thompson sampling algorithm. Three representative examples are discussed thoroughly. Both theoretical analysis and numerical results support the usefulness of the proposed method.
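To make the idea of sharing information among items via their features concrete, here is a minimal sketch of Thompson sampling under a two-level Gaussian model. It is an illustration only, not the paper's algorithm: the Gaussian prior and reward model, the class name `HierarchicalGaussianTS`, and the parameters `sigma_gamma`, `sigma_item`, and `sigma_noise` are all assumptions introduced here. The sketch assumes each item's mean reward is drawn as theta_i ~ N(x_i^T gamma, sigma_item^2) around a linear function of its features, with gamma ~ N(0, sigma_gamma^2 I) and rewards observed with Gaussian noise.

```python
import numpy as np


class HierarchicalGaussianTS:
    """Illustrative two-level Gaussian Thompson sampler (hypothetical names).

    Assumed model (not taken from the paper):
        gamma   ~ N(0, sigma_gamma^2 I)            shared feature weights
        theta_i ~ N(x_i^T gamma, sigma_item^2)     item-level mean reward
        reward  ~ N(theta_i, sigma_noise^2)        observation for item i
    """

    def __init__(self, features, sigma_gamma=1.0, sigma_item=0.5,
                 sigma_noise=1.0, seed=0):
        self.X = np.asarray(features, dtype=float)   # shape (n_items, dim)
        self.n_items, self.dim = self.X.shape
        self.var_gamma = sigma_gamma ** 2
        self.var_item = sigma_item ** 2
        self.var_noise = sigma_noise ** 2
        self.counts = np.zeros(self.n_items)         # pulls per item
        self.sums = np.zeros(self.n_items)           # reward totals per item
        self.rng = np.random.default_rng(seed)

    def posterior_gamma(self):
        """Posterior of gamma with the item-level theta_i integrated out.

        For a pulled item, its average reward is distributed as
        N(x_i^T gamma, sigma_item^2 + sigma_noise^2 / n_i), so this is a
        Bayesian linear regression with per-item noise variances.
        """
        pulled = self.counts > 0
        X = self.X[pulled]
        ybar = self.sums[pulled] / self.counts[pulled]
        var = self.var_item + self.var_noise / self.counts[pulled]
        precision = np.eye(self.dim) / self.var_gamma + (X / var[:, None]).T @ X
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2.0                    # guard against round-off asymmetry
        mean = cov @ (X.T @ (ybar / var))
        return mean, cov

    def select(self):
        """Sample gamma, then each theta_i given gamma, and act greedily."""
        mean, cov = self.posterior_gamma()
        gamma = self.rng.multivariate_normal(mean, cov)
        prior_mean = self.X @ gamma                  # feature-based prior for each item
        precision = 1.0 / self.var_item + self.counts / self.var_noise
        post_mean = (prior_mean / self.var_item + self.sums / self.var_noise) / precision
        theta = self.rng.normal(post_mean, np.sqrt(1.0 / precision))
        return int(np.argmax(theta))

    def update(self, item, reward):
        """Record one observed reward for the chosen item."""
        self.counts[item] += 1
        self.sums[item] += reward
```

In an interaction loop one would call `select()` to choose an item, observe a reward from the environment, and pass both to `update()`. Because the item-level variance `sigma_item` lets each item deviate from the feature-based prediction, unpulled items borrow strength from similar items while heavily pulled items are driven mostly by their own data. This is one simple way to realize the information sharing the abstract describes, under the Gaussian assumptions stated above.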

