Near Instance Optimal Model Selection for Pure Exploration Linear Bandits
This work addresses the challenge of efficiently selecting the correct hypothesis class in linear bandits, which is important for applications like clinical trials or recommendation systems, though it appears incremental by building on existing bandit frameworks.
The paper tackles the model selection problem in pure exploration linear bandits by designing algorithms for fixed confidence and fixed budget settings that achieve near instance optimal guarantees, adapting to the instance-dependent complexity of the true model. It also extends these algorithms to handle model misspecification.
We introduce the model selection problem in pure exploration linear bandits, where the learner needs to adapt to the instance-dependent complexity measure of the smallest hypothesis class containing the true model. We design algorithms in both fixed confidence and fixed budget settings with near instance optimal guarantees. The core of our algorithms is a new optimization problem based on experimental design that leverages the geometry of the action set to identify a near-optimal hypothesis class. Our fixed budget algorithm is developed based on a novel selection-validation procedure, which provides a new way to study the understudied fixed budget setting (even without the added challenge of model selection). We adapt our algorithms, in both fixed confidence and fixed budget settings, to problems with model misspecification.