MLLGJun 20, 2019

Sequential Experimental Design for Transductive Linear Bandits

arXiv:1906.08399v1119 citations
Originality Highly original
AI Analysis

This work addresses sequential decision-making in constrained settings like drug discovery and recommender systems, offering a novel theoretical framework with practical implications.

The paper tackles the transductive linear bandit problem, where the goal is to infer the best item from a set with minimal noisy measurements, and provides an algorithm that matches instance-dependent lower bounds up to logarithmic factors, achieving near-optimal non-asymptotic performance.

In this paper we introduce the transductive linear bandit problem: given a set of measurement vectors $\mathcal{X}\subset \mathbb{R}^d$, a set of items $\mathcal{Z}\subset \mathbb{R}^d$, a fixed confidence $δ$, and an unknown vector $θ^{\ast}\in \mathbb{R}^d$, the goal is to infer $\text{argmax}_{z\in \mathcal{Z}} z^\topθ^\ast$ with probability $1-δ$ by making as few sequentially chosen noisy measurements of the form $x^\topθ^{\ast}$ as possible. When $\mathcal{X}=\mathcal{Z}$, this setting generalizes linear bandits, and when $\mathcal{X}$ is the standard basis vectors and $\mathcal{Z}\subset \{0,1\}^d$, combinatorial bandits. Such a transductive setting naturally arises when the set of measurement vectors is limited due to factors such as availability or cost. As an example, in drug discovery the compounds and dosages $\mathcal{X}$ a practitioner may be willing to evaluate in the lab in vitro due to cost or safety reasons may differ vastly from those compounds and dosages $\mathcal{Z}$ that can be safely administered to patients in vivo. Alternatively, in recommender systems for books, the set of books $\mathcal{X}$ a user is queried about may be restricted to well known best-sellers even though the goal might be to recommend more esoteric titles $\mathcal{Z}$. In this paper, we provide instance-dependent lower bounds for the transductive setting, an algorithm that matches these up to logarithmic factors, and an evaluation. In particular, we provide the first non-asymptotic algorithm for linear bandits that nearly achieves the information theoretic lower bound.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes