LG | ML | Jul 21, 2021

Design of Experiments for Stochastic Contextual Linear Bandits

arXiv:2107.09912v2 | 22 citations
Originality: Incremental advance
AI Analysis

This work reduces the engineering overhead of exploration in distributed or human-in-the-loop bandit systems by collecting data with a single fixed policy rather than one that must be updated as data arrives.

The paper addresses the practical difficulty of deploying reactive exploration algorithms in stochastic linear contextual bandits. It designs a single non-reactive policy that collects a dataset from which a near-optimal policy can be extracted, with theoretical guarantees and experimental validation on synthetic and real-world datasets.

In the stochastic linear contextual bandit setting there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, there can be a significant engineering overhead to deploy these algorithms, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. Exploring with a single non-reactive policy is beneficial in such cases. Assuming some batch contexts are available, we design a single stochastic policy to collect a good dataset from which a near-optimal policy can be extracted. We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.
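
The collect-then-extract workflow described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the logging policy here is a uniform random placeholder standing in for the designed stochastic policy, and the estimator is plain ridge regression. All names (`collect`, `fit_ridge`, `greedy`, `theta_true`) and all parameter values are hypothetical, chosen only for illustration.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def collect(contexts, theta, rng, noise=0.1):
    """Log (feature, reward) pairs with a fixed policy that never sees rewards."""
    data = []
    for arms in contexts:        # arms: feature vectors available in this context
        x = rng.choice(arms)     # ASSUMPTION: uniform placeholder, not the paper's design
        data.append((x, dot(x, theta) + rng.gauss(0.0, noise)))
    return data

def fit_ridge(data, d, lam=1.0):
    """Ridge estimate: solve (lam * I + sum x x^T) theta = sum r x."""
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for x, r in data:
        for i in range(d):
            b[i] += x[i] * r
            for j in range(d):
                A[i][j] += x[i] * x[j]
    return solve(A, b)

def greedy(arms, theta_hat):
    """Extracted policy: index of the arm with the highest predicted reward."""
    return max(range(len(arms)), key=lambda a: dot(arms[a], theta_hat))

# Synthetic setup (all values are illustrative assumptions).
rng = random.Random(0)
d, n_arms, n_rounds = 3, 4, 2000
theta_true = [1.0, -0.5, 0.25]
contexts = [[[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n_arms)]
            for _ in range(n_rounds)]

data = collect(contexts, theta_true, rng)   # one pass, policy never updated
theta_hat = fit_ridge(data, d)              # offline estimation
```

The point of the sketch is the separation of phases: `collect` can run in parallel or be handed to a human operator because it never depends on observed rewards, and the policy returned by `greedy` is computed only after all data is in.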
