In-Context Learning for Pure Exploration
This work addresses how to efficiently determine the correct hypothesis in a new task, a core problem in machine learning and decision-making. It is an incremental advance that applies in-context learning to a known problem.
The paper tackles active sequential hypothesis testing (pure exploration) by introducing In-Context Pure Exploration (ICPE), which meta-trains Transformers to map observation histories to query actions and a predicted hypothesis, so the model transfers to new tasks without parameter updates at inference time. The results show that ICPE is competitive with adaptive baselines on benchmarks including best-arm identification and generalized search, supporting Transformers as practical architectures for sequential testing.
We study the problem of active sequential hypothesis testing, also known as pure exploration: given a new task, the learner adaptively collects data from the environment to efficiently determine an underlying correct hypothesis. A classical instance of this problem is the task of identifying the best arm in a multi-armed bandit problem (a.k.a. BAI, Best-Arm Identification), where actions index hypotheses. Another important case is generalized search, a problem of determining the correct label through a sequence of strategically selected queries that indirectly reveal information about the label. In this work, we introduce In-Context Pure Exploration (ICPE), which meta-trains Transformers to map observation histories to query actions and a predicted hypothesis, yielding a model that transfers in-context. At inference time, ICPE actively gathers evidence on new tasks and infers the true hypothesis without parameter updates. Across deterministic, stochastic, and structured benchmarks, including BAI and generalized search, ICPE is competitive with adaptive baselines while requiring no explicit modeling of information structure. Our results support Transformers as practical architectures for general sequential testing.
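The interaction protocol described above can be illustrated with a minimal Python sketch of a fixed-budget best-arm identification episode. This is not ICPE itself: the names `run_episode`, `round_robin`, and `empirical_best` are hypothetical, the Bernoulli reward model and fixed horizon are assumptions made for illustration, and the two hand-written functions merely stand in for the roles that a meta-trained Transformer would play (mapping a history to a query action, and mapping a history to a predicted hypothesis).

```python
import random
from typing import Callable, List, Tuple

# A history is the list of (action, observation) pairs collected so far.
History = List[Tuple[int, float]]


def run_episode(
    means: List[float],
    query_policy: Callable[[History, int], int],
    predict: Callable[[History, int], int],
    horizon: int,
    rng: random.Random,
) -> int:
    """Collect `horizon` observations, then return the predicted best arm.

    Assumption: each arm yields a Bernoulli reward with the given mean.
    """
    n_arms = len(means)
    history: History = []
    for _ in range(horizon):
        arm = query_policy(history, n_arms)  # history -> query action
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history.append((arm, reward))
    return predict(history, n_arms)  # history -> predicted hypothesis


def round_robin(history: History, n_arms: int) -> int:
    """Non-adaptive stand-in for a learned query policy: cycle through arms."""
    return len(history) % n_arms


def empirical_best(history: History, n_arms: int) -> int:
    """Stand-in for a learned predictor: arm with the highest average reward."""
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in history:
        totals[arm] += reward
        counts[arm] += 1
    averages = [totals[a] / counts[a] if counts[a] else 0.0 for a in range(n_arms)]
    return max(range(n_arms), key=lambda a: averages[a])


if __name__ == "__main__":
    guess = run_episode([0.2, 0.8, 0.5], round_robin, empirical_best, 300, random.Random(0))
    print("predicted best arm:", guess)
```

In this sketch, both callables depend only on the observation history, which mirrors the in-context setting: adapting to a new task happens through the history alone, with no parameter updates. An adaptive policy would replace `round_robin` with one that chooses queries based on what has been observed so far.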