Pure Exploration with Infinite Answers
This work addresses a theoretical gap in pure exploration for machine learning and statistics, particularly in bandit problems with infinite answers, which is incremental as it builds on and generalizes existing finite-answer methods.
The paper tackles pure exploration problems with infinite answer sets, such as regression of continuous functions in bandits, by deriving an instance-dependent lower bound and showing why existing methods like Sticky Track-and-Stop fail to be asymptotically optimal in this setting. It presents a new framework, Sticky-Sequence Track-and-Stop, which generalizes prior methods and achieves asymptotic optimality, while also identifying special cases where existing methods remain optimal.
We study pure exploration problems where the set of correct answers is possibly infinite, e.g., the regression of any continuous function of the means of the bandit. We derive an instance-dependent lower bound for these problems. By analyzing it, we discuss why existing methods (i.e., Sticky Track-and-Stop) for finite answer problems fail at being asymptotically optimal in this more general setting. Finally, we present a framework, Sticky-Sequence Track-and-Stop, which generalizes both Track-and-Stop and Sticky Track-and-Stop, and that enjoys asymptotic optimality. Due to its generality, our analysis also highlights special cases where existing methods enjoy optimality.