Apple Tasting Revisited: Bayesian Approaches to Partially Monitored Online Binary Classification
This work addresses a partially monitored online learning problem with applications in scenarios like medical testing or spam filtering, but it is incremental as it builds on prior apple tasting studies with new Bayesian approaches.
The paper tackles the apple tasting problem in online binary classification, where the learner only observes true labels when predicting 1, by applying Thompson Sampling with a logistic regression model, achieving an improved Bayesian regret bound and superior empirical performance compared to existing methods.
We consider a variant of online binary classification where a learner sequentially assigns labels ($0$ or $1$) to items with unknown true class. If, but only if, the learner chooses label $1$ they immediately observe the true label of the item. The learner faces a trade-off between short-term classification accuracy and long-term information gain. This problem has previously been studied under the name of the `apple tasting' problem. We revisit this problem as a partial monitoring problem with side information, and focus on the case where item features are linked to true classes via a logistic regression model. Our principal contribution is a study of the performance of Thompson Sampling (TS) for this problem. Using recently developed information-theoretic tools, we show that TS achieves a Bayesian regret bound of an improved order to previous approaches. Further, we experimentally verify that efficient approximations to TS and Information Directed Sampling via Pólya-Gamma augmentation have superior empirical performance to existing methods.