LG · Jul 11, 2016

Stream-based Online Active Learning in a Contextual Multi-Armed Bandit Framework

arXiv:1607.03182v1

Originality: Incremental advance

AI Analysis

This work addresses the challenge of costly data annotation in online learning systems, offering a solution for applications like recommendation or advertising, but it appears incremental as it builds on existing contextual bandit frameworks.

The paper tackles stream-based online active learning in contextual multi-armed bandits, where obtaining ground-truth rewards is costly. It proposes an algorithm that progressively refines partitions of the context and arm spaces and strategically decides when to request labels, with the goal of maximizing total reward. The authors show analytically that the algorithm achieves sublinear regret of the same order as conventional contextual bandit methods that incur no query cost.

We study stream-based online active learning in a contextual multi-armed bandit framework. In this framework, the reward depends on both the arm and the context. In a stream-based active learning setting, obtaining the ground truth of the reward is costly, and the conventional contextual multi-armed bandit algorithm fails to achieve sublinear regret because of this cost. Hence, the algorithm needs to determine whether or not to request the ground truth of the reward at the current time slot. In our framework, we consider a stream-based active learning setting in which a query request for the ground truth is sent to the annotator, together with some prior information about the ground truth. The query cost varies with the accuracy of this prior information. Our algorithm mainly carries out two operations: the refinement of the context and arm spaces, and the selection of actions. In our algorithm, the partitions of the context space and the arm space are maintained for a certain number of time slots, and then become finer as more information about the rewards accumulates. We select the arms and request the ground truth of the reward strategically, aiming to maximize the total reward. We analytically show that the regret is sublinear and of the same order as that of the conventional contextual multi-armed bandit algorithms, where no query cost
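The two operations named in the abstract (refining a partition over time, and deciding per slot whether to pay for the ground truth) can be illustrated with a small toy sketch. This is not the paper's algorithm. Everything below is an assumption made for illustration: the class name `ActiveContextualBandit`, a one-dimensional context in [0, 1], a finite arm set instead of a partitioned arm space, a fixed `query_cost` with no prior information sent to the annotator, the doubling epoch length, the logarithmic sample threshold, and the simulated environment.

```python
import math
import random


class ActiveContextualBandit:
    """Toy learner: uniform context partition plus a query-when-uncertain rule.

    Hypothetical illustration only; thresholds and epoch lengths are
    arbitrary choices, not values taken from the paper.
    """

    def __init__(self, n_arms, query_cost=0.1, base_epoch_len=50):
        self.n_arms = n_arms
        self.query_cost = query_cost
        self.base_epoch_len = base_epoch_len
        self.level = 0            # partition has 2**level cells
        self.slots_in_epoch = 0
        self._reset_stats()

    def _reset_stats(self):
        n_cells = 2 ** self.level
        self.counts = [[0] * self.n_arms for _ in range(n_cells)]
        self.means = [[0.0] * self.n_arms for _ in range(n_cells)]

    def _cell(self, context):
        # Map a context in [0, 1] to the index of its cell.
        n_cells = 2 ** self.level
        return min(int(context * n_cells), n_cells - 1)

    def select(self, context):
        """Return (cell, arm, query): which arm to play and whether to query."""
        cell = self._cell(context)
        counts = self.counts[cell]
        # Assumed rule: an arm is under-sampled if it has fewer queried
        # rewards than a threshold that grows slowly within the epoch.
        threshold = math.ceil(math.log(self.slots_in_epoch + 2))
        under = [a for a in range(self.n_arms) if counts[a] < threshold]
        if under:
            # Explore the least-sampled arm and pay for its ground truth.
            return cell, min(under, key=lambda a: counts[a]), True
        # Otherwise exploit the best estimate and skip the query.
        best = max(range(self.n_arms), key=lambda a: self.means[cell][a])
        return cell, best, False

    def update(self, cell, arm, reward):
        # Incremental mean over the queried rewards only.
        self.counts[cell][arm] += 1
        n = self.counts[cell][arm]
        self.means[cell][arm] += (reward - self.means[cell][arm]) / n

    def end_slot(self):
        # Keep the partition for a whole epoch, then halve every cell.
        self.slots_in_epoch += 1
        if self.slots_in_epoch >= self.base_epoch_len * 2 ** self.level:
            self.level += 1
            self.slots_in_epoch = 0
            self._reset_stats()


def simulate(horizon=2000, seed=0):
    """Run the toy learner on a made-up two-arm environment."""
    rng = random.Random(seed)
    learner = ActiveContextualBandit(n_arms=2)
    total_reward, n_queries = 0.0, 0
    for _ in range(horizon):
        x = rng.random()
        cell, arm, query = learner.select(x)
        # Assumed environment: arm 0 is better for x < 0.5, arm 1 otherwise.
        p = 0.8 if (arm == 0) == (x < 0.5) else 0.2
        reward = 1.0 if rng.random() < p else 0.0
        total_reward += reward
        if query:
            # The reward is only observed (and paid for) when queried.
            learner.update(cell, arm, reward)
            total_reward -= learner.query_cost
            n_queries += 1
        learner.end_slot()
    return total_reward, n_queries


if __name__ == "__main__":
    total, queries = simulate()
    print(f"net reward: {total:.1f}, queries issued: {queries}")
```

In this sketch the learner observes a reward only in slots where it queries, so the estimates in each cell come entirely from paid-for labels. Discarding all statistics at each refinement keeps the code short; a more careful design would carry estimates from a parent cell over to its children.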
