Selective Sampling with Drift
This work adapts selective sampling to non-stationary environments. The contribution is incremental: it extends existing stationary methods to drifting scenarios.
The paper studies selective sampling, an online active learning setting, when the target function drifts over time, as in spam prediction. It develops a novel algorithm that makes no assumptions about how the data are generated and achieves mistake bounds that depend on the amount of drift. Simulations on synthetic and real-world datasets demonstrate its superiority in the drifting setting.
Recently there has been much work on selective sampling, an online active learning setting in which algorithms work in rounds. On each round, an algorithm receives an input and makes a prediction. It then decides whether to query the label: if it does, it may update its model; otherwise, the input is discarded. Most of this work focuses on the stationary case, where a fixed target model is assumed and the performance of the algorithm is compared to that fixed model. However, in many real-world applications, such as spam prediction, the best target function may drift over time or shift from time to time. We develop a novel selective sampling algorithm for the drifting setting, analyze it under no assumptions on the mechanism generating the sequence of instances, and derive new mistake bounds that depend on the amount of drift in the problem. Simulations on synthetic and real-world datasets demonstrate the superiority of our algorithm for selective sampling in the drifting setting.
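The round-based protocol described above can be illustrated with a minimal sketch. This is not the algorithm developed in the paper; it assumes a generic margin-based query rule (query with probability b / (b + |margin|)) combined with a Perceptron update, as commonly used in the stationary selective sampling literature. The function name `selective_sampling_perceptron`, the parameter `b`, and the label convention y in {-1, +1} are assumptions made for illustration only.

```python
import random


def selective_sampling_perceptron(stream, dim, b=1.0, seed=0):
    """Illustrative selective sampling loop (hypothetical, not the paper's method).

    stream: iterable of (x, y) pairs, x a sequence of `dim` floats, y in {-1, +1}.
    b:      positive constant; larger values make label queries more frequent.
    Returns the final weights, the number of mistakes, and the number of queries.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    mistakes = 0
    queries = 0
    for x, y in stream:
        # 1. Receive an input and make a prediction.
        margin = sum(wi * xi for wi, xi in zip(w, x))
        y_hat = 1 if margin >= 0 else -1
        # Mistakes are counted for evaluation only; the learner sees y
        # only on rounds where it queries the label.
        if y_hat != y:
            mistakes += 1
        # 2. Decide whether to query: small |margin| means low confidence,
        #    so the query probability b / (b + |margin|) is close to 1.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            # 3. Update the model only on queried rounds with a mistake.
            if y_hat != y:
                w = [wi + y * xi for wi, xi in zip(w, x)]
        # Otherwise the input is discarded and the model is unchanged.
    return w, mistakes, queries
```

In a drifting setting, the labels in `stream` would be generated by a target that changes over rounds; a stationary rule like this one has no built-in mechanism for tracking that change, which is the gap the paper addresses.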