On Discarding, Caching, and Recalling Samples in Active Learning
This work addresses data management in active learning for non-stationary real-world applications, though its contribution appears incremental because it builds on existing value-of-information concepts.
The paper tackles active learning in non-stationary environments, where labeled data can become invalid over time. It proposes principles for discarding, caching, and recalling data based on value of information, and evaluates them on simulated and real-world datasets in terms of predictive performance and data acquisition cost.
We address challenges of active learning under scarce informational resources in non-stationary environments. In real-world settings, data labeled and integrated into a predictive model may become invalid over time. However, such data can become informative again when the context switches, and these changes may indicate unmodeled cyclic or other temporal dynamics. We explore principles for discarding, caching, and recalling labeled data points in active learning based on computations of value of information. We review key concepts and study the value of the methods via investigations of predictive performance and costs of acquiring data for simulated and real-world data sets.
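The discard, cache, and recall cycle described above can be illustrated with a minimal sketch. This is not the authors' method. It assumes a 1-nearest-neighbour classifier as a stand-in predictive model and uses the change in error on a window of recently labeled points as a crude proxy for value of information. All names (`predict`, `error`, `update`) and the toy data are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, int]  # (feature, label); hypothetical 1-D toy data


def predict(active: List[Point], x: float) -> int:
    """1-nearest-neighbour prediction; defaults to label 0 with no data."""
    if not active:
        return 0
    return min(active, key=lambda p: abs(p[0] - x))[1]


def error(active: List[Point], recent: List[Point]) -> float:
    """Misclassification rate on a non-empty window of recent labeled points."""
    return sum(predict(active, x) != y for x, y in recent) / len(recent)


def update(active: List[Point], cache: List[Point], recent: List[Point]):
    """One greedy discard/recall pass (assumed proxy for value of information).

    A point is discarded to the cache if removing it lowers the error on
    `recent`; a cached point is recalled if adding it lowers that error.
    """
    kept = list(active)
    discarded = []
    for p in active:
        rest = list(kept)
        rest.remove(p)
        if error(rest, recent) < error(kept, recent):  # p currently hurts
            kept = rest
            discarded.append(p)

    remaining = []
    for p in cache:  # only points cached before this call are considered
        if error(kept + [p], recent) < error(kept, recent):  # p helps again
            kept.append(p)
        else:
            remaining.append(p)
    return kept, remaining + discarded


# Two old points from context A, two new points from context B (labels flipped).
active = [(0.2, 0), (0.8, 1), (0.3, 1), (0.7, 0)]
recent_b = [(0.22, 1), (0.78, 0)]
active, cache = update(active, [], recent_b)   # stale A points move to the cache

recent_a = [(0.22, 0), (0.78, 1)]              # context switches back to A
active, cache = update(active, cache, recent_a)  # a cached A point is recalled
```

The greedy pass is order dependent and keeps points whose removal makes no difference, so it can retain some stale data. A fuller treatment would weigh the expected benefit of each point against the cost of acquiring a fresh label, as the value-of-information framing suggests.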