LG · CV · May 18, 2023

STREAMLINE: Streaming Active Learning for Realistic Multi-Distributional Settings

arXiv:2305.10643v1 · 4 citations
Originality: Incremental advance
AI Analysis

The paper addresses scenario-driven slice imbalance in streaming data for applications such as autonomous vehicles and satellite imaging, and represents an incremental improvement over existing methods.

The paper tackles the problem of learning unbiased models from high-volume, episodic multi-distributional data streams by proposing STREAMLINE, a streaming active learning framework that improves performance on infrequent yet critical data slices by up to 5% in accuracy for image classification and 8% in mAP for object detection.

Deep neural networks have consistently shown great performance in several real-world use cases such as autonomous vehicles and satellite imaging, effectively leveraging large corpora of labeled training data. However, learning unbiased models depends on building a dataset that is representative of a diverse range of realistic scenarios for a given task. This is challenging in many settings where data comes from high-volume streams, with each scenario occurring in random interleaved episodes at varying frequencies. We study realistic streaming settings where data instances arrive in and are sampled from an episodic multi-distributional data stream. Using submodular information measures, we propose STREAMLINE, a novel streaming active learning framework that mitigates scenario-driven slice imbalance in the working labeled data via a three-step procedure of slice identification, slice-aware budgeting, and data selection. We extensively evaluate STREAMLINE on real-world streaming scenarios for image classification and object detection tasks. We observe that STREAMLINE improves the performance on infrequent yet critical slices of the data over current baselines by up to 5% in terms of accuracy on our image classification tasks and by up to 8% in terms of mAP on our object detection tasks.
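To make the three-step procedure concrete, here is a minimal sketch of how slice identification, slice-aware budgeting, and data selection could fit together. It is not the authors' implementation. Everything in it is an assumption for illustration: the function names, the nearest-centroid rule for identifying a slice, the inverse-frequency budget rule, and the facility-location objective, which is a simple submodular function standing in for the submodular information measures the abstract mentions.

```python
import numpy as np


def identify_slice(batch, slice_centroids):
    """Assumed rule: pick the slice whose centroid is most cosine-similar
    to the mean feature vector of the incoming episode."""
    mean = batch.mean(axis=0)
    norms = np.linalg.norm(slice_centroids, axis=1) * np.linalg.norm(mean)
    sims = (slice_centroids @ mean) / (norms + 1e-12)
    return int(np.argmax(sims))


def slice_budget(labeled_counts, slice_id, base_budget):
    """Assumed rule: scale the base budget by the slice's inverse-frequency
    share, so slices with fewer labeled examples receive more labels."""
    counts = np.asarray(labeled_counts, dtype=float)
    weights = 1.0 / (counts + 1.0)
    share = weights[slice_id] / weights.sum()
    return max(1, int(round(base_budget * share * len(counts))))


def greedy_facility_location(features, budget):
    """Greedily maximize f(S) = sum_i max_{j in S} sim(i, j), a standard
    submodular objective used here as a stand-in selection criterion."""
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(normed @ normed.T, 0.0, None)  # non-negative similarities
    n = len(features)
    budget = min(budget, n)
    best_cover = np.zeros(n)  # current max similarity of each point to S
    selected = []
    for _ in range(budget):
        # Marginal gain of adding each candidate j to the selected set.
        gains = np.maximum(sim, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[selected] = -np.inf  # never pick the same point twice
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[:, j])
    return selected
```

In this sketch, each arriving episode would be passed through `identify_slice`, given a labeling budget by `slice_budget` based on how many labeled examples that slice already has, and then reduced to that many instances by `greedy_facility_location` before being sent for labeling.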

Code Implementations: 1 repo
