Aligning Data Selection with Performance: Performance-driven Reinforcement Learning for Active Learning in Object Detection
This work addresses a key bottleneck in active learning for object detection by aligning sample selection with the detector's performance metric, offering a paradigm that could reduce labeling costs in applications such as autonomous driving and surveillance.
The paper tackles the misalignment between data informativeness measures and task performance metrics, such as mAP, in active learning for object detection. It introduces MGRAL, which uses reinforcement learning to optimize the sampling strategy directly for mAP improvement, and reports strong results on the PASCAL VOC and MS COCO benchmarks.
Active learning strategies aim to train high-performance models with minimal labeled data by selecting the most informative instances for labeling. However, existing measures of data informativeness often fail to align directly with task performance metrics, such as mean average precision (mAP) in object detection. This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL), a novel approach that uses the expected change in model output as the informativeness measure for deep detection networks and optimizes the sampling strategy directly for mAP. MGRAL employs an LSTM-based reinforcement learning agent to handle two difficulties: the combinatorial search over candidate batches, and the non-differentiable relationship between the selected batch and the resulting detector performance. The agent is trained with policy gradient, using mAP improvement as the reward signal. Because estimating mAP for unlabeled samples is computationally intensive, we implement fast look-up tables that keep the method feasible in practice. We evaluate MGRAL on the PASCAL VOC and MS COCO benchmarks across various backbone architectures. Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.
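To make the selection loop concrete, the following is a minimal sketch of policy-gradient batch selection with a cached reward. It is not MGRAL's implementation, and it rests on several assumptions: the LSTM agent is replaced by one learnable logit per candidate sample, the mAP improvement is replaced by a toy reward (the sum of hypothetical per-sample gains), and the look-up table is modeled as a plain dictionary keyed by the selected batch. All names (`sample_batch`, `RewardTable`, `train_selector`, `gains`) are illustrative and do not come from the paper.

```python
import math
import random


def sample_batch(logits, k, rng):
    """Draw k distinct indices one at a time from a softmax over the
    remaining candidates. Returns the batch and the gradient of its
    log-probability with respect to the logits."""
    remaining = list(range(len(logits)))
    grad = [0.0] * len(logits)
    chosen = []
    for _ in range(k):
        top = max(logits[i] for i in remaining)  # subtract max for stability
        weights = [math.exp(logits[i] - top) for i in remaining]
        total = sum(weights)
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        for i, w in zip(remaining, weights):
            grad[i] -= w / total  # minus the softmax probability
        grad[pick] += 1.0         # plus one for the index actually drawn
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, grad


class RewardTable:
    """Caches the reward of every batch seen so far, so a repeated batch
    costs a dictionary look-up instead of a fresh evaluation."""

    def __init__(self, reward_fn):
        self.reward_fn = reward_fn
        self.cache = {}
        self.misses = 0  # number of batches that had to be evaluated

    def __call__(self, batch):
        key = frozenset(batch)  # the order of selection does not matter
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.reward_fn(batch)
        return self.cache[key]


def train_selector(gains, k, steps=3000, lr=0.2, seed=0):
    """REINFORCE with a moving-average baseline.

    `gains` is a hypothetical per-sample contribution; their sum over a
    batch stands in for the mAP improvement that would act as the reward.
    """
    rng = random.Random(seed)
    reward = RewardTable(lambda batch: sum(gains[i] for i in batch))
    logits = [0.0] * len(gains)
    baseline = 0.0
    for _ in range(steps):
        batch, grad = sample_batch(logits, k, rng)
        r = reward(batch)
        advantage = r - baseline          # reward relative to recent average
        baseline += 0.05 * (r - baseline)
        for i in range(len(logits)):
            logits[i] += lr * advantage * grad[i]  # gradient ascent step
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sorted(ranked[:k]), reward
```

For example, `train_selector([0.0, 0.0, 1.0, 0.0, 1.0, 0.0], k=2)` returns the selected indices together with the reward table, whose `misses` counter shows how many distinct batches were actually evaluated. The score-function estimator needs only sampled rewards, which is why a non-differentiable metric such as mAP can serve as the training signal.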