CVJun 21, 2023

Multi-Task Consistency for Active Learning

arXiv:2306.12398v113 citationsh-index: 34
Originality Incremental advance
AI Analysis

This work addresses the need for efficient labeling in multi-task vision systems, offering a domain-specific improvement over existing active learning methods.

The paper tackles the problem of selecting informative samples for annotation in multi-task vision networks by proposing a multi-task active learning strategy that leverages inconsistency between object detection and semantic segmentation. The approach outperforms state-of-the-art methods by up to 3.4% mDSQ on nuImages and achieves 95% of fully-trained performance using only 67% of the data, reducing labels by 20% compared to random selection.

Learning-based solutions for vision tasks require a large amount of labeled training data to ensure their performance and reliability. In single-task vision-based settings, inconsistency-based active learning has proven to be effective in selecting informative samples for annotation. However, there is a lack of research exploiting the inconsistency between multiple tasks in multi-task networks. To address this gap, we propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation. Our approach leverages the inconsistency between them to identify informative samples across both tasks. We propose three constraints that specify how the tasks are coupled and introduce a method for determining the pixels belonging to the object detected by a bounding box, to later quantify the constraints as inconsistency scores. To evaluate the effectiveness of our approach, we establish multiple baselines for multi-task active learning and introduce a new metric, mean Detection Segmentation Quality (mDSQ), tailored for the multi-task active learning comparison that addresses the performance of both tasks. We conduct extensive experiments on the nuImages and A9 datasets, demonstrating that our approach outperforms existing state-of-the-art methods by up to 3.4% mDSQ on nuImages. Our approach achieves 95% of the fully-trained performance using only 67% of the available data, corresponding to 20% fewer labels compared to random selection and 5% fewer labels compared to state-of-the-art selection strategy. Our code will be made publicly available after the review process.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes