Rethinking Crowdsourcing Annotation: Partial Annotation with Salient Labels for Multi-Label Image Classification
This work addresses the expense and error-proneness of manual annotation for multi-label images in computer vision by offering a more efficient crowdsourcing strategy.
The paper tackles low-quality crowdsourced annotations for multi-label image classification by proposing partial annotation with only salient labels, which reduces errors and annotation time. The method, which combines an active learning approach with a novel Adaptive Temperature Associated Model (ATAM), achieves higher accuracy than state-of-the-art models trained on fully annotated images on datasets such as COCO 2014.
Annotated images are required for both supervised model training and evaluation in image classification. Manually annotating images is arduous and expensive, especially for multi-label images. A recent trend is to conduct such laborious annotation tasks through crowdsourcing, where images are annotated from scratch by volunteers or paid workers online (e.g., workers on Amazon Mechanical Turk). However, the quality of crowdsourced image annotations cannot be guaranteed; incompleteness and incorrectness are the two major concerns. To address these concerns, we rethink crowdsourcing annotation: our simple hypothesis is that if annotators only partially annotate multi-label images with the salient labels they are confident in, there will be fewer annotation errors and annotators will spend less time on uncertain labels. As a pleasant surprise, we show that, with the same annotation budget, a multi-label image classifier supervised by images with salient annotations can outperform models supervised by fully annotated images. Our methodological contributions are twofold: an active learning approach is proposed to acquire salient labels for multi-label images, and a novel Adaptive Temperature Associated Model (ATAM) that specifically uses partial annotations is proposed for multi-label image classification. We conduct experiments on practical crowdsourcing data, the Open Street Map (OSM) dataset, and on the benchmark dataset COCO 2014. Compared with state-of-the-art classification methods trained on fully annotated images, the proposed ATAM can achieve higher accuracy. The proposed idea is promising for crowdsourced data annotation. Our code will be publicly available.
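The abstract does not describe how ATAM consumes partial annotations, so the following is only a generic illustration and not the authors' model. A common way to train on partially annotated images is to compute binary cross-entropy over the annotated labels only and to skip the rest. In this minimal sketch, the function names, the use of `None` for an unannotated label, and the fixed `temperature` parameter are all assumptions made for illustration; in particular, the fixed temperature is a stand-in and does not reproduce the adaptive mechanism that ATAM's name suggests.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def partial_bce(logits, labels, temperature=1.0, eps=1e-12):
    """Mean binary cross-entropy over annotated labels only.

    labels[i] is 1 (annotated as present), 0 (annotated as absent),
    or None (not annotated; contributes nothing to the loss).
    `temperature` is a fixed, hypothetical scaling of the logits.
    """
    total, count = 0.0, 0
    for z, y in zip(logits, labels):
        if y is None:
            continue  # skip labels the annotator left blank
        p = sigmoid(z / temperature)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
        count += 1
    # With no annotated labels, define the loss as zero.
    return total / count if count else 0.0


# One image, three candidate classes, only the first (salient) label annotated.
loss = partial_bce([2.0, -1.0, 0.5], [1, None, None])
```

Only the first class contributes to `loss` here; the two unannotated classes are neither rewarded nor penalized, which is the property that lets a classifier learn from salient labels without treating missing labels as negatives.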