CV · Apr 27, 2023

Towards Precise Weakly Supervised Object Detection via Interactive Contrastive Learning of Context Information

arXiv:2304.14114v2 · 3 citations · h-index: 40
Originality: Highly original
AI Analysis

This work addresses precise object detection from image-level tags alone, offering computer vision researchers a novel method that integrates context information to narrow the gap with fully supervised approaches.

The paper tackles the performance gap in weakly supervised object detection (WSOD) by proposing JLWSOD, a framework that incorporates instance-wise and semantic-wise context information and uses interactive graph contrastive learning to optimize visual appearance and context, achieving improvements of 3.6%~23.3% in mAP and 3.4%~19.7% in CorLoc on PASCAL VOC and MS COCO benchmarks.

Weakly supervised object detection (WSOD) aims at learning precise object detectors with only image-level tags. In spite of intensive research on deep learning (DL) approaches over the past few years, there is still a significant performance gap between WSOD and fully supervised object detection. In fact, most existing WSOD methods only consider the visual appearance of each region proposal and ignore the useful context information in the image. To this end, this paper proposes an interactive end-to-end WSOD framework called JLWSOD with two innovations: i) two types of WSOD-specific context information (i.e., instance-wise correlation and semantic-wise correlation) are proposed and introduced into the WSOD framework; ii) an interactive graph contrastive learning (iGCL) mechanism is designed to jointly optimize the visual appearance and context information for better WSOD performance. Specifically, the iGCL mechanism takes full advantage of the complementary interpretations of WSOD, namely the instance-wise detection and semantic-wise prediction tasks, forming a more comprehensive solution. Extensive experiments on the widely used PASCAL VOC and MS COCO benchmarks verify the superiority of JLWSOD over alternative state-of-the-art approaches and baseline models (improvements of 3.6%~23.3% in mAP and 3.4%~19.7% in CorLoc, respectively).
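
For readers unfamiliar with CorLoc, the localization metric reported above, the following is a minimal sketch of how it is commonly computed for a single class: the fraction of images containing that class in which the top-scoring predicted box overlaps a ground-truth box with IoU at or above a threshold (commonly 0.5). The function names, the dictionary-based inputs, and the box format are assumptions made for illustration; this is not the paper's evaluation code.

```python
from typing import Dict, List, Tuple

# Assumed box format for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def corloc(
    top_boxes: Dict[str, Box],
    gt_boxes: Dict[str, List[Box]],
    thresh: float = 0.5,
) -> float:
    """Per-class CorLoc over the images that contain the class.

    top_boxes: image id -> highest-scoring predicted box for the class.
    gt_boxes:  image id -> ground-truth boxes of the class in that image.
    """
    if not gt_boxes:
        return 0.0
    hits = 0
    for image_id, gts in gt_boxes.items():
        pred = top_boxes.get(image_id)
        # An image counts as a hit if its top box matches any ground-truth box.
        if pred is not None and any(iou(pred, gt) >= thresh for gt in gts):
            hits += 1
    return hits / len(gt_boxes)
```

The reported CorLoc figure is typically the mean of this per-class value across all classes, whereas mAP additionally accounts for the ranking of all detections rather than only the top-scoring one.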

