CV | Jan 18, 2022

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

arXiv:2201.06696v1 | 77 citations
Originality: Incremental advance
AI Analysis

The paper addresses the problem of generating object proposals for a wide variety of categories without annotations. The advance is incremental: it builds on CLIP to improve over existing proposal methods.

The paper tackles unsupervised open-category object proposal generation by exploiting CLIP cues, achieving better performance than previous state-of-the-art methods on datasets like PASCAL VOC, COCO, and Visual Genome.

Object proposal generation is an important and fundamental task in computer vision. In this paper, we propose ProposalCLIP, a method towards unsupervised open-category object proposal generation. Unlike previous works which require a large number of bounding box annotations and/or can only generate proposals for limited object categories, our ProposalCLIP is able to predict proposals for a large variety of object categories without annotations, by exploiting CLIP (contrastive language-image pre-training) cues. Firstly, we analyze CLIP for unsupervised open-category proposal generation and design an objectness score based on our empirical analysis on proposal selection. Secondly, a graph-based merging module is proposed to solve the limitations of CLIP cues and merge fragmented proposals. Finally, we present a proposal regression module that extracts pseudo labels based on CLIP cues and trains a lightweight network to further refine proposals. Extensive experiments on PASCAL VOC, COCO and Visual Genome datasets show that our ProposalCLIP can better generate proposals than previous state-of-the-art methods. Our ProposalCLIP also shows benefits for downstream tasks, such as unsupervised object detection.
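The abstract does not say how the graph-based merging module builds its graph, so the following is only a minimal sketch of the general idea, not the authors' module. It assumes, purely for illustration, that two proposals are linked when their boxes overlap (IoU above `iou_thr`) and their feature vectors are similar (cosine similarity above `sim_thr`), and that each connected component is replaced by the box enclosing its members. The function name `merge_fragmented`, both thresholds, and the use of plain feature tuples in place of CLIP embeddings are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def merge_fragmented(
    boxes: List[Box],
    feats: List[Sequence[float]],
    iou_thr: float = 0.1,   # hypothetical threshold
    sim_thr: float = 0.9,   # hypothetical threshold
) -> List[Box]:
    """Merge proposals that are linked in an overlap-and-similarity graph."""
    n = len(boxes)
    parent = list(range(n))  # union-find forest over proposal indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge (union the two nodes) when both assumed criteria hold.
    for i in range(n):
        for j in range(i + 1, n):
            if iou(boxes[i], boxes[j]) >= iou_thr and cosine(feats[i], feats[j]) >= sim_thr:
                parent[find(i)] = find(j)

    # Collect connected components and replace each with its enclosing box.
    groups: Dict[int, List[Box]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(boxes[i])
    return [
        (min(b[0] for b in g), min(b[1] for b in g),
         max(b[2] for b in g), max(b[3] for b in g))
        for g in groups.values()
    ]
```

For example, two overlapping fragments with near-identical features collapse into one enclosing box, while a distant, dissimilar proposal is left untouched. In a real pipeline the feature vectors would presumably come from a CLIP image encoder applied to each cropped proposal, which this sketch leaves out.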

