CV, Sep 14, 2023

Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion

arXiv:2309.07753v1, 9 citations, h-index: 3

Originality: Incremental advance

AI Analysis

This work addresses the detection of common salient objects across groups of images, a computer vision task. It is an incremental improvement that introduces novel method components.

The paper tackles co-salient object detection by proposing a hierarchical Transformer module for semantic-level consensus extraction and a Transformer-based dispersion module to handle object variations across images, achieving state-of-the-art performance on three datasets.

Given a group of images, co-salient object detection (CoSOD) aims to highlight the common salient object in each image. Two factors are closely tied to the success of this task: consensus extraction and the dispersion of that consensus to each image. Most previous works represent the group consensus using local features, whereas we use a hierarchical Transformer module to extract semantic-level consensus. The module therefore obtains a more comprehensive representation of the common object category and excludes interference from other objects that share local similarities with the target object. In addition, we propose a Transformer-based dispersion module that takes into account the variation of the co-salient object across scenes. It distributes the consensus to the image feature maps in an image-specific way while making full use of interactions within the group. These two modules are integrated with a ViT encoder and an FPN-like decoder to form an end-to-end trainable network, without additional branches or auxiliary losses. The proposed method is evaluated on three commonly used CoSOD datasets and achieves state-of-the-art performance.
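The two stages named in the abstract, consensus extraction and consensus dispersion, can be illustrated with a toy sketch. This is not the paper's architecture: it assumes plain single-query dot-product attention with no learned projections, a two-level pooling (tokens to per-image summary to group consensus) as a stand-in for the hierarchical module, and a sigmoid gate as a stand-in for the dispersion module. The names `attend`, `extract_consensus`, and `disperse_consensus` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, tokens):
    """Single-query scaled dot-product attention; tokens act as keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    return [sum(w * t[j] for w, t in zip(weights, tokens))
            for j in range(len(query))]

def extract_consensus(group, query):
    """Assumed two-level pooling: tokens -> per-image summary -> group consensus."""
    summaries = [attend(query, image_tokens) for image_tokens in group]
    return attend(query, summaries)

def disperse_consensus(group, consensus):
    """Assumed dispersion: gate each token by similarity to an image-specific consensus."""
    scale = math.sqrt(len(consensus))
    out = []
    for image_tokens in group:
        # Adapt the shared consensus to this image by attending over its own tokens.
        local = attend(consensus, image_tokens)
        gated = []
        for t in image_tokens:
            gate = 1.0 / (1.0 + math.exp(-dot(t, local) / scale))
            gated.append([gate * x for x in t])
        out.append(gated)
    return out

# Toy group: 2 images, 2 tokens each, feature dimension 2 (made-up values).
group = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.9, 0.1], [0.0, 1.0]]]
query = [1.0, 0.0]  # hypothetical stand-in for a learned query

consensus = extract_consensus(group, query)
features = disperse_consensus(group, consensus)
```

In this sketch the consensus is a single vector shared by the group, while the gate applied to each token depends on that token's own image, which mirrors the "image-specific" dispersion described in the abstract only at a conceptual level.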

