CV | Aug 18, 2023

Self-Calibrated Cross Attention Network for Few-Shot Segmentation

arXiv:2308.09294v1 | 76 citations | h-index: 51 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in few-shot segmentation for computer vision applications, representing an incremental improvement over existing cross attention approaches.

The paper addresses a failure mode of cross attention methods for few-shot segmentation: query background features are fused with support foreground features and become entangled with query foreground features. It proposes a self-calibrated cross attention (SCCA) block that aligns each query patch with its most similar support patch and computes attention with a scaled-cosine mechanism. The reported mIoU on COCO-20^i under the 5-shot setting is more than 5.6% higher than that of previous state-of-the-art methods.

The key to the success of few-shot segmentation (FSS) lies in how to effectively utilize support samples. Most solutions compress support foreground (FG) features into prototypes, but lose some spatial details. Others instead use cross attention to fuse query features with uncompressed support FG. Query FG can be fused with support FG; however, query background (BG) cannot find matched BG features in support FG, and so inevitably integrates dissimilar features. Besides, as both query FG and BG are combined with support FG, they get entangled, thereby leading to ineffective segmentation. To cope with these issues, we design a self-calibrated cross attention (SCCA) block. For efficient patch-based attention, query and support features are first split into patches. Then, we design a patch alignment module to align each query patch with its most similar support patch for better cross attention. Specifically, SCCA takes a query patch as Q, and groups the patches from the same query image and the aligned patches from the support image as K&V. In this way, the query BG features are fused with matched BG features (from query patches), and thus the aforementioned issues are mitigated. Moreover, when calculating SCCA, we design a scaled-cosine mechanism to better utilize the support features for similarity calculation. Extensive experiments conducted on PASCAL-5^i and COCO-20^i demonstrate the superiority of our model, e.g., the mIoU score under the 5-shot setting on COCO-20^i is 5.6%+ better than previous state-of-the-art methods. The code is available at https://github.com/Sam1224/SCCAN.
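To make the data flow in the abstract concrete, here is a minimal NumPy sketch of patch alignment followed by attention over grouped keys and values. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names, the mean-pooled patch descriptor used for alignment, the temperature `tau`, and the reading of K&V as "the query patch itself plus its aligned support patch" are all assumptions. Learned projections, multi-head attention, and support FG masking are omitted.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Unit-normalize along the feature axis so dot products become cosines.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def align_patches(query_patches, support_patches):
    # query_patches: (Nq, T, C), support_patches: (Ns, T, C).
    # Assumption: each patch is summarized by the mean of its T tokens,
    # and alignment picks the support patch with the highest cosine similarity.
    q_desc = l2_normalize(query_patches.mean(axis=1))
    s_desc = l2_normalize(support_patches.mean(axis=1))
    sim = q_desc @ s_desc.T              # (Nq, Ns) cosine similarities
    return sim.argmax(axis=1)            # index of the aligned support patch

def scaled_cosine_attention(q, k, v, tau=0.1):
    # Attention logits are cosine similarities divided by a temperature
    # (tau is a hypothetical hyperparameter), followed by a softmax.
    logits = l2_normalize(q) @ l2_normalize(k).T / tau
    logits -= logits.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def scca_sketch(query_patches, support_patches, tau=0.1):
    idx = align_patches(query_patches, support_patches)
    out = np.empty_like(query_patches)
    for i, q in enumerate(query_patches):
        # K&V group the query patch's own tokens with the aligned support
        # patch's tokens, so query BG tokens can attend to query BG tokens.
        kv = np.concatenate([q, support_patches[idx[i]]], axis=0)  # (2T, C)
        out[i], _ = scaled_cosine_attention(q, kv, kv, tau)
    return out, idx
```

Calling `scca_sketch(query_patches, support_patches)` with arrays of shape `(num_patches, tokens_per_patch, channels)` returns fused query features of the same shape as `query_patches`, along with the aligned support index for each query patch. Because the query patch's own tokens are among the keys, a background token always has at least one well-matched key, which is the self-calibration idea the abstract describes.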

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
