CV | CL | Jan 19

Open Vocabulary Panoptic Segmentation With Retrieval Augmentation

arXiv:2601.12779v1
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of segmenting arbitrary, user-specified classes in images. It is an incremental improvement over existing methods.

The paper tackles open vocabulary panoptic segmentation, where systems trained on one dataset struggle to generalize to unseen classes. It proposes RetCLIP, a retrieval-augmented method that improves performance on unseen classes. Trained on COCO, it reaches 30.9 PQ, 19.3 mAP, and 44.0 mIoU on ADE20k, which are absolute improvements of +4.5 PQ, +2.5 mAP, and +10.0 mIoU over the baseline.
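The baseline scores are not listed here, but they follow from the reported results and gains. A small check, assuming the gains are plain absolute differences as stated; the resulting values are implied by this arithmetic, not quoted from the baseline's own paper:

```python
# Reported RetCLIP scores on ADE20k and the stated absolute gains over the baseline.
reported = {"PQ": 30.9, "mAP": 19.3, "mIoU": 44.0}
gain = {"PQ": 4.5, "mAP": 2.5, "mIoU": 10.0}

# Implied baseline = reported score minus absolute gain, rounded to one decimal.
baseline = {metric: round(reported[metric] - gain[metric], 1) for metric in reported}
print(baseline)  # {'PQ': 26.4, 'mAP': 16.8, 'mIoU': 34.0}
```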

Given an input image and a set of class names, panoptic segmentation aims to label each pixel in the image with a class label and an instance label. In comparison, Open Vocabulary Panoptic Segmentation aims to facilitate the segmentation of arbitrary classes according to user input. The challenge is that a panoptic segmentation system trained on a particular dataset typically does not generalize well to unseen classes beyond the training data. In this work, we propose RetCLIP, a retrieval-augmented panoptic segmentation method that improves performance on unseen classes. In particular, we construct a masked segment feature database using paired image-text data. At inference time, we use masked segment features from the input image as query keys to retrieve similar features and associated class labels from the database. Classification scores for the masked segment are assigned based on the similarity between query features and retrieved features. The retrieval-based classification scores are combined with CLIP-based scores to produce the final output. We integrate our solution with a previous SOTA method (FC-CLIP). When trained on COCO, the proposed method achieves 30.9 PQ, 19.3 mAP, and 44.0 mIoU on the ADE20k dataset, an absolute improvement of +4.5 PQ, +2.5 mAP, and +10.0 mIoU over the baseline.
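The retrieval step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The function names (`retrieval_scores`, `combine`), the top-k softmax weighting, the `temperature` and `alpha` parameters, and the geometric-mean fusion rule are all assumptions, since the abstract does not specify them. Toy vectors stand in for real masked segment features.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieval_scores(query, db_feats, db_labels, num_classes, k=5, temperature=0.1):
    """Turn the top-k most similar database entries into per-class scores.

    query:     (d,) masked segment feature from the input image
    db_feats:  (n, d) masked segment features stored in the database
    db_labels: (n,) integer class index of each database entry
    """
    sims = l2_normalize(db_feats) @ l2_normalize(query)  # cosine similarity to every entry
    k = min(k, sims.shape[0])
    top = np.argsort(-sims)[:k]                          # indices of the k nearest entries
    weights = np.exp(sims[top] / temperature)
    weights /= weights.sum()                             # softmax over retrieved neighbours (assumed)
    scores = np.zeros(num_classes)
    np.add.at(scores, db_labels[top], weights)           # accumulate neighbour weight per class
    return scores

def combine(retrieval, clip, alpha=0.5, eps=1e-8):
    """Weighted geometric mean of the two score vectors, renormalised (assumed rule)."""
    fused = (retrieval + eps) ** alpha * (clip + eps) ** (1.0 - alpha)
    return fused / fused.sum()

# Toy database: 3 classes, 4 entries each, clustered around orthogonal directions.
num_classes, dim = 3, 8
centers = np.eye(num_classes, dim)
db_labels = np.repeat(np.arange(num_classes), 4)
db_feats = centers[db_labels] + 0.01 * np.linspace(-1, 1, 12 * dim).reshape(12, dim)

query = centers[2] + 0.02                                # a segment that resembles class 2
clip_scores = np.full(num_classes, 1.0 / num_classes)    # uninformative stand-in for CLIP scores

r = retrieval_scores(query, db_feats, db_labels, num_classes, k=4)
final = combine(r, clip_scores)
print(final.argmax())  # 2
```

In this toy setup the stand-in CLIP scores are uniform, so the retrieved neighbours alone decide the class. With real CLIP scores, `alpha` would control how much weight the retrieval branch gets.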

