CV · Jun 26, 2025

ReME: A Data-Centric Framework for Training-Free Open-Vocabulary Segmentation

arXiv:2506.21233v2 · 6 citations · h-index: 6 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of segmenting images into arbitrary textual categories without fine-tuning, which matters for applications that need flexible scene understanding. The contribution is incremental: it builds on existing data-centric ideas.

The paper tackles training-free open-vocabulary semantic segmentation by focusing on data quality. It shows that a high-quality reference set significantly improves performance, and the method outperforms all existing training-free approaches across ten benchmark datasets.

Training-free open-vocabulary semantic segmentation (OVS) aims to segment images given a set of arbitrary textual categories without costly model fine-tuning. Existing solutions often explore attention mechanisms of pre-trained models, such as CLIP, or generate synthetic data and design complex retrieval processes to perform OVS. However, their performance is limited by the capability of reliant models or the suboptimal quality of reference sets. In this work, we investigate the largely overlooked data quality problem for this challenging dense scene understanding task, and identify that a high-quality reference set can significantly benefit training-free OVS. With this observation, we introduce a data-quality-oriented framework, comprising a data pipeline to construct a reference set with well-paired segment-text embeddings and a simple similarity-based retrieval to unveil the essential effect of data. Remarkably, extensive evaluations on ten benchmark datasets demonstrate that our method outperforms all existing training-free OVS approaches, highlighting the importance of data-centric design for advancing OVS without training. Our code is available at https://github.com/xiweix/ReME.
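The abstract describes a "simple similarity-based retrieval" over a reference set of paired segment-text embeddings, but does not spell out the procedure. The following is a minimal, generic sketch of what such a retrieval could look like, not the authors' implementation: each query segment embedding is compared by cosine similarity against the reference embeddings, and the label is chosen by a similarity-weighted vote over the top-k matches. The function names, the `k` parameter, the voting rule, and the toy 3-dimensional vectors are all assumptions made for illustration; real embeddings would come from a pre-trained encoder.

```python
import math
from collections import defaultdict


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_label(query_embedding, reference_set, k=3):
    """Assign a label to one query segment (hypothetical helper).

    reference_set is a list of (embedding, label) pairs. The label is
    picked by summing the similarities of the k nearest references per
    label and taking the label with the largest total.
    """
    if not reference_set:
        raise ValueError("reference_set must not be empty")
    scored = sorted(
        ((cosine_similarity(query_embedding, emb), label)
         for emb, label in reference_set),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes.items(), key=lambda item: item[1])[0]


# Toy reference set: made-up 3-D vectors standing in for real embeddings.
reference_set = [
    ([1.0, 0.0, 0.0], "cat"),
    ([0.9, 0.1, 0.0], "cat"),
    ([0.0, 1.0, 0.0], "dog"),
    ([0.0, 0.9, 0.1], "dog"),
    ([0.0, 0.0, 1.0], "sky"),
]

# Two made-up query segment embeddings.
query_segments = [[0.8, 0.2, 0.0], [0.1, 0.1, 0.9]]
labels = [retrieve_label(q, reference_set, k=3) for q in query_segments]
print(labels)  # → ['cat', 'sky']
```

In a sketch like this, the retrieval step has no learned parameters, so label quality depends entirely on how well the reference embeddings and their text labels are paired, which is the point the abstract makes about data quality.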

Code Implementations: 1 repo