Renovating Names in Open-Vocabulary Segmentation Benchmarks
This work addresses data quality, a practical concern for researchers and practitioners in open-vocabulary segmentation, though the contribution is incremental: it improves existing datasets rather than introducing a new paradigm.
The paper tackles imprecise class names in open-vocabulary segmentation benchmarks by introducing RENOVATE, a framework whose renaming model improves the name assigned to each visual segment. Models trained on the renovated names show up to 15% relative improvement and train more efficiently.
Names are essential to both human cognition and vision-language models. Open-vocabulary models utilize class names as text prompts to generalize to categories unseen during training. However, the precision of these names is often overlooked in existing datasets. In this paper, we address this underexplored problem by presenting a framework for "renovating" names in open-vocabulary segmentation benchmarks (RENOVATE). Our framework features a renaming model that enhances the quality of names for each visual segment. Through experiments, we demonstrate that our renovated names help train stronger open-vocabulary models with up to 15% relative improvement and significantly enhance training efficiency with improved data quality. We also show that our renovated names improve evaluation by better measuring misclassification and enabling fine-grained model analysis. We will provide our code and relabelings for several popular segmentation datasets (MS COCO, ADE20K, Cityscapes) to the research community.
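The "up to 15% relative improvement" figure is a relative gain, not an absolute one, and the two are easy to confuse. The sketch below shows the arithmetic under stated assumptions: the function name `relative_improvement` and both scores are hypothetical values chosen for illustration and are not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores for illustration only (not taken from the paper).
baseline_score = 20.0   # assumed metric value with the original names
renovated_score = 23.0  # assumed metric value with the renovated names

gain = relative_improvement(baseline_score, renovated_score)
print(f"absolute gain: {renovated_score - baseline_score:.1f} points")
print(f"relative gain: {gain:.0%}")
```

With these assumed numbers, a 3-point absolute gain on a baseline of 20 corresponds to a 15% relative gain. The same relative figure would mean a larger absolute gain on a higher baseline.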