Training-Free Class Purification for Open-Vocabulary Semantic Segmentation
This work is relevant to computer vision researchers and practitioners who care about computational cost: it offers a training-free way to improve open-vocabulary semantic segmentation (OVSS). The contribution is incremental, however, because it builds on existing OVSS approaches.
The paper tackles class redundancy and visual-language ambiguity in open-vocabulary semantic segmentation by proposing FreeCP, a training-free class purification framework. Used as a plug-and-play module, FreeCP significantly boosts segmentation performance across eight benchmarks.
Fine-tuning pre-trained vision-language models has emerged as a powerful approach for enhancing open-vocabulary semantic segmentation (OVSS). However, the substantial computational and resource demands of training on large datasets have prompted interest in training-free methods for OVSS. Existing training-free approaches primarily modify model architectures or generate prototypes to improve segmentation performance. They often neglect two challenges: class redundancy, where many candidate categories are absent from the current test image, and visual-language ambiguity, where semantic similarity among categories causes confusion in class activation. Both issues can degrade the class activation maps and the affinity-refined activation maps derived from them. Motivated by these observations, we propose FreeCP, a novel training-free class purification framework. FreeCP purifies the set of semantic categories and rectifies the errors caused by redundancy and ambiguity. The purified class representations are then used to produce the final segmentation predictions. We conduct extensive experiments across eight benchmarks to validate FreeCP's effectiveness. The results demonstrate that FreeCP, as a plug-and-play module, significantly boosts segmentation performance when combined with other OVSS methods.
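To make the class redundancy problem concrete, the following is a minimal illustrative sketch, not FreeCP's actual procedure, which the abstract does not specify. It assumes each candidate class already has a small activation map and applies a hypothetical rule: a class whose peak activation falls below a threshold is treated as absent from the image and removed before the per-pixel argmax. The names purify_classes, segment, keep_threshold, and the toy values in cams are all assumptions made for illustration.

```python
from typing import Dict, List

ActivationMap = List[List[float]]  # one H x W map of class activations


def purify_classes(cams: Dict[str, ActivationMap],
                   keep_threshold: float = 0.5) -> List[str]:
    """Keep classes whose peak activation reaches keep_threshold.

    Hypothetical stand-in for a purification step; the threshold rule
    is an assumption, not the rule used by FreeCP.
    """
    return [name for name, cam in cams.items()
            if max(max(row) for row in cam) >= keep_threshold]


def segment(cams: Dict[str, ActivationMap],
            classes: List[str]) -> List[List[str]]:
    """Label each pixel with the highest-activation class among `classes`."""
    if not classes:
        raise ValueError("at least one candidate class is required")
    first = cams[classes[0]]
    height, width = len(first), len(first[0])
    return [[max(classes, key=lambda name: cams[name][y][x])
             for x in range(width)]
            for y in range(height)]


# Toy 2 x 2 activation maps (made-up values). "dog" is not in the image,
# but it still has the highest score at one pixel.
cams = {
    "cat":  [[0.9, 0.8], [0.2, 0.1]],
    "dog":  [[0.3, 0.4], [0.3, 0.2]],
    "sofa": [[0.1, 0.2], [0.25, 0.8]],
}

all_classes = list(cams)
kept = purify_classes(cams)

print(segment(cams, all_classes))  # redundant "dog" claims the bottom-left pixel
print(segment(cams, kept))         # after purification that pixel goes to "sofa"
```

In this toy case the redundant class wins a pixel only because every real class is weak there. Removing it before the argmax changes that pixel's label without touching the underlying model, which is the sense in which a purification step can be plug-and-play.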