CV · Apr 27

DouC: Dual-Branch CLIP for Training-Free Open-Vocabulary Segmentation

arXiv:2604.24997 · 50.3 · h-index: 3
AI Analysis

For researchers in open-vocabulary segmentation, DouC provides a simple, training-free method that enhances CLIP's dense prediction without retraining, though it is an incremental improvement over existing training-free approaches.

DouC introduces a training-free dual-branch CLIP framework for open-vocabulary segmentation, combining token gating and structural priors to improve reliability and spatial coherence. It outperforms prior training-free methods across eight benchmarks, scaling favorably with model capacity.

Open-vocabulary semantic segmentation requires assigning pixel-level semantic labels while supporting an open and unrestricted set of categories. Training-free CLIP-based approaches preserve strong zero-shot generalization but typically rely on a single inference mechanism, limiting their ability to jointly address unreliable local tokens and insufficient spatial coherence. We propose DouC, a training-free dual-branch CLIP framework that decomposes dense prediction into two complementary components. OG-CLIP improves patch-level reliability via lightweight, inference-time token gating, while FADE-CLIP injects external structural priors through proxy attention guided by frozen vision foundation models. The two branches are fused at the logit level, enabling local token reliability and structure-aware patch interactions to jointly influence final predictions, with optional instance-aware correction applied as post-processing. DouC introduces no additional learnable parameters, requires no retraining, and preserves CLIP's zero-shot generalization. Extensive experiments across eight benchmarks and multiple CLIP backbones demonstrate that DouC consistently outperforms prior training-free methods and scales favorably with model capacity.
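To make "fused at the logit level" concrete, here is a minimal sketch of one way two per-patch logit maps could be combined before taking the per-patch argmax. The abstract does not state the fusion rule, so the convex combination, the `weight` parameter, and the names `fuse_logits` and `predict_labels` are illustrative assumptions, not the authors' implementation. The toy logits stand in for the outputs of the OG-CLIP and FADE-CLIP branches.

```python
from typing import List

# One row per image patch, one column per candidate class.
Logits = List[List[float]]


def fuse_logits(branch_a: Logits, branch_b: Logits, weight: float = 0.5) -> Logits:
    """Combine two logit maps with a convex combination.

    ASSUMPTION: a weighted sum is used here for illustration only;
    the source does not specify how the branches are fused.
    """
    if len(branch_a) != len(branch_b):
        raise ValueError("both branches must cover the same patches")
    fused: Logits = []
    for row_a, row_b in zip(branch_a, branch_b):
        if len(row_a) != len(row_b):
            raise ValueError("both branches must score the same classes")
        fused.append([weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)])
    return fused


def predict_labels(logits: Logits) -> List[int]:
    """Return the index of the highest-scoring class for each patch."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


# Toy data: 3 patches, 2 classes (values are made up).
og_logits = [[2.0, 1.0], [0.2, 0.1], [0.0, 3.0]]    # stand-in for the OG-CLIP branch
fade_logits = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]  # stand-in for the FADE-CLIP branch

fused = fuse_logits(og_logits, fade_logits, weight=0.5)
labels = predict_labels(fused)
print(labels)
```

In this toy case the second patch is nearly ambiguous in the first branch (0.2 vs 0.1) and is decided by the second branch after fusion, which illustrates why combining the two signals can change a prediction that either branch alone would make with low confidence.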

