EGA: Adapting Frozen Encoders for Vector Search with Bounded Out-of-Distribution Degradation
This work targets out-of-distribution degradation in vector search systems built on frozen encoders: it lets practitioners refine seen classes with an adapter while leaving unseen-class retrieval largely intact.
EGA is a residual adapter for frozen vision encoders that avoids performance collapse on unseen classes in vector search. At convergence, 96.5% of triplets are gradient-free, and EGA attains the highest worst-case Label Precision on four of five OOD benchmarks, whereas high-capacity adapters trained with global contrastive losses fall more than 40 points below the frozen baseline.
Vector search systems built on frozen vision encoders face queries from unseen classes at deployment, yet existing adapter training collapses under this shift: high-capacity adapters with global contrastive losses silently reassign unseen-class samples to wrong seen-class clusters, dropping worst-case Label Precision by over 40 points below the frozen baseline in our tests. We propose Euclidean Geodesic Alignment (EGA), a residual adapter that combines three principles: zero initialization, local triplet loss, and hypersphere projection. Together these induce a self-limiting dynamic: triplets that already satisfy a small margin stop producing gradients, so the adapter automatically stops updating where the local geometry is already correct. Our experiments show that at convergence $96.5\%$ of triplets are gradient-free, leaving unseen-class regions largely untouched while still enabling full-capacity refinement of seen classes. Across five diverse out-of-distribution (OOD) benchmarks, EGA achieves the highest worst-case Label Precision on the four primary splits and a consistent improvement on the fifth. The design also transfers to stronger backbones beyond CLIP, and we provide an analytical justification linking gradient sparsity to bounded OOD perturbation.
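The self-limiting dynamic described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the bottleneck adapter shape, the plain hinge form of the triplet loss, the margin value, and all names (`adapter`, `triplet_hinge`, `gradient_free_fraction`, `w_down`, `w_up`) are assumptions chosen for illustration. It shows two things: a zero-initialized residual branch followed by hypersphere projection reproduces the normalized frozen embedding exactly, and a triplet that already satisfies the margin has zero loss and therefore contributes no gradient.

```python
import math

def normalize(v):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def adapter(x, w_down, w_up):
    """Hypothetical residual adapter: normalize(x + w_up @ relu(w_down @ x))."""
    hidden = [max(0.0, h) for h in matvec(w_down, x)]
    delta = matvec(w_up, hidden)
    return normalize([xi + di for xi, di in zip(x, delta)])

def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_hinge(anchor, positive, negative, margin):
    """Hinge triplet loss (assumed form). Zero loss implies zero gradient."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def gradient_free_fraction(triplets, margin):
    """Fraction of triplets that already satisfy the margin."""
    inactive = sum(1 for a, p, n in triplets
                   if triplet_hinge(a, p, n, margin) == 0.0)
    return inactive / len(triplets)

# Zero initialization: the up-projection starts at zero, so the adapter
# output equals the normalized frozen embedding before any training.
w_down = [[0.5, -0.3], [0.2, 0.8]]   # arbitrary illustrative weights
w_up_zero = [[0.0, 0.0], [0.0, 0.0]]
frozen = [3.0, 4.0]
adapted = adapter(frozen, w_down, w_up_zero)

# Two toy triplets: the first already satisfies the margin (no gradient),
# the second violates it (still produces a gradient).
near = normalize([0.9, 0.1])
triplets = [
    ([1.0, 0.0], near, [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 1.0], near),
]
fraction = gradient_free_fraction(triplets, margin=0.1)
```

In this toy setup `fraction` is 0.5, since one of the two triplets is inactive. The 96.5% figure reported in the abstract is the same kind of quantity, measured over the training triplets at convergence.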