CR · CV · Apr 16

Robustness of Vision Foundation Models to Common Perturbations

arXiv:2604.14973 · 83.7 · h-index: 11
AI Analysis

For practitioners using vision foundation models in real-world applications, this work highlights a critical vulnerability to common image edits and provides a method to mitigate it.

This paper systematically studies the robustness of vision foundation models to common perturbations such as JPEG compression and brightness adjustments, finding that six industry-scale models from OpenAI and Meta are generally non-robust. The authors propose three robustness metrics, show that these perturbations degrade downstream classification accuracy, and introduce a fine-tuning method that improves robustness without sacrificing utility.

A vision foundation model outputs an embedding vector for an image, which can be affected by common editing operations (e.g., JPEG compression, brightness, contrast adjustments). These common perturbations alter embedding vectors and may impact the performance of downstream tasks using these embeddings. In this work, we present the first systematic study on foundation models' robustness to such perturbations. We propose three robustness metrics and formulate five desired mathematical properties for these metrics, analyzing which properties they satisfy or violate. Using these metrics, we evaluate six industry-scale foundation models (OpenAI, Meta) across nine common perturbation categories, finding them generally non-robust. We also show that common perturbations degrade downstream application performance (e.g., classification accuracy) and that robustness values can predict performance impacts. Finally, we propose a fine-tuning approach to improve robustness without sacrificing utility.
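To make the setup concrete, here is a minimal sketch of measuring how far a perturbation moves an embedding. The abstract does not define the three proposed metrics, so the mean cosine similarity used below is an assumption chosen for illustration, not the authors' metric. The embedder is a hypothetical stand-in (a fixed random projection with a tanh), not one of the evaluated foundation models, and `adjust_brightness`, `make_toy_embedder`, and `robustness_score` are illustrative names.

```python
import math
import random

def adjust_brightness(pixels, factor):
    """Scale pixel intensities in [0, 1] by `factor`, clipping to the valid range."""
    return [min(1.0, max(0.0, p * factor)) for p in pixels]

def make_toy_embedder(input_dim, embed_dim, seed=0):
    """Hypothetical stand-in for a foundation model: fixed random linear map + tanh."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(input_dim)  # keep pre-activations out of tanh saturation
    weights = [[rng.gauss(0.0, 1.0) * scale for _ in range(input_dim)]
               for _ in range(embed_dim)]

    def embed(pixels):
        return [math.tanh(sum(w * p for w, p in zip(row, pixels))) for row in weights]

    return embed

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def robustness_score(embed, images, perturb):
    """Assumed metric: mean cosine similarity between clean and perturbed embeddings."""
    sims = [cosine_similarity(embed(x), embed(perturb(x))) for x in images]
    return sum(sims) / len(sims)

# Synthetic "images": flat lists of 64 pixel intensities in [0, 1].
rng = random.Random(1)
images = [[rng.random() for _ in range(64)] for _ in range(20)]
embed = make_toy_embedder(input_dim=64, embed_dim=16)

identity_score = robustness_score(embed, images, lambda x: x)
bright_score = robustness_score(embed, images, lambda x: adjust_brightness(x, 1.5))
print(f"identity: {identity_score:.3f}, brightness x1.5: {bright_score:.3f}")
```

A score near 1 means the perturbation left the embedding direction almost unchanged; lower values indicate larger drift. With a real model, `embed` would be replaced by the model's image encoder and `perturb` by each of the perturbation categories under study.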
