CV · LG · Nov 7, 2024

BendVLM: Test-Time Debiasing of Vision-Language Embeddings

arXiv:2411.04420v1 · 21 citations · h-index: 11 · NIPS
Originality: Incremental advance
AI Analysis

This paper addresses biases in VLMs for applications such as retrieval and image generation, offering an incremental improvement over existing fine-tuning-free methods.

The paper tackles the problem of biases in vision-language model embeddings by proposing BendVLM, a nonlinear, fine-tuning-free method that tailors debiasing to each input, enabling more flexible and online-capable debiasing without requiring prior knowledge of inputs.

Vision-language model (VLM) embeddings have been shown to encode biases present in their training data, such as societal biases that ascribe negative characteristics to members of various racial and gender identities. VLMs are being quickly adopted for a variety of tasks ranging from few-shot classification to text-guided image generation, making debiasing VLM embeddings crucial. Debiasing approaches that fine-tune the VLM often suffer from catastrophic forgetting. On the other hand, fine-tuning-free methods typically utilize a "one-size-fits-all" approach that assumes that correlation with the spurious attribute can be explained using a single linear direction across all possible inputs. In this work, we propose Bend-VLM, a nonlinear, fine-tuning-free approach for VLM embedding debiasing that tailors the debiasing operation to each unique input. This allows for a more flexible debiasing approach. Additionally, we do not require knowledge of the set of inputs prior to inference time, making our method more appropriate for online, open-set tasks such as retrieval and text-guided image generation.
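
To make the "one-size-fits-all" baseline in the abstract concrete, the following is a minimal sketch of single-direction linear debiasing: one fixed bias direction is projected out of every embedding, regardless of the input. This is not Bend-VLM itself, whose per-input nonlinear procedure is not specified in this summary. The function names, the toy 3-dimensional vectors, and the choice of `bias_direction` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [a / norm for a in v]


def remove_direction(embedding, bias_direction):
    """Project `embedding` onto the subspace orthogonal to `bias_direction`.

    The same direction is used for every input, which is the
    "one-size-fits-all" assumption the abstract criticizes.
    """
    d = normalize(bias_direction)
    coeff = dot(embedding, d)
    return [e - coeff * x for e, x in zip(embedding, d)]


# Hypothetical toy data: real VLM embeddings have hundreds of dimensions,
# and the bias direction would be estimated from data, not hand-picked.
bias_direction = [1.0, 0.0, 0.0]
queries = {
    "a photo of a doctor": [0.6, 0.8, 0.0],
    "a photo of a nurse": [-0.5, 0.5, 0.7],
}

debiased = {
    text: remove_direction(vec, bias_direction)
    for text, vec in queries.items()
}
```

After the projection, every query embedding has zero component along `bias_direction`. The limitation the abstract points to is visible in the structure of the code: `bias_direction` is a single global constant, so the operation cannot adapt to how the spurious attribute manifests for a particular query.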

Code Implementations: 1 repo