CV · Mar 13

A Closed-Form Solution for Debiasing Vision-Language Models with Utility Guarantees Across Modalities and Tasks

arXiv:2603.12998 · 75.1
Predicted impact: top 45% in CV · last 90 days · Originality: Incremental advance
AI Analysis

This work addresses fairness issues in VLMs for applications such as zero-shot classification and retrieval. It offers a training-free solution with theoretical guarantees, though it builds incrementally on prior debiasing work.

The paper tackles social biases in Vision-Language Models by proposing a debiasing method with a closed-form solution that achieves Pareto-optimal fairness with bounded utility losses. It reports outperforming existing methods across diverse fairness metrics and datasets while preserving task performance.

While Vision-Language Models (VLMs) have achieved remarkable performance across diverse downstream tasks, recent studies have shown that they can inherit social biases from the training data and further propagate them into downstream applications. To address this issue, various debiasing approaches have been proposed, yet most of them aim to improve fairness without having a theoretical guarantee that the utility of the model is preserved. In this paper, we introduce a debiasing method that yields a closed-form solution in the cross-modal space, achieving Pareto-optimal fairness with bounded utility losses. Our method is training-free, requires no annotated data, and can jointly debias both visual and textual modalities across downstream tasks. Extensive experiments show that our method outperforms existing methods in debiasing VLMs across diverse fairness metrics and datasets for both group and intersectional fairness in downstream tasks such as zero-shot image classification, text-to-image retrieval, and text-to-image generation while preserving task performance.
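To make "closed-form" and "training-free" concrete, here is a minimal sketch of a generic technique: orthogonal projection that removes a single bias direction from an embedding. This is not the paper's solution, which the abstract does not spell out, and it says nothing about the Pareto-optimality or utility bounds the authors claim. The toy 3-D vectors and the names `remove_direction`, `text_group_a`, `text_group_b`, and `image_embedding` are all hypothetical, chosen only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity, the usual matching score in a shared image-text space."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


def remove_direction(embedding, bias_direction):
    """Project `embedding` onto the orthogonal complement of `bias_direction`.

    Closed form: e' = e - (<e, v> / <v, v>) * v. No training or labels involved.
    """
    scale = dot(embedding, bias_direction) / dot(bias_direction, bias_direction)
    return [e - scale * v for e, v in zip(embedding, bias_direction)]


# Hypothetical toy embeddings (3-D for readability; real VLM embeddings are far larger).
text_group_a = [0.9, 0.1, 0.3]   # stand-in for a prompt naming one group
text_group_b = [0.1, 0.9, 0.3]   # stand-in for a prompt naming another group

# Assumption: the bias direction is estimated as the difference of the two prompts.
bias_direction = [a - b for a, b in zip(text_group_a, text_group_b)]

image_embedding = [0.7, 0.2, 0.6]
debiased = remove_direction(image_embedding, bias_direction)

print("before:", cosine(image_embedding, text_group_a), cosine(image_embedding, text_group_b))
print("after: ", cosine(debiased, text_group_a), cosine(debiased, text_group_b))
```

After the projection, the toy image embedding has no component along the estimated bias direction, so its similarity to the two group prompts is equal. How much task-relevant information such a projection discards is exactly the utility question the paper claims to bound, and this sketch does not address it.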
