CV, AI, Jan 28

FAIRT2V: Training-Free Debiasing for Text-to-Video Diffusion Models

arXiv:2601.20791v1, 1 citation, h-index: 24
Originality: Highly original
AI Analysis

This paper addresses fairness in AI-generated video content, a concern for both users and developers, and represents a novel application of debiasing techniques to text-to-video models.

The paper tackles demographic bias, particularly gender bias, in text-to-video diffusion models. It proposes FairT2V, a training-free debiasing framework that neutralizes prompt embeddings to reduce bias in generated videos. Experiments on Open-Sora show a substantial reduction in bias across occupations with minimal impact on video quality.

Text-to-video (T2V) diffusion models have achieved rapid progress, yet their demographic biases, particularly gender bias, remain largely unexplored. We present FairT2V, a training-free debiasing framework for text-to-video generation that mitigates encoder-induced bias without finetuning. We first analyze demographic bias in T2V models and show that it primarily originates from pretrained text encoders, which encode implicit gender associations even for neutral prompts. We quantify this effect with a gender-leaning score that correlates with bias in generated videos. Based on this insight, FairT2V mitigates demographic bias by neutralizing prompt embeddings via anchor-based spherical geodesic transformations while preserving semantics. To maintain temporal coherence, we apply debiasing only during early identity-forming steps through a dynamic denoising schedule. We further propose a video-level fairness evaluation protocol combining VideoLLM-based reasoning with human verification. Experiments on the modern T2V model Open-Sora show that FairT2V substantially reduces demographic bias across occupations with minimal impact on video quality.
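The abstract names three mechanisms without giving formulas: a gender-leaning score, an anchor-based spherical geodesic transformation, and debiasing restricted to early denoising steps. The sketch below is one possible reading of those ideas, not the authors' implementation. Everything in it is an assumption made for illustration: the score as a difference of cosine similarities, the neutral anchor as the normalized midpoint of two gendered anchors, spherical linear interpolation (slerp) as the geodesic move, and the names `gender_leaning_score`, `neutralize`, `embedding_for_step`, and `early_fraction`.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def gender_leaning_score(e, anchor_m, anchor_f):
    """Hypothetical score: cosine similarity to a male anchor minus
    cosine similarity to a female anchor. Zero means no lean under
    this definition; the paper's actual score may differ."""
    e = unit(e)
    return float(e @ unit(anchor_m) - e @ unit(anchor_f))


def slerp(p, q, t):
    """Spherical linear interpolation between the directions of p and q.
    t=0 returns the direction of p, t=1 the direction of q."""
    p_u, q_u = unit(p), unit(q)
    omega = np.arccos(np.clip(p_u @ q_u, -1.0, 1.0))
    # Parallel or antipodal directions: the geodesic is degenerate or
    # not unique, so this sketch simply leaves p unchanged.
    if np.sin(omega) < 1e-8:
        return p_u.copy()
    return (np.sin((1 - t) * omega) * p_u + np.sin(t * omega) * q_u) / np.sin(omega)


def neutralize(e, anchor_m, anchor_f, t):
    """Move e along the sphere toward an assumed neutral anchor
    (normalized midpoint of the two gendered anchors), then restore
    the original norm of e."""
    neutral = unit(unit(anchor_m) + unit(anchor_f))
    return np.linalg.norm(e) * slerp(e, neutral, t)


def embedding_for_step(step, total_steps, original, debiased, early_fraction=0.3):
    """Assumed schedule: use the debiased embedding only for the first
    early_fraction of denoising steps, then the original embedding."""
    return debiased if step < early_fraction * total_steps else original
```

In this reading, `t` controls how far the prompt embedding moves toward the neutral direction, and `early_fraction` stands in for the "dynamic denoising schedule"; the abstract does not specify either value or how they are chosen.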
