CV | LG | Nov 13, 2025

Fast Data Attribution for Text-to-Image Models

arXiv:2511.10721v1 | 4 citations | h-index: 111
Originality: Incremental advance
AI Analysis

This work addresses the slowness of existing attribution methods, which makes them impractical for real-world models such as Stable Diffusion. It is an incremental but practical advance.

The paper tackles computationally expensive data attribution for text-to-image models. It distills a slow attribution approach into an efficient retrieval system, reporting speedups of 2,500x to 400,000x while maintaining competitive performance.

Data attribution for text-to-image models aims to identify the training images that most significantly influenced a generated output. Existing attribution methods involve considerable computational resources for each query, making them impractical for real-world applications. We propose a novel approach for scalable and efficient data attribution. Our key idea is to distill a slow, unlearning-based attribution method to a feature embedding space for efficient retrieval of highly influential training images. During deployment, combined with efficient indexing and search methods, our method successfully finds highly influential images without running expensive attribution algorithms. We show extensive results on both medium-scale models trained on MSCOCO and large-scale Stable Diffusion models trained on LAION, demonstrating that our method can achieve better or competitive performance in a few seconds, faster than existing methods by 2,500x - 400,000x. Our work represents a meaningful step towards the large-scale application of data attribution methods on real-world models such as Stable Diffusion.
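The deployment step described in the abstract, retrieving highly influential training images from a feature embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the embeddings have already been produced by some learned encoder (not shown), uses cosine similarity as the ranking score, and uses a brute-force scan in place of the efficient indexing and search methods the abstract mentions. The function names `build_index` and `top_k_influential` and the toy vectors are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def build_index(train_embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Precompute unit-length embeddings for every training image (done once, offline)."""
    return [l2_normalize(v) for v in train_embeddings]


def top_k_influential(
    query_embedding: Sequence[float],
    index: List[List[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (training_image_id, score) pairs for the k highest cosine similarities.

    Assumption: similarity in the embedding space approximates the ranking
    that a slow attribution method would produce.
    """
    q = l2_normalize(query_embedding)
    scored = (
        (sum(a * b for a, b in zip(q, v)), image_id)
        for image_id, v in enumerate(index)
    )
    best = heapq.nlargest(k, scored)
    return [(image_id, score) for score, image_id in best]


# Toy data: four training-image embeddings and one generated-image embedding.
toy_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
toy_index = build_index(toy_train)
toy_result = top_k_influential([2.0, 0.1], toy_index, k=2)
print([image_id for image_id, _ in toy_result])
```

At query time only one embedding pass and one similarity search are needed, which is why a retrieval formulation avoids running an expensive attribution algorithm per query. For a LAION-scale collection, the brute-force scan would presumably be replaced by an approximate nearest-neighbor index.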

