CV · LG · Oct 10, 2025

Training Feature Attribution for Vision Models

arXiv:2510.09135v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for greater trust and accountability in vision models by providing fine-grained, test-specific explanations, though its contribution is incremental: it combines two existing attribution perspectives, input-feature attribution and training-example attribution.

The paper tackles the problem of explaining deep neural networks by jointly attributing test-time predictions to specific regions in training images, revealing harmful examples and spurious correlations like patch-based shortcuts that existing methods miss.

Deep neural networks are often considered opaque systems, prompting the need for explainability methods to improve trust and accountability. Existing approaches typically attribute test-time predictions either to input features (e.g., pixels in an image) or to influential training examples. We argue that both perspectives should be studied jointly. This work explores *training feature attribution*, which links test predictions to specific regions of specific training images and thereby provides new insights into the inner workings of deep models. Our experiments on vision datasets show that training feature attribution yields fine-grained, test-specific explanations: it identifies harmful examples that drive misclassifications and reveals spurious correlations, such as patch-based shortcuts, that conventional attribution methods fail to expose.
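To make the idea of linking a test prediction to regions of a training image concrete, here is a minimal sketch on a toy model. It is not the paper's method: it assumes a linear logistic-regression classifier on flattened 2x2 "images", uses a gradient-similarity influence score (the dot product of training and test loss gradients) as a stand-in for training-example attribution, and splits that score into per-pixel terms while holding the prediction residuals fixed. The function names and the example data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, x, y):
    """Gradient of the logistic loss w.r.t. the weights for one example."""
    return (sigmoid(w @ x) - y) * x

def influence(w, x_train, y_train, x_test, y_test):
    """Gradient-similarity influence of one training example on one test example."""
    return loss_grad(w, x_train, y_train) @ loss_grad(w, x_test, y_test)

def training_feature_attribution(w, x_train, y_train, x_test, y_test):
    """Split the influence score into one contribution per training pixel.

    Assumption for this toy: the residuals (prediction minus label) are
    treated as constants, so the per-pixel terms sum exactly to the score.
    """
    r_train = sigmoid(w @ x_train) - y_train
    r_test = sigmoid(w @ x_test) - y_test
    return r_train * r_test * x_train * x_test

# Hypothetical data: flattened 2x2 images; pixel 3 plays the role of a
# bright "patch" that appears in both the training and the test image.
w = np.zeros(4)
x_train, y_train = np.array([0.2, 0.1, 0.0, 1.0]), 1.0
x_test, y_test = np.array([0.0, 0.3, 0.1, 1.0]), 1.0

score = influence(w, x_train, y_train, x_test, y_test)
attr = training_feature_attribution(w, x_train, y_train, x_test, y_test)
top_pixel = int(np.argmax(np.abs(attr)))

print("influence score:", score)
print("per-pixel attribution:", attr.reshape(2, 2))
print("most responsible training pixel:", top_pixel)
```

In this toy setup nearly all of the training example's influence on the test prediction comes from the shared patch pixel, which is the kind of shortcut the abstract describes. For a deep network, the per-pixel split would instead require a saliency-style attribution of the influence score with respect to the training image, which this sketch does not attempt.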

